Python kubernetes spark 2.4 branch

Python kubernetes spark 2.4 branch

Garlapati, Suryanarayana (Nokia - IN/Bangalore)

Hi,

I am trying to run Spark Python test cases on k8s based on tag spark-2.4-rc1. When dependent files are passed through the --py-files option, they are not resolved by the main Python script. Please let me know, is this a known issue?

 

Regards

Surya

 


Re: Python kubernetes spark 2.4 branch

Yinan Li
Can you give more details on how you ran your app? Did you build your own image, and which image are you using?


RE: Python kubernetes spark 2.4 branch

Garlapati, Suryanarayana (Nokia - IN/Bangalore)

Hi Ilan/Yinan,

Yes, my test case is similar to the one described in https://issues.apache.org/jira/browse/SPARK-24736.

 

My spark-submit command is as follows:

./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443 --conf spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf --py-files http://10.75.145.25:80/Spark/getNN.py http://10.75.145.25:80/Spark/test.py

 

The following error is observed:

 

+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=192.168.1.22 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner http://10.75.145.25:80/Spark/test.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark/jars/phoenix-4.13.1-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Traceback (most recent call last):
  File "/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229/test.py", line 13, in <module>
    from getNN import *
ImportError: No module named getNN
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Shutdown hook called
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229

 

I am observing the same kind of behaviour as mentioned in https://issues.apache.org/jira/browse/SPARK-24736 (the file is downloaded and available in the pod, yet the import still fails).
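A quick diagnostic that could be run inside the driver (a hypothetical snippet, not from the thread) confirms that state: the file exists on disk, but Python's import machinery cannot see it because no sys.path entry points at the download directory:

```python
import importlib.util
import sys

# In the failing pod this returns None, matching the ImportError above:
# the module is on disk but not on any sys.path entry.
spec = importlib.util.find_spec("getNN")
print("getNN importable:", spec is not None)

# Inspecting sys.path would show the download directory is missing.
print("number of sys.path entries:", len(sys.path))
```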

 

The same happens with local files as well:

 

./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443 --conf spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf --py-files ./getNN.py http://10.75.145.25:80/Spark/test.py

 

test.py depends on getNN.py.
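For reference, a minimal pair of files reproducing the dependency looks like this (only the file names and the `from getNN import *` line come from the traceback; the helper function is invented for illustration). Run side by side, the import works, which isolates the k8s failure to the download directory not being on sys.path:

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()

# getNN.py: the dependency shipped via --py-files (hypothetical body).
with open(os.path.join(workdir, "getNN.py"), "w") as f:
    f.write("def get_nn_version():\n    return '0.1'\n")

# test.py: the main script, importing exactly as the traceback shows.
with open(os.path.join(workdir, "test.py"), "w") as f:
    f.write("from getNN import *\nprint(get_nn_version())\n")

# When both files sit in the same directory, the import resolves fine.
out = subprocess.run(
    [sys.executable, os.path.join(workdir, "test.py")],
    cwd=workdir, capture_output=True, text=True,
)
print(out.stdout.strip())  # -> 0.1
```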

 

 

But the same works in the Spark 2.2 k8s branch.

 

 

Regards

Surya

 


RE: Python kubernetes spark 2.4 branch

Garlapati, Suryanarayana (Nokia - IN/Bangalore)

Hi Ilan/Yinan,

My observation is as follows:

The dependent files specified with “--py-files http://10.75.145.25:80/Spark/getNN.py” are downloaded and are available in the container at “/var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/”.

I guess we also need to add this path to PYTHONPATH, with the following change in entrypoint.sh, from:

if [ -n "$PYSPARK_FILES" ]; then
    PYTHONPATH="$PYTHONPATH:$PYSPARK_FILES"
fi

to:

if [ -n "$PYSPARK_FILES" ]; then
    PYTHONPATH="$PYTHONPATH:<directory where the dependent files are downloaded and available in the container, for example /var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/>"
fi

Let me know if this approach is fine, and please correct me if my understanding is wrong.
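The mechanism behind the proposed PYTHONPATH change can be checked in isolation with a standalone sketch (the directory and module are simulated here; the real path is the per-app /var/data/spark-.../ download directory): once the directory holding getNN.py is on the import path, the import succeeds.

```python
import os
import sys
import tempfile

# Simulate the container's per-app download directory.
download_dir = tempfile.mkdtemp(prefix="spark-files-")

# Simulate the dependency fetched via --py-files (hypothetical body).
with open(os.path.join(download_dir, "getNN.py"), "w") as f:
    f.write("def hello():\n    return 'from getNN'\n")

# Without the directory on the path, the import fails as in the log.
try:
    import getNN  # noqa: F401
except ImportError:
    print("ImportError: No module named getNN")

# What the proposed entrypoint.sh change would do, done here at runtime:
sys.path.insert(0, download_dir)
import getNN
print(getNN.hello())  # -> from getNN
```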

 

Regards

Surya

 
