pyspark: Importing other py-files in PYTHONPATH


pyspark: Importing other py-files in PYTHONPATH

Anders Bennehag
Hi there,

I am running Spark 0.9.0 standalone on a cluster. The documentation at http://spark.incubator.apache.org/docs/latest/python-programming-guide.html states that code dependencies can be deployed through the pyFiles argument to the SparkContext.
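For reference, this is roughly what that looks like; the master URL and file path here are just placeholders:

    from pyspark import SparkContext

    # Ship myLib.py to the worker nodes along with the job
    # (master URL and path are examples)
    sc = SparkContext("spark://master:7077", "MyApp",
                      pyFiles=["/path/to/myLib.py"])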

But in my case the relevant code, let's call it myLib, is already available in PYTHONPATH on the worker nodes. However, when trying to access it through a regular 'import myLib' in the script sent to pyspark, the Spark workers seem to hang partway through the script without reporting any errors.

If I start a regular Python shell on the workers, there is no problem importing myLib and using it.
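For what it's worth, one way to check what the worker processes themselves see on their path (a quick diagnostic sketch, not something from the docs; master URL is a placeholder):

    from pyspark import SparkContext

    def worker_sys_path(_):
        import sys  # imported inside the worker process itself
        return sys.path

    sc = SparkContext("spark://master:7077", "PathCheck")
    # Run one task on a worker and print the sys.path it reports
    print sc.parallelize([0]).map(worker_sys_path).first()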

Why is this?

/Anders


Re: pyspark: Importing other py-files in PYTHONPATH

Anders Bennehag
I just discovered that putting myLib in /usr/local/python2-7/dist-packages/ on the worker nodes lets me import the module in a pyspark script...

That is a workable solution, but it would be nice if modules on PYTHONPATH were picked up as well.
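Another possible workaround, assuming myLib lives in a known directory on every worker (/opt/mylibs below is just an example), would be to forward PYTHONPATH explicitly through SparkContext's environment argument:

    from pyspark import SparkContext

    # Set PYTHONPATH in the worker processes' environment
    # (/opt/mylibs is an example location for myLib)
    sc = SparkContext("spark://master:7077", "MyApp",
                      environment={"PYTHONPATH": "/opt/mylibs"})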

