pyspark and Python virtual environments

pyspark and Python virtual environments

Christian
Hello,

I usually create separate Python virtual environments for different projects to avoid version conflicts and to avoid needing root access to install libraries.

How can I tell pyspark to activate a virtual environment before executing tasks?


Thanks in advance,
Christian

Re: pyspark and Python virtual environments

Bryn Keller
Hi Christian,

The PYSPARK_PYTHON environment variable specifies the Python executable PySpark uses. You can point it at a virtualenv's python executable and it will work fine. Keep in mind that the same installation has to exist at the same path on every one of your cluster nodes for PySpark to work. If you're creating the SparkContext yourself in a Python application, you can set os.environ['PYSPARK_PYTHON'] = sys.executable before creating the context.
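For example, here is a minimal sketch of that second approach (the master URL "local[2]" and the app name are placeholders; run the script from inside the virtualenv whose interpreter the workers should use):

    import os
    import sys

    # Point PySpark at the interpreter of the currently active virtualenv.
    # Equivalently, you could export PYSPARK_PYTHON=/path/to/venv/bin/python
    # in the shell before launching pyspark; that path is only a placeholder.
    os.environ['PYSPARK_PYTHON'] = sys.executable

    from pyspark import SparkContext

    # Create the context only after PYSPARK_PYTHON is set, so the worker
    # processes are started with the virtualenv's interpreter.
    sc = SparkContext("local[2]", "venv-example")
    print(sc.parallelize(list(range(10))).sum())
    sc.stop()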

Hope that helps,
Bryn

Re: pyspark and Python virtual environments

Christian
Thanks Bryn.

