Use Shared Variable in PySpark Executors

Soheil Pourbafrani
Hi, I want to do some processing with PySpark and save the result in a tuple that is shared among the executors for further processing.
This is a text-mining job, and I want to use the Vector Space Model. I want to compute the vector of all words, save it in a tuple, and make it reachable from all executors. Is this possible in Spark, or should I use external storage such as a database or files?


Re: Use Shared Variable in PySpark Executors

Jörn Franke
Do you want to calculate it once and share it with all the executors? Then a broadcast variable may be interesting for you.
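
A minimal sketch of that suggestion, assuming whitespace tokenization and raw term counts as the vector weights (the thread specifies neither). The helper names `build_vocabulary`, `to_vector`, and `vectorize_with_spark` are hypothetical; only `SparkContext.getOrCreate`, `parallelize`, `map`, `flatMap`, `distinct`, `collect`, `cache`, and `broadcast` are actual PySpark calls.

```python
from collections import Counter


def build_vocabulary(token_lists):
    """Return a sorted tuple of every distinct word (hypothetical helper)."""
    return tuple(sorted({word for tokens in token_lists for word in tokens}))


def to_vector(tokens, vocabulary):
    """Map one tokenized document to term counts, ordered like the vocabulary."""
    counts = Counter(tokens)
    return tuple(counts[word] for word in vocabulary)


def vectorize_with_spark(documents):
    """Compute the vocabulary once, broadcast it, and vectorize on the executors."""
    # Imported lazily so the helpers above can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    # Assumption: each document is a plain string split on whitespace.
    tokens = sc.parallelize(documents).map(lambda line: line.lower().split()).cache()

    # The distinct words are collected to the driver and frozen into a tuple.
    vocabulary = tuple(sorted(tokens.flatMap(lambda t: t).distinct().collect()))

    # Ship the tuple to every executor once as a read-only broadcast variable.
    vocab_bc = sc.broadcast(vocabulary)

    # Tasks read the shared tuple through .value.
    vectors = tokens.map(lambda t: to_vector(t, vocab_bc.value)).collect()
    return vocabulary, vectors
```

A broadcast value is read-only on the executors and has to fit in memory on the driver and on each executor, so a very large vocabulary might still call for the external storage mentioned in the question.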

> On 22. Sep 2018, at 16:33, Soheil Pourbafrani <[hidden email]> wrote:
>
> Hi, I want to do some processing with PySpark and save the result in a tuple that is shared among the executors for further processing.
> This is a text-mining job, and I want to use the Vector Space Model. I want to compute the vector of all words, save it in a tuple, and make it reachable from all executors. Is this possible in Spark, or should I use external storage such as a database or files?


Re: Use Shared Variable in PySpark Executors

Soheil Pourbafrani
Ok, I'll do that. Thanks

On Sat, Sep 22, 2018 at 7:09 PM Jörn Franke <[hidden email]> wrote:
Do you want to calculate it once and share it with all the executors? Then a broadcast variable may be interesting for you.
