Is RDD thread safe?

Is RDD thread safe?

Chang Chen

Hi all,

I have a case where I need to cache a source RDD and then create different DataFrames from it in different threads to speed up queries.

I know that SparkSession is thread-safe (https://issues.apache.org/jira/browse/SPARK-15135), but I am not sure whether an RDD is thread-safe.

Thanks
Chang

Re: Is RDD thread safe?

Sonal Goyal
The RDD or the DataFrame is distributed and partitioned by Spark so as to use all your workers (CPUs) effectively. So the DataFrame operations are already running in parallel, each task working on one section of the data. Why do you want to use threading here?

Thanks,
Sonal
Nube Technologies

On Tue, Nov 12, 2019 at 7:18 AM Chang Chen <[hidden email]> wrote:

Hi all,

I have a case where I need to cache a source RDD and then create different DataFrames from it in different threads to speed up queries.

I know that SparkSession is thread-safe (https://issues.apache.org/jira/browse/SPARK-15135), but I am not sure whether an RDD is thread-safe.

Thanks
Chang

Re: Is RDD thread safe?

Chang Chen
I need to cache the DataFrame to speed up queries. In that case, the two queries may run the DAG simultaneously before the data has actually been cached.

On Tue, Nov 19, 2019 at 9:46 PM Sonal Goyal <[hidden email]> wrote:
The RDD or the DataFrame is distributed and partitioned by Spark so as to use all your workers (CPUs) effectively. So the DataFrame operations are already running in parallel, each task working on one section of the data. Why do you want to use threading here?
