Is RDD.persist honoured if multiple actions are executed in parallel

Arya Ketan (Wed, 23 Sep 2020)
Hi,
I have a Spark Streaming use case (Spark 2.2.1). My Spark job has multiple actions, and I run them in parallel by executing each action in a separate thread. I call rdd.persist at the point where the DAG forks into these actions.
However, the RDD does not seem to be cached: the entire DAG is executed twice (once for each action).

What am I missing?
Arya
Re: Is RDD.persist honoured if multiple actions are executed in parallel

srowen (Sean Owen, Wed, 23 Sep 2020)
It is honoured, but caching happens asynchronously. If you access the same block twice in quick succession, the cached block may not be available yet the second time.
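To make the race concrete, here is a toy model in plain Python (no Spark involved). `ToyPersistedRDD` and `slow_upstream` are hypothetical names, and the model is an assumption that captures only the point made above: the cached result is stored as a side effect of the first computation finishing, so two actions that start at the same moment both find the cache empty and both run the full lineage. It does not reproduce how Spark's block manager actually stores partitions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ToyPersistedRDD:
    """Hypothetical stand-in for a persisted RDD (not Spark's API).

    The result is cached only after the computation completes, so
    callers that arrive while it is still running also miss the cache.
    """

    def __init__(self, compute):
        self._compute = compute
        self._cache = None
        self._lock = threading.Lock()
        self.compute_count = 0  # how many times the full lineage ran

    def _get_or_compute(self):
        with self._lock:
            if self._cache is not None:
                return self._cache  # cache hit: lineage not re-executed
            self.compute_count += 1  # cache miss: this caller recomputes
        result = self._compute()  # slow upstream work, no lock held
        with self._lock:
            self._cache = result  # cache is populated only now
        return result

    # Two "actions" that both depend on the persisted data.
    def count(self):
        return len(self._get_or_compute())

    def total(self):
        return sum(self._get_or_compute())


def slow_upstream():
    time.sleep(0.2)  # simulate an expensive DAG before the persist point
    return list(range(10))


rdd = ToyPersistedRDD(slow_upstream)
start = threading.Barrier(2)


def run(action):
    start.wait()  # release both actions at the same moment
    return action()


# Fork into two parallel actions without materializing the data first.
with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(run, rdd.count)
    f2 = pool.submit(run, rdd.total)
    results = (f1.result(), f2.result())

print(results, "lineage executions:", rdd.compute_count)
```

Because the upstream work sleeps for 0.2 s and both threads are released together, both lookups miss and the lineage counter normally ends at 2, which is the same symptom described in the question.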

Re: Is RDD.persist honoured if multiple actions are executed in parallel

Arya Ketan (Thu, 24 Sep 2020)
Thanks, we were able to confirm this behaviour.

--
Arya

Re: Is RDD.persist honoured if multiple actions are executed in parallel

Michael Mior-2
If you want to ensure the persisted RDD has been computed first,
just run foreach with a dummy (no-op) function on it to force evaluation before starting the parallel actions.
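A sketch of that ordering, using a hypothetical toy model in plain Python (not Spark's API; `ToyPersistedRDD` and `slow_upstream` are illustrative names): one no-op `foreach` pass populates the cache on the calling thread, and only then are the parallel actions submitted, so both of them hit the cache. In Spark itself the analogous step would be a cheap action such as `rdd.foreach(_ => ())` or `rdd.count()` on the persisted RDD before handing it to the worker threads.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ToyPersistedRDD:
    """Hypothetical stand-in for a persisted RDD (not Spark's API).

    The result is cached only after the first computation completes.
    """

    def __init__(self, compute):
        self._compute = compute
        self._cache = None
        self._lock = threading.Lock()
        self.compute_count = 0  # how many times the full lineage ran

    def _get_or_compute(self):
        with self._lock:
            if self._cache is not None:
                return self._cache  # cache hit
            self.compute_count += 1  # cache miss: recompute lineage
        result = self._compute()
        with self._lock:
            self._cache = result
        return result

    def foreach(self, fn):
        for item in self._get_or_compute():
            fn(item)

    def count(self):
        return len(self._get_or_compute())

    def total(self):
        return sum(self._get_or_compute())


def slow_upstream():
    time.sleep(0.2)  # simulate an expensive DAG before the persist point
    return list(range(10))


rdd = ToyPersistedRDD(slow_upstream)

# Step 1: force evaluation once, on this thread, with a no-op function.
rdd.foreach(lambda _: None)

# Step 2: only now fork into parallel actions; both read the cache.
with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(rdd.count)
    f2 = pool.submit(rdd.total)
    results = (f1.result(), f2.result())

print(results, "lineage executions:", rdd.compute_count)
```

In real Spark this only helps for partitions that the chosen storage level actually keeps; with MEMORY_ONLY, for example, partitions that do not fit in memory are not cached and are recomputed when a later action needs them.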

--
Michael Mior
[hidden email]
