Calling broadcast() multiple times on the same df: is it cached?



matd
Hi Spark folks,

In our application, we have to join a dataframe with several other dfs (not always on the same join column).

This left-hand side df is not very large, so a broadcast hint may be beneficial.

My questions:
- If the same df gets broadcast multiple times, will the transfer occur only once (i.e. the broadcast data is somehow cached on the executors), or multiple times?
- If the joins are on different columns, will the broadcast data still be cached?

Thanks for your insights
Mathieu