broadcast() the same df multiple times: is it cached?
Hi Spark folks,
In our application, we have to join a DataFrame with several other DataFrames (not always on the same join column).
This left-hand side DataFrame is not very large, so a broadcast hint may be beneficial.
My questions:
- If the same DataFrame is broadcast multiple times, does the transfer happen only once (i.e., is the broadcast data somehow cached on the executors), or once per join?
- If the joins are on different columns, is it cached as well?