How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2?

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2?

kant kodali
Hi All,

How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2? 

Thanks

Reply | Threaded
Open this post in threaded view
|

Re: How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2?

kathleen li
Not sure what you mean about “raw” Spark sql, but there is one parameter which will impact the optimizer choose broadcast join automatically or not :

spark.sql.autoBroadcastJoinThreshold

You can read Spark doc about above parameter setting and using explain to check your join using broadcast or not.

Make sure you gather statistics for tables.
 
There is broadcast hint also. Please be aware if the table being broadcasted to all worker nodes is fairly big, it will not be a good option always.

Kathleen

Sent from my iPhone

> On Oct 3, 2018, at 4:37 PM, kant kodali <[hidden email]> wrote:
>
> Hi All,
>
> How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2?
>
> Thanks
>

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]