Any way to see the size of the broadcast variable?

Any way to see the size of the broadcast variable?

V0lleyBallJunki3
Hello,
   I have set spark.sql.autoBroadcastJoinThreshold to a very high value of
20 GB. I am joining a table that I am sure is below this threshold, but
Spark still does a SortMergeJoin. If I set a broadcast hint, Spark does a
broadcast join and the job finishes much faster. However, when run in
production on some large tables, I run into errors. Is there a way to see
the actual size of the table being broadcast? I wrote the table being
broadcast to disk and it took only 32 MB in Parquet. I tried caching the
table in Zeppelin and running a table.count(), but nothing shows up on the
Storage tab of the Spark History Server. org.apache.spark.util.SizeEstimator
doesn't seem to give accurate numbers for this table either. Is there any
way to figure out the size of the table being broadcast?
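One thing worth checking is the optimizer's own size estimate, since that is the number Spark compares against spark.sql.autoBroadcastJoinThreshold when it decides whether to plan a broadcast join. A rough sketch below, assuming Spark 2.3+ and a SparkSession named spark; the table name is hypothetical:

```scala
import org.apache.spark.util.SizeEstimator

// "my_small_table" is a stand-in for the table you expect to be broadcast.
val smallDF = spark.table("my_small_table")

// The optimizer's estimated size in bytes -- this is what gets compared
// against spark.sql.autoBroadcastJoinThreshold at planning time.
val planBytes = smallDF.queryExecution.optimizedPlan.stats.sizeInBytes
println(s"Optimizer size estimate: $planBytes bytes")

// Approximate in-memory size of the deserialized rows. This measures JVM
// objects, so it is usually several times larger than the compressed
// Parquet footprint on disk.
val inMemoryBytes = SizeEstimator.estimate(smallDF.collect())
println(s"Deserialized size: $inMemoryBytes bytes")
```

Note that 32 MB of compressed Parquet can easily expand to ten times that once deserialized into JVM objects, which may explain the production failures. If the optimizer's estimate looks wildly off, running `ANALYZE TABLE my_small_table COMPUTE STATISTICS` can give it better numbers to work with.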



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: Any way to see the size of the broadcast variable?

Gourav Sengupta
Hi Venkat,

do your executors have that much memory?

Regards,
Gourav Sengupta

On Tue, Oct 9, 2018 at 4:44 PM V0lleyBallJunki3 <[hidden email]> wrote:


Re: Any way to see the size of the broadcast variable?

V0lleyBallJunki3
Yes, each of the executors has 60 GB.
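Even with 60 GB per executor, a 20 GB threshold is risky: the broadcast table is first collected to the driver, and the deserialized rows can be far larger than the on-disk size. A safer pattern, sketched below with hypothetical DataFrame names, is to keep the threshold conservative and force broadcasts only for tables whose size has been verified:

```scala
import org.apache.spark.sql.functions.broadcast

// Keep the automatic threshold modest (here 64 MB) rather than 20 GB,
// so the driver never tries to collect a huge table by surprise.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold",
  (64 * 1024 * 1024).toString)

// Explicitly broadcast only a table you have confirmed is small.
// bigDF and smallDF are hypothetical DataFrames joined on "id".
val joined = bigDF.join(broadcast(smallDF), "id")
```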


