Spark[SqL] performance tuning

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

Spark[SqL] performance tuning

Lakshmi Nivedita

Hi all,

I have pyspark sql script with loading of one table 80mb and one is 2 mb and rest 3 are small tables performing lots of joins in the script to fetch the data.

My system configuration isĀ 

4 nodes,300 GB,64 cores

To write a data frame into table 24Mb size records . System is taking 4 minutes 2 sec. with parametersĀ 

Driver memory -5G
Executor memory-20 G
Executor cores 5
Number of executors 40
Dynamicalloction.minexecutors 40
Max executors 40
Dynamic initial executors 17
Memory overhead 4G

With default partition 200

Could you please any one suggest me how can I tune this code.

k.Lakshmi Nivedita