Hi all,
I have a PySpark SQL script that loads one 80 MB table, one 2 MB table, and three other small tables, and performs a lot of joins to fetch the data.
My cluster configuration is:
4 nodes, 300 GB memory, 64 cores
Writing a data frame of about 24 MB of records into a table takes 4 minutes 2 seconds with the following parameters:
Driver memory: 5 GB
Executor memory: 20 GB
Executor cores: 5
Number of executors: 40
spark.dynamicAllocation.minExecutors: 40
spark.dynamicAllocation.maxExecutors: 40
spark.dynamicAllocation.initialExecutors: 17
Memory overhead: 4 GB
Shuffle partitions: 200 (default)
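
For reference, a minimal sketch of how these settings are passed, assuming they go through spark-submit with the standard property names and that dynamic allocation is enabled; the script name below is just a placeholder:

spark-submit \
  --driver-memory 5g \
  --executor-memory 20g \
  --executor-cores 5 \
  --num-executors 40 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=40 \
  --conf spark.dynamicAllocation.maxExecutors=40 \
  --conf spark.dynamicAllocation.initialExecutors=17 \
  --conf spark.executor.memoryOverhead=4g \
  --conf spark.sql.shuffle.partitions=200 \
  my_join_script.py   # placeholder for the actual script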
Could anyone please suggest how I can tune this job?
--
k.Lakshmi Nivedita