Splitting resource in Spark cluster


Tzahi File
Hi All,

I'm using one Spark cluster that contains 50 nodes of type i3.4xl (16 vCores each).
I'm trying to run 4 Spark SQL queries simultaneously. 
 
The data is split into 10 even partitions, and the 4 queries run on the same data but on different partitions. I have tried to configure the cluster so that each job gets the same resources and won't interfere with the other jobs' resources.
When running 1 or 2 queries simultaneously I got much better performance than with 4 queries, although I expected the same performance in both cases.
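One common way to enforce that kind of isolation (a sketch, not something stated in the post: the script name, memory, and core values are illustrative) is to submit each query as its own application with dynamic allocation disabled and a fixed executor count, so no job can grow into another job's share:

```shell
# Submit each of the 4 queries as a separate application with a pinned
# resource footprint. Flag names are standard Spark configs; the values
# below are illustrative, not taken from the post.
spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 37 \
  --executor-cores 3 \
  --executor-memory 20g \
  query1.py   # hypothetical driver script, one per query
```

With dynamic allocation left on instead, `spark.dynamicAllocation.maxExecutors` only caps a job's growth; it does not reserve executors for it.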

I'm looking for your advice on how to improve the performance by tuning the configurations.

I have a total of 15*50 nodes
5 executors per instance
max executors: 37
shuffle partitions: 750
... 
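Under these settings the cluster capacity works out roughly as follows (a sketch assuming 16 vCores per i3.4xl node and the 5-executors-per-instance figure above):

```python
# Rough capacity arithmetic for the cluster described above.
# Assumptions: 50 nodes, 16 vCores each, 5 executors per node,
# and a 37-executor cap per job.
nodes = 50
vcores_per_node = 16
executors_per_node = 5

total_executor_slots = nodes * executors_per_node            # 250 slots
cores_per_executor = vcores_per_node // executors_per_node   # 3 cores each

max_executors_per_job = 37
jobs = 4
executors_demanded = jobs * max_executors_per_job            # 148 of 250

print(total_executor_slots, cores_per_executor, executors_demanded)
```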

From what I understand, with max executors set to 37, running 1, 2, 3, or 4 jobs in parallel should give each job the same number of executors, and thus the same running time.
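That expectation can be sanity-checked against the numbers above (a sketch; it assumes the 37-executor cap applies per job and 5 executors per 16-vCore node):

```python
# Can 1..4 concurrent jobs, each capped at 37 executors, all hit their cap
# on this cluster? Numbers come from the post; 250 slots = 50 nodes * 5
# executors per node.
total_executor_slots = 50 * 5
max_executors_per_job = 37

for jobs in (1, 2, 3, 4):
    demand = jobs * max_executors_per_job
    print(jobs, demand, demand <= total_executor_slots)
```

Even with 4 jobs, demand (148) stays well under capacity (250), so each job can hold its full 37 executors; a slowdown at 4 jobs is therefore more likely to come from contention on shared disk, network, or shuffle I/O than from executor starvation.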


Thanks,
Tzahi