Splitting resource in Spark cluster

Tzahi File
Hi All,

I'm using one Spark cluster that contains 50 nodes of type i3.4xl (16 vCores each).
I'm trying to run 4 Spark SQL queries simultaneously. 
The data is split into 10 even partitions, and the 4 queries run on the same data but on different partitions. I have tried to configure the cluster so that each job gets the same resources and doesn't interfere with the other jobs' resources.
When running 1 or 2 queries simultaneously I got much better performance than when running all 4, although I expected to get the same performance.

I'm looking for your advice on how to improve the performance by tuning the configurations.

I have a total of 15*50 cores
5 executors per instance
max executors 37
shuffle partitions 750
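
To make these settings concrete, here is a rough sketch of how they might map onto standard Spark property names. It rests on assumptions that are not stated above: that "15*50" means 15 usable cores on each of the 50 nodes, that those cores are divided evenly among the 5 executors on a node, and that the "max executors" limit is applied through dynamic allocation. The helper names `per_job_conf` and `as_submit_args` are made up for illustration.

```python
# Sketch only: maps the numbers listed above to standard Spark property names.
# Assumptions (not confirmed by the post): 15 usable cores per node, split
# evenly across 5 executors, with dynamic allocation capping each job.

USABLE_CORES_PER_NODE = 15   # assumed reading of "15*50"
EXECUTORS_PER_NODE = 5
MAX_EXECUTORS_PER_JOB = 37
SHUFFLE_PARTITIONS = 750


def per_job_conf():
    """Return the per-job settings as a dict of Spark properties."""
    cores_per_executor = USABLE_CORES_PER_NODE // EXECUTORS_PER_NODE
    return {
        "spark.executor.cores": str(cores_per_executor),
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.maxExecutors": str(MAX_EXECUTORS_PER_JOB),
        "spark.sql.shuffle.partitions": str(SHUFFLE_PARTITIONS),
    }


def as_submit_args(conf):
    """Flatten the dict into a list of --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(as_submit_args(per_job_conf())))
```

Each of the 4 jobs would be submitted with the same set of properties, so that every job is capped at the same number of executors.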

From what I understand, when max executors is set to 37, each job should get the same number of executors whether I run 1, 2, 3, or 4 jobs in parallel, and thus the same running time.
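
The arithmetic behind that expectation can be sketched as follows. This only counts executor slots; it assumes 5 executor slots on each of the 50 nodes and that every job actually reaches its cap of 37. The function names are hypothetical and it says nothing about other shared resources.

```python
# Sketch only: counts executor slots to check whether N concurrent jobs,
# each capped at 37 executors, fit in the cluster at the same time.
# Assumption: 50 nodes with 5 executor slots each.

NODES = 50
EXECUTORS_PER_NODE = 5
MAX_EXECUTORS_PER_JOB = 37


def total_executor_slots():
    """Executor slots available in the whole cluster."""
    return NODES * EXECUTORS_PER_NODE


def executor_demand(concurrent_jobs):
    """Executors requested if every job reaches its cap."""
    return concurrent_jobs * MAX_EXECUTORS_PER_JOB


def fits(concurrent_jobs):
    """True if all jobs can hold their full executor cap simultaneously."""
    return executor_demand(concurrent_jobs) <= total_executor_slots()


if __name__ == "__main__":
    for jobs in (1, 2, 3, 4):
        print(jobs, executor_demand(jobs), total_executor_slots(), fits(jobs))
```

Under these assumptions, 4 jobs ask for 148 executors out of 250 slots, so by executor count alone every job should be able to hold its full 37 executors.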