I'm using a single Spark cluster that contains 50 nodes of type i3.4xl (16 vcores each).
I'm trying to run 4 Spark SQL queries simultaneously.
The data is split into 10 even partitions, and the 4 queries run on the same data but on different partitions. I have tried to configure the cluster so that each job gets the same resources and does not interfere with the resources of the other jobs.
When running 1 or 2 queries simultaneously, I got much better performance than with 4 queries, although I expected the performance to be the same.
I'm looking for your advice on how to improve the performance by tuning the configurations.
My current setup:
a total of 15*50 = 750 cores (15 per node across 50 nodes)
5 executors per instance
750 shuffle partitions
From what I understand, with max executors set to 37, each job should get the same number of executors whether I run 1, 2, 3, or 4 jobs in parallel, and should therefore have the same running time.
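
To make the numbers concrete, here is a rough sketch of the arithmetic behind this setup. It assumes that 15 of the 16 vcores on each node are usable by Spark, that the cores are divided evenly among the 5 executors on a node, and that the cap of 37 is applied per job through `spark.dynamicAllocation.maxExecutors`. The `capacity` helper and the `job_conf` dictionary are hypothetical names used only for illustration, not my exact submit settings.

```python
# Rough capacity arithmetic for the cluster described above.
NODES = 50
USABLE_CORES_PER_NODE = 15     # assumption: 1 of the 16 vcores is left for the OS
EXECUTORS_PER_NODE = 5
MAX_EXECUTORS_PER_JOB = 37
SHUFFLE_PARTITIONS = 750


def capacity(concurrent_jobs):
    """Compare what N concurrent jobs may request with what the cluster offers."""
    total_executors = NODES * EXECUTORS_PER_NODE
    # assumption: cores are split evenly among the executors on a node
    cores_per_executor = USABLE_CORES_PER_NODE // EXECUTORS_PER_NODE
    requested = concurrent_jobs * MAX_EXECUTORS_PER_JOB
    return {
        "total_executors": total_executors,
        "cores_per_executor": cores_per_executor,
        "requested_executors": requested,
        "fits_in_cluster": requested <= total_executors,
        "cores_per_job": MAX_EXECUTORS_PER_JOB * cores_per_executor,
    }


# Per-job Spark properties implied by the assumptions above (all values as strings).
job_conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.maxExecutors": str(MAX_EXECUTORS_PER_JOB),
    "spark.executor.cores": str(USABLE_CORES_PER_NODE // EXECUTORS_PER_NODE),
    "spark.sql.shuffle.partitions": str(SHUFFLE_PARTITIONS),
}

if __name__ == "__main__":
    for jobs in (1, 2, 3, 4):
        print(jobs, capacity(jobs))
```

Under these assumptions, 4 jobs can request at most 4*37 = 148 of the 250 executors, so the executor count alone should not make the jobs compete, which is why I expected the same running time. Each job would have 37*3 = 111 cores for its 750 shuffle partitions.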