Request more yarn vcores than executors


Request more yarn vcores than executors

jelmer
I have a job, running on YARN, that uses multithreading inside a mapPartitions transformation.

Ideally I would like a small number of partitions but a high number of YARN vcores allocated to each task (which I can take advantage of because of the multithreading).

Is this possible? 

I tried running with: --executor-cores 1 --conf spark.yarn.executor.cores=20
but it seems spark.yarn.executor.cores gets ignored.
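
For reference, a minimal sketch of the multithreaded mapPartitions pattern described above (the pool size, the process() function, and the rdd are placeholders, not from the original post):

    import java.util.concurrent.Executors
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration

    // Few partitions; each partition is processed by its own thread pool.
    // process() and the pool size of 20 are placeholders.
    val result = rdd.mapPartitions { iter =>
      val pool = Executors.newFixedThreadPool(20)
      implicit val ec = ExecutionContext.fromExecutorService(pool)
      val futures = iter.map(x => Future(process(x))).toList // force creation of all futures
      val out = futures.map(f => Await.result(f, Duration.Inf))
      pool.shutdown()
      out.iterator
    }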

Re: Request more yarn vcores than executors

Chris Teoh
I thought --executor-cores is the same as the other argument. If anything, just set --executor-cores to something greater than 1 and don't set the other one you mentioned. You'll then get a greater number of cores per executor, so you can take on more simultaneous tasks per executor.
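
For example, something like the following (the numbers and jar name are placeholders, not from this thread) gives each executor 4 cores and therefore up to 4 concurrent tasks:

    spark-submit \
      --master yarn \
      --num-executors 5 \
      --executor-cores 4 \
      --executor-memory 8g \
      your-job.jar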


Re: Request more yarn vcores than executors

jelmer
> you can take on more simultaneous tasks per executor

That is exactly what I want to avoid. The nature of the task makes it difficult to parallelise over many partitions. Ideally I'd have 1 executor per task, with 10+ cores assigned to each executor.
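
One way to get that shape, not discussed in this thread so take it as a suggestion rather than what was actually done, is to reserve a whole executor's worth of cores for each task via spark.task.cpus. A sketch with placeholder numbers and jar name:

    spark-submit \
      --master yarn \
      --num-executors 4 \
      --executor-cores 10 \
      --conf spark.task.cpus=10 \
      your-job.jar

With spark.task.cpus equal to --executor-cores, each executor runs only one task at a time, and the code inside mapPartitions can fan its own threads out over the 10 reserved cores.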


Re: Request more yarn vcores than executors

Chris Teoh
If that is the case, perhaps set the vcore-to-CPU-core ratio to 1:1 and just do --executor-cores 1, and that would at least try to get you more threads per executor. Note that a vcore is a logical construct and isn't directly related to physical CPU cores; it's just the time slice allowed over the entire set of CPUs on each server.

I've seen multithreading at the driver, where multiple jobs are run concurrently when they're working on unevenly distributed workloads, which leverages the executors more efficiently. Perhaps that is something to consider.
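
A minimal sketch of that driver-side pattern (the SparkSession named spark, the input paths, and the job logic are placeholders): each Future triggers an independent Spark job, so the scheduler can interleave their tasks across the executors.

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    // Run two independent jobs concurrently from the driver so a small job
    // is not serialized behind a large, skewed one.
    // Paths and transformations are placeholders.
    val jobs = Seq("small-input", "large-input").map { name =>
      Future {
        spark.read.parquet(s"/data/$name")
          .groupBy("key").count()
          .write.mode("overwrite").parquet(s"/output/$name")
      }
    }
    Await.result(Future.sequence(jobs), Duration.Inf)

When running jobs concurrently like this, spark.scheduler.mode=FAIR is commonly set so the jobs share the executors instead of queueing up FIFO.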
