Spark driver thread

Spark driver thread

James Yu
Hi,

Does a Spark driver always work single-threaded?

If yes, does it mean asking for more than one vCPU for the driver is wasteful?


Thanks,
James

Re: Spark driver thread

Pol Santamaria
Hi James,

You can configure the Spark driver to use more than a single thread. Whether that helps depends on the application, but the driver can take advantage of multiple threads in many situations, for instance when the driver program gathers data from or sends data to the workers.

So yes, if you do computation or I/O on the driver side, you should explore using multiple threads and more than 1 vCPU.
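
For instance, a minimal sketch (the app name and input paths are made up, not from this thread) where the driver submits independent Spark actions from a small thread pool so the jobs overlap instead of running one after another:

  import java.util.concurrent.Executors
  import scala.concurrent.{Await, ExecutionContext, Future}
  import scala.concurrent.duration.Duration
  import org.apache.spark.sql.SparkSession

  object MultiThreadedDriver {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("multi-threaded-driver").getOrCreate()

      // Driver-side thread pool; each thread submits its own Spark action.
      implicit val ec: ExecutionContext =
        ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))

      // Hypothetical input paths -- replace with your own datasets.
      val paths = Seq("/data/part-a", "/data/part-b", "/data/part-c")

      // count() is a blocking action, so running each call on its own driver
      // thread lets Spark schedule the three jobs concurrently.
      val counts = paths.map(p => Future(spark.read.parquet(p).count()))
      val total = Await.result(Future.sequence(counts), Duration.Inf).sum

      println(s"total rows: $total")
      spark.stop()
    }
  }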

Bests,
Pol Santamaria

Re: Spark driver thread

James Yu
Pol, thanks for your reply.

Actually, I am running Spark apps in CLUSTER mode. Is what you said still applicable in cluster mode? Thanks in advance for your further clarification.

Re: Spark driver thread

Russell Spitzer
So one thing to know here is that all Java applications use many threads; just because your particular main method doesn't spawn additional threads doesn't mean the libraries you use won't. The other important note is that Spark doesn't actually equate cores with threads: when you request a core, Spark doesn't do anything special to make sure only a single physical core is in use.

That said, would allocating more vCPUs to the driver make a difference? Probably not. This is very dependent on your own code and on whether a lot of work is being done on the driver vs on the executors. For example, are you loading up and processing some data which is then used to spawn remote work? If so, having more CPUs locally may help. So look into your app: is almost all the work inside DataFrames or RDDs? Then more resources for the driver won't help.
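
A made-up contrast just to illustrate the distinction (the helper function and sizes are hypothetical, not from any real app):

  import org.apache.spark.sql.SparkSession

  object DriverVsExecutorWork {
    // Hypothetical CPU-bound helper.
    def expensiveLocalTransform(i: Int): Long =
      (1 to 10000).foldLeft(i.toLong)((acc, j) => (acc * 31 + j) % 104729)

    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("driver-vs-executor").getOrCreate()

      // Driver-heavy: this loop runs entirely in the driver JVM before any
      // tasks launch. Extra driver cores only pay off here if you also
      // parallelize it (threads, parallel collections, ...).
      val localSum = (1 to 100000).map(expensiveLocalTransform).sum

      // Executor-heavy: the same work distributed as tasks. The driver only
      // schedules tasks and receives one number back, so extra driver cores
      // buy little for this part.
      val distributedSum = spark.sparkContext
        .parallelize(1 to 100000, numSlices = 100)
        .map(expensiveLocalTransform)
        .reduce(_ + _)

      println(s"local: $localSum, distributed: $distributedSum")
      spark.stop()
    }
  }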


TL;DR: for most use cases, 1 core is sufficient regardless of client/cluster mode.


Re: Spark driver thread

Pol Santamaria
I totally agree with Russell.

In my opinion, the best way is to experiment and take measurements. Chips differ (some have simultaneous multithreading, some don't), and so do system setups, so I'd recommend playing with the 'spark.driver.cores' option.
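
For example (the master, class and jar names here are just placeholders), in cluster mode you would pass the setting to spark-submit so the cluster manager reserves that many cores for the driver:

  spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.driver.cores=2 \
    --conf spark.driver.memory=4g \
    --class com.example.MyApp \
    my-app.jar

Then compare the wall-clock time of the driver-side phases across a few values.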

Bests,
Pol Santamaria


Re: Spark driver thread

Enrico Minack
James,

If you have multithreaded code in your driver, then you should allocate multiple cores. In cluster mode you share the node with other jobs; if you allocate fewer cores than your driver actually uses, that node gets over-allocated and you are stealing other applications' resources. Be nice: limit the parallelism of your driver and allocate as many driver cores as you actually use (spark.driver.cores, see https://spark.apache.org/docs/latest/configuration.html#application-properties).
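
One way to keep the two in sync (a minimal sketch, assuming you size a driver-side thread pool yourself) is to read the setting back and cap your parallelism with it:

  import java.util.concurrent.Executors
  import scala.concurrent.ExecutionContext
  import org.apache.spark.sql.SparkSession

  object BoundedDriverParallelism {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("bounded-driver").getOrCreate()

      // Whatever was requested via spark.driver.cores (default 1) becomes the
      // upper bound for driver-side threads, so the driver never uses more CPU
      // than it was allocated on the shared node.
      val driverCores = spark.sparkContext.getConf.getInt("spark.driver.cores", 1)
      implicit val ec: ExecutionContext =
        ExecutionContext.fromExecutor(Executors.newFixedThreadPool(driverCores))

      // ... submit driver-side work on this execution context ...

      spark.stop()
    }
  }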

Enrico

