Driver vs master

Amit Sharma
Can you please help me understand this? I believe the driver program runs on the master node. If we are running 4 Spark jobs and the driver memory config is 4g, then a total of 16 GB of the master node's memory would be used. So if we run more jobs, we need more memory on the master. Please correct me if I am wrong.


Thanks
Amit
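Amit's estimate is simple arithmetic. If every driver JVM on one machine is started with the same `spark.driver.memory`, the total heap grows linearly with the number of drivers. The sketch below assumes all drivers run on the same 16 GB node. It counts heap only, ignoring JVM overhead and the OS, so real headroom is smaller. The function names are illustrative.

```python
def total_driver_heap_gb(num_drivers: int, driver_memory_gb: int) -> int:
    """Heap reserved by num_drivers driver JVMs that all use the same
    spark.driver.memory. Heap only: JVM overhead and OS needs are ignored."""
    return num_drivers * driver_memory_gb


def max_concurrent_drivers(node_memory_gb: int, driver_memory_gb: int) -> int:
    """Largest number of equally sized driver heaps that fit on one node."""
    return node_memory_gb // driver_memory_gb


print(total_driver_heap_gb(4, 4))     # 16
print(max_concurrent_drivers(16, 4))  # 4
```

This only holds if each "job" is a separate application whose driver runs on that node, which is exactly what the replies go on to question.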

Re: Driver vs master

Andrew Melo
Hi Amit

On Mon, Oct 7, 2019 at 18:33 Amit Sharma <[hidden email]> wrote:
> Can you please help me understand this? I believe the driver program runs on the master node.
> If we are running 4 Spark jobs and the driver memory config is 4g, then a total of 16 GB of the master node's memory would be used.

This depends on which master/deploy mode you're using: if it's a "local" master in client mode, then yes, tasks execute in the same JVM as the driver. Even in that case, though, the driver JVM uses however much memory is allocated to the driver, regardless of how many threads you have.
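Here is the "local master, client mode" case made concrete. The driver heap is fixed when its JVM starts, via spark-submit's `--driver-memory` option. Spark's configuration docs note that in client mode `spark.driver.memory` must not be set through `SparkConf` inside the application, because the driver JVM has already started by then.

The sketch below only assembles the command line, so it runs without Spark installed. `my_app.py` is a hypothetical application file, and `local[4]` (four task threads inside one JVM) is an assumed setting.

```python
import shlex


def build_submit_command(app: str, master: str = "local[4]",
                         deploy_mode: str = "client",
                         driver_memory: str = "4g") -> list[str]:
    """Assemble a spark-submit invocation. With a local[N] master the N task
    threads run inside the driver JVM, so --driver-memory caps the heap that
    the driver and all of its tasks share."""
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--driver-memory", driver_memory,
        app,  # hypothetical application file
    ]


cmd = build_submit_command("my_app.py")
print(shlex.join(cmd))
# spark-submit --master 'local[4]' --deploy-mode client --driver-memory 4g my_app.py
```

Raising the thread count in `local[N]` does not raise the heap. Only `--driver-memory` does.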


> So if we run more jobs, we need more memory on the master. Please correct me if I am wrong.

This depends on your application, but in general more threads will require more memory.
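One way to see why more threads need more memory: in local mode all task threads share the driver's single heap.

This is a rough sketch resting on three assumptions:
- It uses the unified memory model from Spark's tuning guide: the shared region is `spark.memory.fraction` × (heap minus 300 MiB), with a default fraction of 0.6.
- It follows the per-task rule as I understand the documentation of Spark's `ExecutionMemoryPool`. With N active tasks, each task may hold at most 1/N of the execution pool. Each task should also get at least about 1/(2N) before it has to spill.
- For simplicity, the whole unified region is treated as available for execution (nothing cached).

Treat the numbers as rough bounds, not measurements.

```python
RESERVED_MIB = 300      # assumed reserved system memory (tuning guide)
MEMORY_FRACTION = 0.6   # assumed default spark.memory.fraction; check your version


def unified_pool_mib(heap_mib: float) -> float:
    """Region shared by execution and storage for a given JVM heap."""
    return (heap_mib - RESERVED_MIB) * MEMORY_FRACTION


def per_task_bounds_mib(heap_mib: float, active_tasks: int) -> tuple[float, float]:
    """(lower, upper) execution memory per task when active_tasks run at once,
    assuming the whole unified pool is free for execution."""
    pool = unified_pool_mib(heap_mib)
    return pool / (2 * active_tasks), pool / active_tasks


for threads in (1, 2, 4, 8):
    low, high = per_task_bounds_mib(4096, threads)
    print(f"{threads} threads: {low:.0f}-{high:.0f} MiB per task")
```

The heap stays at 4g no matter how many threads run. Each additional concurrent task shrinks every task's share, so memory-hungry tasks start spilling or failing sooner.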




--
It's dark in this basement.

Re: Driver vs master

Amit Sharma
Thanks Andrew, but I am asking specifically about driver memory, not executor memory. We have just one master, and if each job has driver.memory=4g and the master node's total memory is 16 GB, then we cannot execute more than 4 jobs at a time.

Re: Driver vs master

Andrew Melo
Hi

On Mon, Oct 7, 2019 at 19:20 Amit Sharma <[hidden email]> wrote:
> Thanks Andrew, but I am asking specifically about driver memory, not executor memory. We have just one master, and if each job has driver.memory=4g and the master node's total memory is 16 GB, then we cannot execute more than 4 jobs at a time.

I understand that. I think there's a misunderstanding with the terminology, though. Are you running multiple separate Spark instances on a single machine, or one instance with multiple jobs inside it?
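Andrew's question is the crux. In Spark terminology, a job is the work triggered by one action inside an application, and each application (one SparkContext) has exactly one driver.

The minimal model below is hypothetical: the class and function names are not a Spark API. It shows that requested driver memory scales with the number of applications, not the number of jobs.

```python
from dataclasses import dataclass


@dataclass
class SparkApplication:
    """Hypothetical model: one spark-submit / one SparkContext / one driver JVM."""
    name: str
    driver_memory_gb: int
    concurrent_jobs: int  # actions running at the same time inside this app


def driver_heap_gb(apps: list[SparkApplication]) -> int:
    """Driver heap requested by a set of applications. Jobs inside an
    application share its single driver, so concurrent_jobs is not used."""
    return sum(app.driver_memory_gb for app in apps)


one_app_four_jobs = [SparkApplication("etl", 4, concurrent_jobs=4)]
four_apps = [SparkApplication(f"app{i}", 4, concurrent_jobs=1) for i in range(4)]

print(driver_heap_gb(one_app_four_jobs))  # 4
print(driver_heap_gb(four_apps))          # 16
```

Concurrent jobs in one application still add load to that one driver: more scheduling state and more results collected back. Its heap may therefore need to grow, but the jobs do not each get their own `spark.driver.memory`.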



Re: Driver vs master

ayan guha
Hi

I think you are mixing terminologies here. Loosely speaking, the master manages worker machines. Each worker machine can run one or more processes, and a process can be a driver or an executor. You submit applications to the master; each application has a driver and executors, and the master decides where to put each of them. In cluster mode, the master will distribute the drivers across the cluster. In client mode, the master will try to run the driver processes within its own process. You can also launch multiple master processes and use them for a set of applications; this happens when you use YARN. I am not sure how Mesos or Kubernetes (K8s) work in that regard, though.
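Spark's "Submitting Applications" guide describes client mode a little differently. There, the driver is launched in the same process as the client that submits the application, so it sits on the master node only if spark-submit is run there. In cluster mode, the cluster manager decides where the driver goes.

The function below is an illustrative lookup of where the driver JVM runs for a few common setups. It is not a Spark API, its strings are descriptive only, and Mesos is deliberately not covered.

```python
def driver_location(master: str, deploy_mode: str) -> str:
    """Where the driver JVM runs for a given --master / --deploy-mode pair.
    Illustrative only; Mesos and other managers are not covered."""
    if master.startswith("local"):
        # spark-submit rejects cluster deploy mode with a local master.
        if deploy_mode != "client":
            raise ValueError("local masters only support client deploy mode")
        return "the submitting JVM, which also runs the tasks as threads"
    if deploy_mode == "client":
        return "the spark-submit process on the machine you launched it from"
    if deploy_mode != "cluster":
        raise ValueError(f"unknown deploy mode: {deploy_mode}")
    if master.startswith("spark://"):
        return "a worker chosen by the standalone master"
    if master == "yarn":
        return "the YARN ApplicationMaster container on some NodeManager"
    if master.startswith("k8s://"):
        return "a driver pod scheduled by Kubernetes"
    raise ValueError(f"master not covered by this sketch: {master}")


# "spark://host:7077" is a hypothetical standalone master URL.
for master, mode in [("local[4]", "client"), ("spark://host:7077", "client"),
                     ("spark://host:7077", "cluster"), ("yarn", "cluster")]:
    print(f"{master:20} {mode:8} -> {driver_location(master, mode)}")
```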

HTH...

Ayan



On Tue, Oct 8, 2019 at 12:11 PM Andrew Melo <[hidden email]> wrote:
> I understand that. I think there's a misunderstanding with the terminology, though. Are you running multiple separate Spark instances on a single machine, or one instance with multiple jobs inside it?


--
Best Regards,
Ayan Guha

Re: Driver vs master

Andrew Melo


On Mon, Oct 7, 2019 at 20:49 ayan guha <[hidden email]> wrote:
> In cluster mode, the master will distribute the drivers across the cluster. In client mode, the master will try to run the driver processes within its own process.

Right, that's why I initially had the caveat: "This depends on which master/deploy mode you're using: if it's a 'local' master in client mode, then yes, tasks execute in the same JVM as the driver."

The answer depends on the exact setup Amit has and how the application is configured.
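Putting the thread's points together: how much driver memory lands on the master node depends on the deploy mode and on where each application is submitted from.

This rough sketch rests on four assumptions:
- The cluster manager is standalone.
- The master daemon keeps its documented default 1 GB heap (`SPARK_DAEMON_MEMORY`).
- Each submission is one application with one driver.
- JVM overhead is ignored.

The `Submission` record is hypothetical.

```python
from dataclasses import dataclass

MASTER_DAEMON_GB = 1  # assumed: default SPARK_DAEMON_MEMORY for a standalone master


@dataclass
class Submission:
    """Hypothetical record: one application submitted with spark-submit."""
    driver_memory_gb: int
    deploy_mode: str            # "client" or "cluster"
    submitted_from_master: bool


def master_node_heap_gb(subs: list[Submission]) -> int:
    """Heap on the master node: the master daemon plus every client-mode driver
    launched from that node. Cluster-mode drivers are placed on workers (which
    could include this machine if it also runs a worker; ignored here)."""
    drivers = sum(s.driver_memory_gb for s in subs
                  if s.deploy_mode == "client" and s.submitted_from_master)
    return MASTER_DAEMON_GB + drivers


client_on_master = [Submission(4, "client", True) for _ in range(4)]
cluster_mode = [Submission(4, "cluster", True) for _ in range(4)]
print(master_node_heap_gb(client_on_master))  # 17
print(master_node_heap_gb(cluster_mode))      # 1
```

Under these assumptions, four client-mode drivers of 4 GB each already exceed a 16 GB master before JVM overhead is counted. In cluster mode the same drivers would be spread across the workers instead.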

