Re: Reply: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?


Re: Reply: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?

XiaoboGu
Spark 0.8.1 is released now; do you mean we can share cached RDDs using this version?


------------------ Original ------------------
From:  "Sriram Ramachandrasekaran"<[hidden email]>;
Date:  Jan 2, 2014
To:  "user"<[hidden email]>;
Subject:  Re: Reply: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?

Yes, the driver runs on the machine from which you launch your Spark job. As for sharing cached RDDs, I don't think it's possible as of 0.8.1; RDDs are not available across Spark contexts, if my understanding is right.

If you still want to share RDDs, then you might have to write a single service that maintains the cached RDD, and the various other apps that want to access that RDD talk to that service. If I understand right, Shark handles SQL queries like this.
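The single-service pattern described above can be sketched outside Spark. Here is a toy Python illustration (the `CacheService` name and its interface are made up for illustration; this is not Spark or Shark API): one long-lived process owns the cached dataset and answers queries from multiple clients, so the expensive computation happens exactly once.

```python
import threading

class CacheService:
    """Toy stand-in for a long-lived service that owns a cached dataset.

    In the Spark pattern described above, this process would hold the
    SparkContext and the cached RDD; other applications would talk to it
    over RPC instead of creating their own contexts.
    """

    def __init__(self, compute):
        self._compute = compute       # expensive function that builds the dataset
        self._lock = threading.Lock()
        self._cache = None
        self.compute_calls = 0        # track how often we actually recompute

    def query(self, predicate):
        with self._lock:
            if self._cache is None:   # build (and "cache") the dataset once
                self._cache = self._compute()
                self.compute_calls += 1
        return [row for row in self._cache if predicate(row)]

# Two "applications" share one cached dataset through the service.
service = CacheService(lambda: list(range(10)))
evens = service.query(lambda x: x % 2 == 0)  # first call builds the cache
big = service.query(lambda x: x > 7)         # second call reuses it
```

The point of the design is that only the service process pays the caching cost; every client sees the same warm dataset.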


On Tue, Dec 31, 2013 at 7:46 PM, guxiaobo1982 <[hidden email]> wrote:
We have different developers sharing a Spark cluster, and we don't let developers touch the master server. Each developer will submit their application from their desktop, so does each driver run on their desktop?
By the way, can developers share cached RDDs?


------------------ Original ------------------
Sender: "Mayur Rustagi"<[hidden email]>;
Send time: Tuesday, Dec 31, 2013 10:11 PM
To: "user"<[hidden email]>;
Subject: Re: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?

The driver is the process that manages execution across the cluster. Say your application is a SQL query: the system spawns a Shark CLI driver that uses the Spark framework, HDFS, etc. to execute the query and deliver the result. All of this happens automatically, so you don't need to worry about it as a user of the Spark/Shark framework. Just go for a bigger machine for the master.
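The driver's role can be pictured with a toy sketch in plain Python (this is a simulation, not Spark code; threads stand in for executors): the driver splits the job into partitions, hands them to workers, and aggregates the results, which is why the machine it runs on needs some headroom.

```python
from concurrent.futures import ThreadPoolExecutor

def driver(data, num_workers=4):
    """Toy 'driver': split the job into partitions, farm them out, collect.

    Spark's driver does the real version of this: it builds the execution
    plan, schedules tasks on executors, and aggregates their results.
    """
    # Split the input into one partition per worker.
    partitions = [data[i::num_workers] for i in range(num_workers)]

    def task(part):                   # the work a single "executor" would run
        return sum(x * x for x in part)

    with ThreadPoolExecutor(max_workers=num_workers) as workers:
        partials = list(workers.map(task, partitions))
    return sum(partials)              # the driver aggregates partial results

result = driver(list(range(100)))     # sum of squares of 0..99
```

Note that the aggregation step runs in the driver process itself; in Spark, that is exactly why collecting large results onto an underpowered driver machine causes trouble.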




On Tue, Dec 31, 2013 at 7:01 PM, guxiaobo1982 <[hidden email]> wrote:
Thanks for your reply. I am a new hand at Spark; does "driver" mean the server from which user applications are submitted?



------------------ Original ------------------
Sender: "Mayur Rustagi"<[hidden email]>;
Send time: Tuesday, Dec 31, 2013 9:55 PM
To: "user"<[hidden email]>;
Subject: Re: Any best practice for hardware configuration for the master server in standalone cluster mode?

The master server needs to be a little beefy, as the driver runs on it. We ran into some scaling issues due to the master server. If you offload the drivers to workers or other machines, the master server can be smaller.
Regards
Mayur



On Tue, Dec 31, 2013 at 6:48 PM, guxiaobo1982 <[hidden email]> wrote:
Hi,

I read the following article regarding hardware configurations for the worker servers in standalone cluster mode, but what about the master server?



Regards,

Xiaobo Gu






--
It's just about how deep your longing is!

Reply: Reply: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?

jasonliu

Actually, we can't, even in 0.8.1.

 

From: guxiaobo1982 [mailto:[hidden email]]
Sent: Jan 2, 2014 12:51
To: user
Subject: Re: Reply: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?

 

Spark 0.8.1 is released now; do you mean we can share cached RDDs using this version?

 

 



Re: Reply: Reply: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?

Ashish Rangole

One can take a look at the Tachyon project to share RDDs across various Spark contexts.
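The Tachyon approach amounts to materializing the RDD in external storage that both contexts can read. As a rough sketch of the idea (a plain temp file stands in for Tachyon here; with Tachyon the path would look like `tachyon://host:port/...` and Spark would write via something like `saveAsTextFile`, so treat the specifics below as illustrative, not real API):

```python
import json
import os
import tempfile

# "Context A" materializes its dataset to shared storage.
shared_dir = tempfile.mkdtemp()
path = os.path.join(shared_dir, "part-00000")

dataset = [{"id": i, "value": i * i} for i in range(5)]
with open(path, "w") as f:
    for row in dataset:               # one JSON record per line
        f.write(json.dumps(row) + "\n")

# "Context B" (a different application / Spark context) reloads it.
with open(path) as f:
    reloaded = [json.loads(line) for line in f]
```

The trade-off versus in-memory sharing is a serialize/deserialize round-trip; the benefit is that the data outlives any single context, which is exactly what sharing across Spark contexts requires.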

On Jan 1, 2014 10:55 PM, "jasonliu" <[hidden email]> wrote:

Actually, we can't, even in 0.8.1.

 
