Efficient Spark-Submit planning


Aakash Basu-2
Hi,

Can someone please clarify how we should calculate the parameters to pass to spark-submit?

Parameters as in -

Cores, NumExecutors, DriverMemory, etc.

Is there a generic calculation that works for most kinds of clusters, from a small 3-node cluster to hundreds of nodes?

Thanks,
Aakash.

Re: Efficient Spark-Submit planning

Sonal Goyal
Overall the defaults are sensible, but you definitely have to look at your application and optimise a few of them. I mostly refer to the following links when a job is slow or failing, or when we have more hardware than we are actually utilizing.
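
To make the kind of arithmetic being asked about concrete, here is a minimal sketch of one commonly quoted rule of thumb for YARN clusters. It is not taken from this thread or from the links above, and every default in it is an assumption: about 5 cores per executor, 1 core and 1 GB per node held back for the OS and daemons, one executor slot given up for the application master, and memory overhead of max(384 MB, 10% of the heap). The function names `plan_executors` and `to_flags` are hypothetical helpers. Treat the result as a starting point to tune against your own application, not as a definitive answer.

```python
import math


def plan_executors(nodes, cores_per_node, mem_gb_per_node,
                   cores_per_executor=5, overhead_fraction=0.10):
    """Rule-of-thumb executor sizing. All defaults are assumptions."""
    # Assumption: hold back 1 core and 1 GB per node for the OS and daemons.
    usable_cores = cores_per_node - 1
    usable_mem_gb = mem_gb_per_node - 1
    if usable_cores < 1 or usable_mem_gb < 1:
        raise ValueError("node too small for this rule of thumb")

    # On small nodes, fall back to however many cores are actually usable.
    executor_cores = min(cores_per_executor, usable_cores)
    executors_per_node = usable_cores // executor_cores

    # Assumption: give up one executor slot for the YARN application master.
    num_executors = max(1, executors_per_node * nodes - 1)

    # Each executor's container must hold the heap plus the overhead,
    # where overhead = max(384 MB, overhead_fraction * heap).
    container_gb = usable_mem_gb / executors_per_node
    heap_gb = min(container_gb - 0.384,
                  container_gb / (1 + overhead_fraction))
    executor_memory_gb = math.floor(heap_gb)
    if executor_memory_gb < 1:
        raise ValueError("not enough memory per executor")

    return {
        "num_executors": num_executors,
        "executor_cores": executor_cores,
        "executor_memory_gb": executor_memory_gb,
    }


def to_flags(plan):
    """Render the plan as spark-submit command-line flags."""
    return ("--num-executors {num_executors} "
            "--executor-cores {executor_cores} "
            "--executor-memory {executor_memory_gb}g").format(**plan)


# Example: 6 nodes, each with 16 cores and 64 GB of RAM.
plan = plan_executors(nodes=6, cores_per_node=16, mem_gb_per_node=64)
print(to_flags(plan))
# prints: --num-executors 17 --executor-cores 5 --executor-memory 19g
```

The same function handles a small 3-node cluster, since the per-executor core count drops to whatever the nodes can spare. Driver memory is deliberately left out: it depends mostly on how much data the application collects back to the driver, which no cluster-size formula can predict.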



Thanks,
Sonal
Nube Technologies 

On Tue, Sep 12, 2017 at 2:40 AM, Aakash Basu <[hidden email]> wrote:
Hi,

Can someone please clarify how we should calculate the parameters to pass to spark-submit?

Parameters as in -

Cores, NumExecutors, DriverMemory, etc.

Is there a generic calculation that works for most kinds of clusters, from a small 3-node cluster to hundreds of nodes?

Thanks,
Aakash.