About spark.num.executors

About spark.num.executors

ijung5399
I clearly saw a performance difference between

Config A:
```
<entry key="spark.executor.cores" value="1"/>
<entry key="spark.driver.memory" value="10G"/>
<entry key="spark.executor.memory" value="5G"/>
<entry key="spark.num.executors" value="51"/>
```
and

Config B:
```
<entry key="spark.executor.cores" value="1"/>
<entry key="spark.driver.memory" value="20G"/>
<entry key="spark.executor.memory" value="13G"/>
<entry key="spark.num.executors" value="24"/>
```

Config A showed 80+% CPU utilization and config B showed 40+%.
So I thought spark.num.executors was working as intended.
But while looking at the Spark source code, I couldn't find anything that handles "spark.num.executors".
Actually, "spark.executor.instances" is the config parameter that corresponds to --num-executors.
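
For reference, a minimal Scala sketch (Spark 2.x assumed; the memory and core values are illustrative, not recommendations) of pinning the executor count through the property that --num-executors maps to:
```
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: spark.executor.instances is the property behind --num-executors.
val conf = new SparkConf()
  .setAppName("executor-count-test")
  .set("spark.executor.cores", "1")
  .set("spark.executor.memory", "5g")
  .set("spark.executor.instances", "51")

val sc = new SparkContext(conf)
```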

If "spark.num.executors" is not meaningful, how could the cpu usage so different?

-ijung

Re: About spark.num.executors

yncxcw
Hi,

Could you say more about your cluster setup (e.g., the size of the cluster and the capacity of each node)?
Each executor is an independent JVM, to which the Spark driver allocates tasks.

To verify, you can check how many containers YARN allocated to your Spark application (the number of containers should equal the number of executors).
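
As a driver-side alternative, a small Scala sketch (run in spark-shell, where sc is predefined; Spark 2.x assumed):
```
// Count the executors registered with the driver; getExecutorMemoryStatus
// also includes the driver's own entry, hence the "- 1".
val registered = sc.getExecutorMemoryStatus.size - 1
println(s"Registered executors: $registered")
```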

Since config A has more executors, it should show higher CPU utilization (A runs more JVM instances in parallel).
Also watch spark.executor.cores, which sets the maximum number of parallel task threads per executor; it also affects CPU utilization.

By the way, I am curious how you measured your CPU utilization.


Hope this helps.


Wei

Re: About spark.num.executors

ijung5399
Thanks, Wei.

I have one master and six slave nodes. Each slave has 61G memory.
I checked Ganglia graphs to see the aggregated CPU usage.

I just wonder whether any previous Spark version had "spark.num.executors" as a config parameter, or whether it is still a valid config.

-ijung

Re: About spark.num.executors

yncxcw
Hi,

It is still valid; I am still using it (Spark 2.2.0). You can also check the driver's log to see how many executors were assigned (from the maximum executor id).
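
If grepping the driver log is inconvenient, executor registrations can also be watched programmatically; a Scala sketch (Spark 2.x assumed; run on the driver, e.g. in spark-shell):
```
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded}

// Print a line every time the cluster manager hands this application
// a new executor.
sc.addSparkListener(new SparkListener {
  override def onExecutorAdded(added: SparkListenerExecutorAdded): Unit =
    println(s"Executor ${added.executorId} added on ${added.executorInfo.executorHost}")
})
```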


Wei


Re: About spark.num.executors

ijung5399
It turns out "spark.dynamicAllocation.enabled" was set to true, so the low CPU usage was actually driven by the available memory.
It looks like both "spark.num.executors" and "spark.executor.instances" were ignored.
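
For anyone hitting the same thing, a minimal Scala sketch (Spark 2.x assumed; the instance count is illustrative) of turning dynamic allocation off so an explicit executor count is honored:
```
import org.apache.spark.SparkConf

// With dynamic allocation disabled, spark.executor.instances
// (the property behind --num-executors) sets the executor count.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "false")
  .set("spark.executor.instances", "51")
```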

Re: About spark.num.executors

yncxcw
When "spark.dynamicAllocation.enabled" then the "spark.num.executors" will not take effects.

Wei


Re: About spark.num.executors

ijung5399
Thank you for confirming it, Wei.

I also tested the executor count with spark.dynamicAllocation.enabled = false.
Both "spark.num.executors" and "spark.executor.instances" take effect, but slightly differently.

Because the current Spark source (https://github.com/apache/spark.git) does not contain any line with "spark.num.executors", you may want to double-check the difference between "spark.num.executors" and "spark.executor.instances".

I will run longer tests, look into the source, and reply in this thread.

-ijung

Re: About spark.num.executors

yncxcw
Hi,

Sorry for the misunderstanding. I checked the docs and found that you can set either `--num-executors` or `spark.executor.instances` for the number of executors. `spark.num.executors` could possibly be deprecated.

Thanks for the reminder and the confirmation.


Wei
