Understanding Executors UI

Understanding Executors UI

Eric Beabes
image.png


Not sure if this image will go through. (Never sent an email to this mailing list with an image).

I am trying to understand the 'Executors' tab of the Spark 2.4 UI. I have a stateful Structured Streaming job with the state timeout set to 10 minutes. When the load on the system is low, a message is written to Kafka immediately after the state times out, but under heavy load it takes over 40 minutes for a message to reach the output topic. I am trying to debug this and see whether performance can be improved.

Questions: 

1) I am requesting 3.2 TB of memory, but the job seems to use only 598.5 GB according to the 'Storage Memory' and 'On Heap Storage Memory' values. Is this a cluster issue, or am I not setting the values correctly?
2) Where can I find documentation that explains the different tabs in the Spark UI? (Sorry, Googling didn't help. I will keep searching.)

Any pointers would be appreciated. Thanks.

RE: Understanding Executors UI

Luca Canali

Hi Eric,

 

A few links, in case they can be useful for your troubleshooting:

 

The Spark Web UI is documented in the Spark 3.x documentation, and most of it applies to Spark 2.4 too: https://spark.apache.org/docs/latest/web-ui.html

 

Spark memory management is documented at https://spark.apache.org/docs/latest/tuning.html#memory-management-overview

Additional resource: see also this diagram https://canali.web.cern.ch/docs/SparkExecutorMemory.png  and https://db-blog.web.cern.ch/blog/luca-canali/2020-08-spark3-memory-monitoring
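
As a rough illustration of the sizing rule described in the tuning guide linked above: the unified (execution + storage) region is a fraction of the JVM heap minus a fixed 300 MiB reservation, with `spark.memory.fraction` defaulting to 0.6. The sketch below only does that arithmetic. The function names and the executor layout (100 executors with 32 GiB each) are made up for illustration and are not taken from the job discussed in this thread; the real figure also depends on how much heap the JVM actually reports.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Fixed reservation subtracted from the heap before the fraction is applied.
RESERVED_BYTES = 300 * MIB


def unified_memory_bytes(executor_heap_bytes, memory_fraction=0.6):
    """Approximate unified (execution + storage) region of one executor."""
    usable = max(executor_heap_bytes - RESERVED_BYTES, 0)
    return round(usable * memory_fraction)


def cluster_unified_memory_bytes(num_executors, executor_heap_bytes,
                                 memory_fraction=0.6):
    """Approximate sum of the unified regions over identical executors."""
    return num_executors * unified_memory_bytes(executor_heap_bytes,
                                                memory_fraction)


if __name__ == "__main__":
    # Hypothetical layout: 100 executors, 32 GiB of heap each.
    total = cluster_unified_memory_bytes(100, 32 * GIB)
    print(f"approx. unified memory: {total / GIB:.1f} GiB")
```

Under these assumptions the total the UI could report as available is noticeably smaller than the total heap requested, so a gap between the two numbers is expected to some degree; whether it explains a specific gap depends on the actual executor count and settings.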

 

Best,

Luca

 

From: Eric Beabes <[hidden email]>
Sent: Wednesday, January 6, 2021 00:20
To: spark-user <[hidden email]>
Subject: Understanding Executors UI

 

Re: Understanding Executors UI

Eric Beabes
So when I see 'Storage Memory': 3.3 TB / 598.5 GB, it's telling me that Spark is using 3.3 TB of memory and 598.5 GB is used for caching data, correct? What surprises me is that these numbers don't change at all throughout the day, even though the load on the system is low after 5pm PST.

I would expect the "Memory used" to be lower than 3.3 TB after 5pm PST.

Does Spark 3.0 do a better job of memory management? Would upgrading to Spark 3.0 improve performance?


On Wed, Jan 6, 2021 at 2:29 PM Luca Canali <[hidden email]> wrote:

RE: Understanding Executors UI

Luca Canali

You report 'Storage Memory': 3.3 TB / 598.5 GB. The first number is the memory used for storage; the second is the memory available for storage in the unified memory pool.

The used memory shown in your web UI snippet is indeed quite high (higher than the available memory!?). You would probably benefit from drilling down on that to better understand what is happening.

For example, look at the per-executor details (the numbers you reported are aggregated values), then also look at the "Storage" tab for a list of cached RDDs with details.

In any case, Spark 3.0 has improved memory instrumentation and improved instrumentation for streaming, so you could benefit from testing there too.
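
One possible way to drill down per executor outside the browser is Spark's monitoring REST API, whose `/api/v1/applications/<app-id>/executors` endpoint returns one JSON object per executor, including `id`, `memoryUsed` and `maxMemory`. The sketch below is a minimal example that works on already-downloaded JSON; the sample values are invented, and the helper name `summarize_storage_memory` is hypothetical.

```python
import json


def summarize_storage_memory(executors):
    """Summarize storage memory from a list of executor-summary dicts.

    Returns (rows, total_used, total_max, over), where rows holds
    (id, memoryUsed, maxMemory) per executor and over lists the ids
    whose used storage memory exceeds the reported maximum.
    """
    rows = [(e["id"], e["memoryUsed"], e["maxMemory"]) for e in executors]
    total_used = sum(used for _, used, _ in rows)
    total_max = sum(mx for _, _, mx in rows)
    over = [eid for eid, used, mx in rows if used > mx]
    return rows, total_used, total_max, over


# Invented sample, shaped like the executors endpoint response (bytes).
SAMPLE = json.loads("""
[
  {"id": "driver", "memoryUsed": 0,    "maxMemory": 1000},
  {"id": "1",      "memoryUsed": 5000, "maxMemory": 4000},
  {"id": "2",      "memoryUsed": 1500, "maxMemory": 4000}
]
""")

if __name__ == "__main__":
    rows, used, mx, over = summarize_storage_memory(SAMPLE)
    for eid, u, m in rows:
        print(f"executor {eid}: used={u} max={m}")
    print(f"total used={used} total max={mx} over limit={over}")
```

Comparing the per-executor rows with the aggregated line can show whether an implausible total comes from a few executors or is spread evenly across all of them.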

 

 

From: Eric Beabes <[hidden email]>
Sent: Friday, January 8, 2021 04:23
To: Luca Canali <[hidden email]>
Cc: spark-user <[hidden email]>
Subject: Re: Understanding Executors UI

 

Re: Understanding Executors UI

Amit Sharma-2
I believe it's a Spark UI issue in which the correct value is not displayed. I believe it is resolved in Spark 3.0.

Thanks
Amit

On Fri, Jan 8, 2021 at 4:00 PM Luca Canali <[hidden email]> wrote:

Re: Understanding Executors UI

Eric Beabes
I reduced the 'state timeout' from 10 minutes to 2 minutes so that memory would be released sooner, and the new numbers for Storage Memory are 54.7 GB out of 598.5 GB, but I still don't trust these numbers. As Amit pointed out, it seems there's a bug in the Spark 2.4 UI.

I am requesting 2 TB of memory but the UI keeps showing 598.5 GB. I am not sure whether it's a bug in the Spark 2.4 UI or our cluster is indeed not giving my job enough memory!
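
A back-of-the-envelope way to approach this question is to invert the documented sizing rule (unified memory is roughly `spark.memory.fraction` times the heap minus 300 MiB per executor) and ask how much total executor heap the displayed 598.5 GB would correspond to. The sketch below assumes the default fraction of 0.6, treats the UI's "GB" as GiB, and uses a purely hypothetical executor count of 50; none of these are confirmed in this thread, so the result is only an order-of-magnitude check.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Per-executor reservation subtracted before the fraction is applied.
RESERVED_BYTES = 300 * MIB


def implied_total_heap_bytes(displayed_total_bytes, num_executors,
                             memory_fraction=0.6):
    """Total executor heap implied by a displayed unified-memory total."""
    return (displayed_total_bytes / memory_fraction
            + num_executors * RESERVED_BYTES)


if __name__ == "__main__":
    displayed = 598.5 * GIB   # total shown in the UI, unit is an assumption
    executors = 50            # hypothetical; use the real executor count
    implied = implied_total_heap_bytes(displayed, executors)
    print(f"implied total heap: {implied / GIB:.1f} GiB")
```

If the implied heap is far below what was requested, checking how many executors were actually granted, and with how much memory each, would be a natural next step before concluding that the UI is wrong.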





On Sun, Jan 10, 2021 at 12:32 AM Amit Sharma <[hidden email]> wrote: