LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

Teja
We have ~120 executors with 5 cores each, running a very long job that
crunches ~2.5 TB of data using a query with a large number of filters.
Currently, we have ~30k partitions, which works out to ~90 MB per partition.
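
As a quick sanity check on the figures above, here is a minimal sketch of the arithmetic. It assumes the data is spread evenly over the partitions and that each task uses one core; the variable names are illustrative and not taken from the job itself.

```python
# Rough sizing arithmetic for the job described above.
# Assumptions: even data distribution, one core per task.

executors = 120
cores_per_executor = 5
partitions = 30_000
data_tb = 2.5

# Tasks that can run at the same time across the cluster.
concurrent_tasks = executors * cores_per_executor

# Number of scheduling "waves" needed to run one task per partition.
waves = partitions / concurrent_tasks

# Per-partition size, reading "TB" as decimal (10^12 bytes) or binary (2^40 bytes).
mb_per_partition_decimal = data_tb * 10**12 / partitions / 10**6
mib_per_partition_binary = data_tb * 2**40 / partitions / 2**20

print(concurrent_tasks)
print(waves)
print(round(mb_per_partition_decimal, 1))
print(round(mib_per_partition_binary, 1))
```

Either reading lands in the same ballpark as the ~90 MB quoted above. Each of those tasks posts start and end events to the listener bus, so the event volume on the driver grows with the partition count.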

We are using Spark v2.2.2 as of now. The major problem we are facing is GC
on the driver. All of the driver memory (30G) fills up and GC is very
active: Full GC Evacuation takes more than 50% of the runtime. The heap dump
indicates that 80% of the memory is occupied by LiveListenerBus and is not
being cleared by GC. Frequent GC runs clear only newly created objects.

From the Jira tickets, I learned that memory consumption by
LiveListenerBus was addressed in v2.3 (not sure of the specifics). But
until we evaluate migrating to v2.3, is there any quick fix or workaround
either to prevent the various listener events from piling up in the driver's
memory, or to identify and disable the listener that is causing the delay in
processing events?



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

Waleed Fateem
Hi Teja,

The only thought I have is to consider decreasing the spark.scheduler.listenerbus.eventqueue.capacity parameter. That should reduce the driver memory pressure, but of course you'll probably end up dropping events more frequently, meaning you can't really trust anything you see in the UI anymore.
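
For reference, a minimal sketch of how such a setting could be passed at submit time. The helper `conf_args`, the value 5000, and the script name `job.py` are hypothetical. The key is the one named above; as far as I know it was introduced under that name in 2.3, and 2.2.x reads `spark.scheduler.listenerbus.eventqueue.size` instead (default 10000 in both), so please check the configuration page for your version.

```python
# Build spark-submit "--conf key=value" arguments from a dict.
# conf_args is a hypothetical helper, not part of Spark.

def conf_args(settings):
    args = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    return args

settings = {
    # Key as named in the reply above; on 2.2.x the equivalent key is
    # believed to be spark.scheduler.listenerbus.eventqueue.size.
    "spark.scheduler.listenerbus.eventqueue.capacity": 5000,  # illustrative value
}

# job.py is a placeholder for the real application.
command = ["spark-submit"] + conf_args(settings) + ["job.py"]
print(" ".join(command))
```

The queue size is read when the SparkContext starts, so it should be set before the application launches rather than changed on a running job.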

I'm not sure what options there are beyond increasing driver memory and tuning GC.

Have you looked at the GC logs? For example, are both the young and old generation portions of the heap heavily utilized, or is it just the young generation? Depending on what you see in the GC log, your particular application might just need a larger young generation.
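
To get a first look at that young/old split without reading a full GC log, one option is `jstat -gc <driver-pid>`. The sketch below assumes the Java 8 column layout of that command (sizes in KB, times in seconds). The sample row is made up for illustration and is not from this job, and `generation_utilization` is a hypothetical helper.

```python
# Summarize one line of `jstat -gc` output (Java 8 column layout assumed).

HEADER = "S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT"

# Made-up sample values, for illustration only.
SAMPLE_ROW = ("1000.0 1000.0 0.0 500.0 8000.0 2000.0 20000.0 19000.0 "
              "100.0 90.0 10.0 9.0 100 10.0 50 90.0 100.0")

def generation_utilization(header, row):
    cols = dict(zip(header.split(), (float(v) for v in row.split())))
    # Young generation = both survivor spaces plus eden.
    young_capacity = cols["S0C"] + cols["S1C"] + cols["EC"]
    young_used = cols["S0U"] + cols["S1U"] + cols["EU"]
    return {
        "young_pct": 100.0 * young_used / young_capacity,
        "old_pct": 100.0 * cols["OU"] / cols["OC"],
        # Share of total GC time spent in full collections.
        "full_gc_time_pct": 100.0 * cols["FGCT"] / cols["GCT"] if cols["GCT"] else 0.0,
    }

stats = generation_utilization(HEADER, SAMPLE_ROW)
print(stats)
```

An old generation that stays close to full after collections, with most GC time going to full collections, would suggest long-lived retained objects rather than a young generation that is simply too small.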

Just some ideas for you to consider till you make the move to 2.3 or later.

On Tue, Aug 11, 2020 at 7:14 AM Teja <[hidden email]> wrote:
> [...]

Re: LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

Mridul Muralidharan
In reply to this post by Teja
Hi,

50% of driver time being spent in GC just for the listener bus sounds very high for a 30G heap.
Did you try to take a heap dump and see what is occupying so much memory?

This will help us determine whether the memory usage is due to some user code or library holding references to large objects or object graphs, or whether it is actually in the listener or related code.

Regards,
Mridul


On Tue, Aug 11, 2020 at 8:14 AM Teja <[hidden email]> wrote:
> [...]

Re: LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

Teja
We did take a heap dump from the live job. To our surprise, 85% of the memory
is occupied by `org.apache.spark.scheduler.LiveListenerBus`. Here are a
few pictures for context:

<http://apache-spark-user-list.1001560.n3.nabble.com/file/t10309/Screenshot_2020-09-11_at_10.png>




Re: LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

Teja
In reply to this post by Mridul Muralidharan
<http://apache-spark-user-list.1001560.n3.nabble.com/file/t10309/Screenshot_2020-09-11_at_10.png>

Sorry for the poor formatting


