Cache not getting cleaned.


Cache not getting cleaned.

Amit Sharma-2
I am using df.cache and also unpersisting it, but when I check the Spark UI Storage tab I still see cache memory usage. Do I need to do anything else?

Also, in the Executors tab of the Spark UI, the used/total memory column for each executor always shows some used memory. If there are no requests on the streaming job, shouldn't the usage be 0?
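
A minimal sketch of the pattern being described (illustrative only; the actual job code was not shared, so the input path and query below are placeholders):

  import org.apache.spark.sql.SparkSession

  // Illustrative only: stands in for the real streaming job's code.
  val spark = SparkSession.builder().appName("cache-demo").getOrCreate()
  val df = spark.read.parquet("/path/to/input")   // hypothetical input

  df.cache()        // marks df for caching (lazy)
  df.count()        // an action that materializes the cache
  // ... use df ...
  df.unpersist()    // non-blocking by default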

Thanks
Amit

Re: Cache not getting cleaned.

Amit Sharma-2
Please find attached a screenshot showing no active tasks, yet memory still in use.

[Screenshot: Spark UI Executors tab showing used memory with no active tasks]


Re: Cache not getting cleaned.

Kevin Pis
In reply to this post by Amit Sharma-2

Hi Amit:

 

The Dataset unpersist function will not uncache all cached data; it only uncaches the given plan. The following is the related source code:

 

[Screenshot of the relevant code in Dataset.scala]

 

[Screenshot of the relevant code in CacheManager.scala]

 

The default cascade value is false, so it un-caches the given plan only. If you want to learn more details, please see the related code.
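
For reference, a simplified paraphrase of that call path, based on the Spark 3.x sources (not verbatim; exact signatures vary across versions):

  // Dataset.scala (paraphrased): unpersist forwards to the CacheManager
  // with cascade = false, so only this Dataset's plan is un-cached.
  def unpersist(blocking: Boolean): this.type = {
    sparkSession.sharedState.cacheManager
      .uncacheQuery(sparkSession, logicalPlan, cascade = false, blocking)
    this
  }

  def unpersist(): this.type = unpersist(blocking = false)

  // CacheManager.scala (paraphrased): with cascade = false, only cache
  // entries whose plan matches the given plan are dropped; cached queries
  // that depend on it are kept and re-compiled rather than removed.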

 



Re: Cache not getting cleaned.

Amit Sharma-2
If it just removes the logical plan, then how do I remove the actual DataFrame data inside the cache?


Thanks
Amit



Re: Cache not getting cleaned.

Kevin Pis

The unpersist function triggers two actions: un-caching the DataFrame's data and re-compiling dependent cached queries. When blocking is false, the two actions are asynchronous. So if the re-compilation runs while the DataFrame's cached data has not yet been removed, the re-compiled cached query may still reference that data, and the cached DataFrame data won't be removed.

So the actual DataFrame data is removed from the cache in the following cases:

  1. the removal happens before the re-compilation.
  2. the re-compilation does not use the cached DataFrame data.

If you want to learn more details, please see the code (a short sketch follows below):

[Screenshot of the related CacheManager.scala code]
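
In practice, if the goal is to make sure the cached data is actually dropped, two standard Spark APIs are worth considering (illustrative; df stands for the cached Dataset from the original question):

  // 1. Unpersist the exact Dataset reference that was cached and block
  //    until its blocks are removed, which reduces the chance that the
  //    removal races the re-compilation of dependent cached queries.
  df.unpersist(blocking = true)

  // 2. Heavy-handed: drop everything cached in the session. This can be
  //    acceptable between batches of a streaming job when nothing is
  //    expected to stay cached.
  spark.catalog.clearCache()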

 
