Spark caching questions

Spark caching questions

Vladimir Rodionov
Hi, users

1. What is the eviction policy for the disk-based cache? The same LRU as for the in-memory cache?

2. What is the scope of a cached RDD? Does it survive the application? What happens if I run the Java app again next time? Will the RDD be recreated, or read from the cache?

If the answer is yes, then ...

3. Is there any way to invalidate a cached RDD automatically? Or individual RDD partitions? Some API like RDD.isValid()?

4. HadoopRDD is InputFormat-based. Some partitions (splits) may become invalid in the cache. Can we reload only those partitions into the cache?

-Vladimir
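
For reference, the caching and invalidation API these questions touch looks roughly like this (a minimal Scala sketch; the input path and app name are illustrative, and there is no RDD.isValid() in the public API, so unpersist() is shown as the closest available call):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val sc = new SparkContext(
      new SparkConf().setAppName("caching-sketch").setMaster("local[*]"))

    // Illustrative input path.
    val lines = sc.textFile("hdfs:///data/events.log")

    // MEMORY_AND_DISK spills partitions that do not fit in memory to
    // local disk; in-memory blocks are dropped in LRU order under pressure.
    lines.persist(StorageLevel.MEMORY_AND_DISK)

    lines.count()   // first action materializes the cached partitions
    lines.count()   // subsequent actions read them back from the cache

    // Closest thing to explicit invalidation: drop every cached partition
    // of this RDD. There is no per-partition reload in the public API.
    lines.unpersist()

    sc.stop()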

Re: Spark caching questions

Mayur Rustagi
Cached RDDs do not survive SparkContext shutdown (they are scoped per SparkContext).
I am not sure what you mean by disk-based cache eviction; if you cache more RDDs than you have disk space, the result will not be very pretty :)

Mayur Rustagi
Ph: +1 (760) 203 3257
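
To make the scoping concrete, here is a minimal Scala sketch of the behavior described above; the paths are illustrative, and writing the result out explicitly (for example with saveAsTextFile) is the usual way to reuse it across applications, since a later run cannot read a previous run's cache:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("scope-sketch").setMaster("local[*]"))

    // Illustrative input path.
    val cleaned = sc.textFile("hdfs:///data/events.log").filter(_.nonEmpty)

    cleaned.cache()
    cleaned.count()   // populates the cache, visible only to this SparkContext

    // To reuse the result in a later application, persist it explicitly:
    cleaned.saveAsTextFile("hdfs:///data/events-clean")

    sc.stop()         // all cached blocks owned by this context are dropped

    // The next application re-reads hdfs:///data/events-clean; it cannot
    // see or revive the previous context's cache.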
