Spark and disk cache.

Mskh
Hi,

When I cache a table in memory for the first time in Spark (version 0.8.0), it usually takes 10 minutes. If I quit Spark, restart it, and re-cache the same table in memory, the operation takes 4 minutes. I assumed that quitting the Spark session would un-cache the table from memory. Is some OS-level caching taking place, since re-caching the table takes less than half the original time?

Thanks
Mskh

Re: Spark and disk cache.

Woody Christy
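It sounds like your underlying data set is in the OS page cache. Quitting Spark frees the JVM's memory, but Linux keeps recently read file pages in otherwise unused RAM, so the second load reads much of the data from memory instead of from disk. If you want to run a test that reads purely from disk, run this as root on each node before you re-cache the same table: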

sync; echo 3 > /proc/sys/vm/drop_caches
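To do this on every node in one step, something like the following shell sketch could work. It assumes passwordless SSH and passwordless sudo on each worker, plus a hosts file with one hostname per line; the function name drop_caches_on_nodes, the DROP_CMD variable and the DRY_RUN switch are hypothetical. It pipes through "sudo tee" because "sudo echo 3 > file" would perform the redirect as the unprivileged caller.

```shell
#!/usr/bin/env bash
# Sketch: drop the OS page cache on every worker before re-caching a table.
# Assumptions (hypothetical setup): passwordless SSH and passwordless sudo on
# each node, and a hosts file with one hostname per line.

# Run on each node: flush dirty pages first (drop_caches only frees clean
# pages), then drop the page cache plus dentries and inodes ("3").
# "sudo tee" is used because "sudo echo 3 > file" would redirect unprivileged.
DROP_CMD='sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null'

drop_caches_on_nodes() {
  local hosts_file=$1 host
  # "|| [ -n ... ]" also handles a last line without a trailing newline.
  while IFS= read -r host || [ -n "$host" ]; do
    if [ -z "$host" ]; then
      continue                      # skip blank lines
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh $host $DROP_CMD"    # preview only, nothing is executed
    else
      # -n stops ssh from reading (and consuming) the hosts file on stdin
      ssh -n "$host" "$DROP_CMD"
    fi
  done < "$hosts_file"
}
```

Usage: source the file, preview the commands with DRY_RUN=1 drop_caches_on_nodes workers.txt (workers.txt being your own, hypothetical, host list), then run it again without DRY_RUN.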

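To check the page-cache explanation on a single node without root, one could time a cold and a warm read of one input file. This sketch assumes GNU coreutils (dd iflag=nocache, date +%N) and a locally readable piece of the data, for example one block file on a worker; the function names evict_file, read_seconds and compare_cold_warm are hypothetical. Eviction this way is only advisory, so the cold number is a best effort.

```shell
#!/usr/bin/env bash
# Sketch: compare a cold read and a warm read of one file to see the page cache
# at work, without root. Assumes GNU coreutils (dd iflag=nocache, date +%N).

# Ask the kernel to drop this file's cached pages. With GNU dd, count=0 plus
# iflag=nocache requests a drop for the whole file. This is advisory and only
# clean pages are dropped, hence the sync first.
evict_file() {
  sync
  dd if="$1" iflag=nocache count=0 status=none
}

# Read the whole file and print the elapsed wall-clock time in seconds.
read_seconds() {
  local start end
  start=$(date +%s.%N)
  cat -- "$1" > /dev/null
  end=$(date +%s.%N)
  awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

compare_cold_warm() {
  local f=$1
  evict_file "$f"
  echo "cold read: $(read_seconds "$f") s"   # should come mostly from disk
  echo "warm read: $(read_seconds "$f") s"   # should come from the page cache
}
```

Usage: compare_cold_warm /path/to/some/input/file (a placeholder path). On a large file on spinning disks the warm read is typically much faster, for the same reason the second table load was.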

In reply to Mskh's message of Tue, Feb 4, 2014 at 7:44 AM.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-disk-cache-tp1180.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



--

Woody Christy
Solutions Architect | Partner Engineering | Cloudera Inc
@woodychristy