Unexpected caching behavior


Unexpected caching behavior

pnpritchard
I've noticed that when I unpersist an "upstream" Dataset, the
"downstream" Dataset is also unpersisted. I did not expect this behavior,
and I've noticed that RDDs do not behave this way.

Below I've pasted a simple reproducible case. There are two Datasets, x and
y, where y is created by applying a transformation to x. Both are cached and
materialized (confirmed in the UI Storage tab). Then x is unpersisted,
which, as expected, removes it from the cache. However, y is also
unpersisted, which I didn't expect. I tried the same scenario with RDDs
instead and saw that y was left in the cache, as expected.

Is this a bug, or the expected behavior for Datasets?






Re: Unexpected caching behavior

pnpritchard
Not sure why the example code didn't come through, but here I'll try again:

val x = spark.range(100)
val y = x.map(_.toString)

println(x.storageLevel) // StorageLevel(1 replicas)
println(y.storageLevel) // StorageLevel(1 replicas)

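// cache both Datasets and force materialization with a no-op action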
x.cache().foreachPartition(_ => ())
y.cache().foreachPartition(_ => ())

println(x.storageLevel) // StorageLevel(disk, memory, deserialized, 1 replicas)
println(y.storageLevel) // StorageLevel(disk, memory, deserialized, 1 replicas)

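// unpersist only the upstream Dataset x; y's cache entry is removed as well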
x.unpersist()

println(x.storageLevel) // StorageLevel(1 replicas)
println(y.storageLevel) // StorageLevel(1 replicas)
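
For reference, here is roughly the RDD version of the same steps (a sketch
from memory, so the exact StorageLevel strings may differ); y stays cached
after x is unpersisted:

val xRdd = spark.sparkContext.range(0, 100)
val yRdd = xRdd.map(_.toString)

// RDD.cache() defaults to MEMORY_ONLY, unlike the Dataset default above
xRdd.cache().foreachPartition(_ => ())
yRdd.cache().foreachPartition(_ => ())

xRdd.unpersist()

println(xRdd.getStorageLevel) // StorageLevel(1 replicas)
println(yRdd.getStorageLevel) // still StorageLevel(memory, deserialized, 1 replicas)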



