I've noticed that unpersisting an "upstream" Dataset also unpersists the
"downstream" Dataset. I did not expect this behavior, and RDDs do not
behave this way.
Below I've pasted a simple reproducible case. There are two Datasets, x and
y, where y is created by applying a transformation to x. Both are cached and
materialized (this can be confirmed in the Storage tab of the Spark UI).
Then x is unpersisted, which as expected removes it from the cache. However,
y is also unpersisted, which I didn't expect. I tried the same scenario with
RDDs instead and saw that y was left in the cache, as expected.
Is this a bug, or the expected behavior for Datasets?
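
For reference, the scenario described above can be sketched roughly as follows. This is a minimal illustration and not necessarily the exact code from the original report: the object name, the local master setting, the column names, and the data sizes are all assumptions chosen for the example. It prints the storage level of each Dataset and RDD before and after unpersisting x, which is the same information the Storage tab shows.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical repro sketch; names and sizes are illustrative assumptions.
object UnpersistRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: run locally
      .appName("unpersist-repro")
      .getOrCreate()
    import spark.implicits._

    // Dataset case: y is derived from x; both are cached and materialized.
    val x = spark.range(0, 1000).toDF("id").cache()
    x.count()
    val y = x.withColumn("doubled", $"id" * 2).cache()
    y.count()

    println(s"Dataset before: x=${x.storageLevel}, y=${y.storageLevel}")
    x.unpersist()
    println(s"Dataset after:  x=${x.storageLevel}, y=${y.storageLevel}")

    // RDD case: the same shape, for comparison.
    val rx = spark.sparkContext.parallelize(1 to 1000).cache()
    rx.count()
    val ry = rx.map(_ * 2).cache()
    ry.count()

    println(s"RDD before: x=${rx.getStorageLevel}, y=${ry.getStorageLevel}")
    rx.unpersist()
    println(s"RDD after:  x=${rx.getStorageLevel}, y=${ry.getStorageLevel}")

    spark.stop()
  }
}
```

No expected output is shown here because what gets printed for y after `x.unpersist()` is exactly the behavior in question, and it may depend on the Spark version in use.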