Memory consumption and checkpointed data seem to increase incrementally when reduceByKeyAndWindow with an inverse function is used with mapWithState in stateful streaming
Memory consumption and checkpointed data seem to increase incrementally when reduceByKeyAndWindow with an inverse function is used together with mapWithState.
My application uses stateful streaming with mapWithState. The keys emitted by mapWithState are then passed to reduceByKeyAndWindow to compute rolling counts over a 24-hour window. The MapWithStateRDD seems to stay persisted forever, even though I have checkpointing enabled every 10 minutes, and the ShuffledRDD generated by reduceByKeyAndWindow seems to grow linearly in memory. Any idea why this happens?
Is it possible that the ShuffledRDD is caching some data from mapWithState, since it depends on it for its keys?
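To make the rolling-count part concrete, here is a minimal plain-Python sketch of how I understand an incremental window with an inverse function to behave. It is not Spark code and not my actual job: `IncrementalWindow`, `run`, and the toy batches are hypothetical names made up for illustration, and `filter_func` is only meant to mirror the optional filter function that reduceByKeyAndWindow accepts alongside the inverse function, as far as I understand the API.

```python
from collections import deque


class IncrementalWindow:
    """Toy model of a windowed count that uses an inverse function."""

    def __init__(self, window_batches, filter_func=None):
        self.window_batches = window_batches  # window length, in batches
        self.filter_func = filter_func        # optional (key, value) -> bool
        self.batches = deque()                # batches currently in the window
        self.state = {}                       # key -> running count

    def update(self, batch):
        # Reduce step: fold in the batch that enters the window.
        for key, value in batch.items():
            self.state[key] = self.state.get(key, 0) + value
        self.batches.append(batch)

        # Inverse step: subtract the batch that leaves the window.
        if len(self.batches) > self.window_batches:
            for key, value in self.batches.popleft().items():
                self.state[key] = self.state.get(key, 0) - value

        # Without a filter, keys whose count dropped to zero are kept.
        if self.filter_func is not None:
            self.state = {k: v for k, v in self.state.items()
                          if self.filter_func(k, v)}
        return dict(self.state)


def run(filter_func=None):
    """Feed four single-key batches through a two-batch window."""
    window = IncrementalWindow(window_batches=2, filter_func=filter_func)
    result = {}
    for batch in [{"a": 1}, {"b": 1}, {"c": 1}, {"d": 1}]:
        result = window.update(batch)
    return result


print(run())                      # expired keys stay with a count of 0
print(run(lambda k, v: v > 0))    # expired keys are dropped
```

In this toy model the state keeps one entry per key ever seen unless a filter removes the zero counts, which is the kind of linear growth I am seeing. I do not know whether this is what actually happens in my job.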