[Structured Streaming] Domain data refresh with flatMapGroupsWithState

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

[Structured Streaming] Domain data refresh with flatMapGroupsWithState

Ashutosh Joshi
We have a structured streaming job that processes a stream of events. It needs to perform aggregation while maintaining state, for which we are using flatMapGroupsWithState.

It also needs to load some domain data that needs to be refreshed periodically. To refresh domain data, we are using a solution of query restart that Tathagata suggested in this thread:

This works for the domain data refresh, however, on query restart, the state maintained in  flatMapGroupsWithState is flushed.
Is there a way to retain the state on query refresh?

One way I am thinking is to split the job into two jobs to separate the concerns of domain data refresh and state based processing. Does this make sense? Are there other thoughts on solving this?

Ashutosh Joshi