Trigger on GroupStateTimeout with no new data in group

Abhishek Gupta
Hi All,

I have a question about modeling a user-session style analytics use case in Spark Structured Streaming. Is there a way to model something like the following using arbitrary stateful processing?

User session -> the user reads a few FAQs on a website and then decides whether or not to create a ticket.
FAQ Deflection Metrics:
i) Successful Deflection: No issue is created within 5 mins of reading the last FAQ
ii) Failed Deflection: An issue is created within 5 mins of reading the last FAQ
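As a rough illustration of the two metrics, here is a minimal plain-Python sketch (no Spark involved) that classifies one FAQ-reading session. It assumes the 5-minute window is inclusive and that the caller knows the current time; the function name `deflection_outcome` is hypothetical. The `None` result marks the case where no issue has been created yet but the window is still open, which is exactly the situation that needs some kind of timer to resolve.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed window from the metric definitions; "within 5 mins" treated as inclusive.
DEFLECTION_WINDOW = timedelta(minutes=5)


def deflection_outcome(
    last_faq_view: datetime, issue_created: Optional[datetime], now: datetime
) -> Optional[str]:
    """Classify one FAQ-reading session (hypothetical helper).

    Returns "failed" if an issue was created within the window after the last
    FAQ view, "successful" if the window has elapsed without such an issue,
    or None if the outcome is not yet known (window still open).
    """
    if issue_created is not None and timedelta(0) <= issue_created - last_faq_view <= DEFLECTION_WINDOW:
        return "failed"
    if now - last_faq_view > DEFLECTION_WINDOW:
        return "successful"
    # No issue yet and the window has not elapsed: undecided.
    return None
```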

There are three cases here; two of them can be handled with flatMapGroupsWithState, but I am not sure about the third:
i) Maintain the user's last action as state; if an issue-create event arrives and the last state is an FAQ view within 5 mins -> Failed deflection
ii) Maintain the user's last state; if an issue-create event arrives and the last state is an FAQ view more than 5 mins ago -> Successful deflection
iii) Maintain the user's last state, perhaps with a processing-time timeout of 5 mins, i.e. an FAQ is viewed at T1, the user creates no issue, and the time is now T1 + 5 mins, so we should increment Successful deflection
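The three cases can be modeled as a per-user update function in the shape of a flatMapGroupsWithState callback. The sketch below is plain Python, not Spark: `UserState`, `update_user`, and the event names `"faq_view"` / `"issue_create"` are hypothetical. The `timed_out` flag stands in for what `GroupState.hasTimedOut` would report in Spark, where the function would be invoked with no new events for that user; in a real query one would presumably also call `setTimeoutDuration("5 minutes")` after recording an FAQ view under `GroupStateTimeout.ProcessingTimeTimeout`.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

WINDOW_MS = 5 * 60 * 1000  # assumed 5-minute deflection window, inclusive


@dataclass
class UserState:
    # Hypothetical per-user state: timestamp (ms) of the most recent FAQ view.
    last_faq_view_ms: Optional[int] = None


def update_user(
    state: UserState, events: Iterable[Tuple[str, int]], timed_out: bool
) -> List[str]:
    """Process one user's new events and return the outcomes emitted.

    events: (event_type, timestamp_ms) pairs, event_type in
    {"faq_view", "issue_create"} (hypothetical names).
    timed_out: models a timeout call with no new data for this user.
    """
    outcomes: List[str] = []
    if timed_out:
        # Case iii: FAQ viewed, no issue created, window elapsed -> successful.
        if state.last_faq_view_ms is not None:
            outcomes.append("successful")
            state.last_faq_view_ms = None
        return outcomes
    for kind, ts in sorted(events, key=lambda e: e[1]):
        if kind == "faq_view":
            state.last_faq_view_ms = ts  # remember only the last FAQ view
        elif kind == "issue_create" and state.last_faq_view_ms is not None:
            if ts - state.last_faq_view_ms <= WINDOW_MS:
                outcomes.append("failed")  # case i
            else:
                outcomes.append("successful")  # case ii
            state.last_faq_view_ms = None
    return outcomes
```

Clearing the state once an outcome is emitted keeps each FAQ session counted once, whichever of the three paths resolves it.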

Can this be done using Spark's GroupStateTimeout? I am unsure whether a timeout can fire for a group when no new data arrives for that group.