Using StreamingQueryListener.OnTerminate for Kafka Offset restore

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

Using StreamingQueryListener.OnTerminate for Kafka Offset restore

Sandish Kumar HN

Hey Everyone,

I'm using Spark StreamingQueryListener in Structured Streaming App

Whenever I see an OffsetOutOfRangeException's in Spark Job Inside StreamingQueryListener.onTerminated method I'm updating the Spark checkpoint directory offsets.

I was able to parse all OffsetOutOfRangeException's occurred on Job and catch them and parse the partition, offset and connect to Kafka using Kafka API and get right offsets and update the spark checkpointlocation/offsets folder.

Everything works fine even if there are multiple partitions with 

I was able to recover from 
OffsetOutOfRangeException's from the next run.

Does that make sense?

My question Is:
is there any locking for checkpointlocation/offsets folder? what if multiple executors try to update checkpointlocation/offsets folder?

I also see 
StreamingQueryListener asynchronous API? it is across Executors or Just with in the executor?


SandishKumar HN