How to gracefully handle Kafka OffsetOutOfRangeException
I am using Spark Streaming and reading data from Kafka with KafkaUtils.createDirectStream, with "auto.offset.reset" set to smallest.
But for some Kafka partitions I get kafka.common.OffsetOutOfRangeException, and my Spark job crashes.
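For reference, a minimal sketch of my setup (broker address, topic name, and batch interval are placeholders, not my real values):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("kafka-direct-example")
val ssc = new StreamingContext(conf, Seconds(10))

// "auto.offset.reset" -> "smallest" only controls where consumption starts
// when there is no valid starting offset; it does not stop the job from
// failing later when retention deletes segments a batch still points at.
val kafkaParams = Map(
  "metadata.broker.list" -> "broker1:9092", // placeholder broker
  "auto.offset.reset"    -> "smallest"
)

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("my-topic")) // placeholder topic
```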
I want to understand whether there is a graceful way to handle this failure without killing the job. I would like to simply ignore these exceptions, since the other partitions are fine and I can tolerate the data loss.
Is there any way to handle this so my Spark job does not crash? Increasing the Kafka retention period is not an option for me.
I tried wrapping the call to createDirectStream() in a Try, but since the exception is thrown on the executors, the Try had no effect. Do you have any ideas of how to handle this?
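This is roughly what my attempt looked like (same placeholder params as above), to show why it cannot work: the Try only guards the driver-side call that builds the DStream graph, while the OffsetOutOfRangeException is thrown later, inside the executor tasks that actually fetch each partition's data:

```scala
import scala.util.Try
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Driver side: this call only constructs the DStream; no Kafka data is
// fetched here, so it almost always succeeds and the Try is a no-op
// with respect to the per-batch fetch failures on the executors.
val maybeStream = Try {
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Set("my-topic")) // placeholder topic
}
// The OffsetOutOfRangeException surfaces later, when a batch's KafkaRDD
// partitions are computed on the executors -- outside this Try's scope.
```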