I have one read stream consuming data from a Kafka topic, and based on an attribute value in each incoming message, I have to write the data to one of two different locations in S3 (if the attribute is value1, write to location1; otherwise, write to location2).
At a high level, below is what I have for doing that.
It seems to work fine, and I see data written to the two different paths when the app is deployed. But whenever the job is restarted, either after a failure or after a manual stop and start, it keeps failing with the exception below (where userSessionEventJoin.global is my topic name).
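(The original snippet did not come through in the archive. The following is only a minimal sketch of the pattern described above, not my exact code; the broker address, S3 paths, routing attribute name, and output format are placeholders. Each query gets its own checkpoint location.)

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.appName("KafkaToS3Router").getOrCreate()

// One Kafka source stream; topic name is from the post, broker is assumed.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "userSessionEventJoin.global")
  .load()
  .selectExpr("CAST(value AS STRING) AS json")
  // "attr" is a placeholder for the routing attribute in each message.
  .selectExpr("json", "get_json_object(json, '$.attr') AS attr")

// Route on the attribute value: two filtered queries, one per S3 location,
// each with its own separate checkpoint directory.
val q1 = stream.filter(col("attr") === "value1").writeStream
  .format("parquet")
  .option("path", "s3a://bucket/location1")
  .option("checkpointLocation", "s3a://bucket/checkpoints/location1")
  .start()

val q2 = stream.filter(col("attr") =!= "value1").writeStream
  .format("parquet")
  .option("path", "s3a://bucket/location2")
  .option("checkpointLocation", "s3a://bucket/checkpoints/location2")
  .start()

spark.streams.awaitAnyTermination()
```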
If I delete all of the checkpoint information, the job starts again and creates new checkpoints in the two given locations, but that means I have to start processing from the latest offset again and I lose all previously committed offsets.
The Spark version is 2.1. Please suggest any resolutions, thanks.