Spark Structured Streaming: “earliest” as “startingOffsets” is not working

Eric Beabes

My Spark Structured Streaming job works fine when I set "startingOffsets" to "latest". When I change it to "earliest" and specify a new checkpoint directory, the job doesn't work: the states no longer time out after 10 minutes.

While debugging, I noticed that my state logic is indeed being executed, but the states just don't time out as they do when I use "latest". Any reason why?

Is this a known issue?

Note: I've tried this under Spark 2.3 and 2.4.
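
For readers following along, the setup described above might look roughly like the PySpark sketch below. The broker address, topic name, and console sink are placeholders that do not come from the post, and the stateful logic with its 10-minute timeout is left out. One detail worth keeping in mind: the Kafka source only honors "startingOffsets" when a query starts without an existing checkpoint, which is why a new checkpoint directory is needed when switching from "latest" to "earliest".

```python
def kafka_source_options(starting_offsets):
    """Build Kafka source options; broker and topic are placeholder values."""
    if starting_offsets not in ("earliest", "latest"):
        raise ValueError("expected 'earliest' or 'latest'")
    return {
        "kafka.bootstrap.servers": "localhost:9092",  # assumption: placeholder broker
        "subscribe": "events",                        # assumption: placeholder topic
        "startingOffsets": starting_offsets,
    }


def start_query(spark, starting_offsets, checkpoint_dir):
    """Start a streaming query that reads from Kafka with the given offsets.

    `spark` is an existing SparkSession. The console sink stands in for the
    real stateful processing, which is not shown here.
    """
    reader = spark.readStream.format("kafka")
    for key, value in kafka_source_options(starting_offsets).items():
        reader = reader.option(key, value)
    events = reader.load()
    return (
        events.writeStream
        .format("console")
        # A fresh directory makes Spark apply startingOffsets instead of
        # resuming from offsets stored by an earlier run.
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

Calling start_query(spark, "earliest", "/tmp/checkpoints/earliest-run") with a directory that has never been used would then read the topic from its oldest retained offsets.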

Re: Spark Structured Streaming: “earliest” as “startingOffsets” is not working

Eric Beabes
My apologies. After I set 'maxOffsetsPerTrigger' to a value such as 200000, it started working. Hopefully this will help someone. Thanks.
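
A minimal sketch of that fix, assuming a Kafka source with a placeholder broker and topic (neither is given in the thread). "maxOffsetsPerTrigger" caps how many offsets a single micro-batch may consume, so a backlog read from "earliest" is split across many smaller batches instead of arriving in one very large first batch. A plausible reading, not confirmed in the thread, is that state timeouts are only evaluated as micro-batches complete, so smaller batches give them a chance to fire. The helper micro_batches_for_backlog is purely illustrative arithmetic.

```python
def capped_kafka_options(max_offsets_per_trigger=200000):
    """Kafka source options with a per-trigger cap; broker and topic are placeholders."""
    return {
        "kafka.bootstrap.servers": "localhost:9092",  # assumption: placeholder broker
        "subscribe": "events",                        # assumption: placeholder topic
        "startingOffsets": "earliest",
        "maxOffsetsPerTrigger": str(max_offsets_per_trigger),
    }


def micro_batches_for_backlog(backlog_offsets, max_offsets_per_trigger):
    """Rough lower bound on the micro-batches needed to drain a backlog."""
    if max_offsets_per_trigger <= 0:
        raise ValueError("cap must be positive")
    # Integer ceiling division: a partial final batch still counts as one batch.
    return (backlog_offsets + max_offsets_per_trigger - 1) // max_offsets_per_trigger


def read_capped_stream(spark):
    """Return a streaming DataFrame that reads at most the capped number of offsets per trigger."""
    reader = spark.readStream.format("kafka")
    for key, value in capped_kafka_options().items():
        reader = reader.option(key, value)
    return reader.load()
```

With a hypothetical backlog of 10,000,000 offsets and the cap of 200000 mentioned above, micro_batches_for_backlog returns 50, so the backlog would be processed in at least 50 micro-batches.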

On Fri, Jun 26, 2020 at 2:12 PM Something Something <[hidden email]> wrote:

[...]

Re: Spark Structured Streaming: “earliest” as “startingOffsets” is not working

Srinivas V
Cool. Are you not using a watermark?
Also, is it possible to start reading offsets from a specific date and time?

Regards
Srini
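
On the two questions above, a hedged sketch rather than an answer from the thread: the Kafka source gained a "startingOffsetsByTimestamp" option in Spark 3.0, so it is not available on the 2.3 and 2.4 versions mentioned earlier. It takes a JSON string mapping each topic to its partitions and a start timestamp in milliseconds since the epoch. A watermark can be declared with withWatermark on an event-time column. The broker, topic name, partition list, and the choice of the Kafka record timestamp as event time are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def starting_offsets_by_timestamp(topic, partitions, start):
    """Build the JSON value for the startingOffsetsByTimestamp option."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    millis = int(start.timestamp() * 1000)  # epoch milliseconds
    return json.dumps({topic: {str(p): millis for p in partitions}})


def read_from_timestamp(spark, start):
    """Read a Kafka topic from a point in time and declare a watermark.

    `spark` is an existing SparkSession (Spark 3.0 or later assumed).
    """
    offsets = starting_offsets_by_timestamp("events", [0, 1], start)  # assumption: 2 partitions
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")  # assumption: placeholder broker
        .option("subscribe", "events")                        # assumption: placeholder topic
        .option("startingOffsetsByTimestamp", offsets)
        .load()
    )
    # "timestamp" is the record timestamp column exposed by the Kafka source.
    return events.withWatermark("timestamp", "10 minutes")
```

For example, starting_offsets_by_timestamp("events", [0, 1], datetime(2020, 6, 26, tzinfo=timezone.utc)) yields a JSON object in which both partitions map to 1593129600000.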

On Sat, Jun 27, 2020 at 6:12 AM Eric Beabes <[hidden email]> wrote:

[...]