Re: Why do checkpoints work the way they do?


Re: Why do checkpoints work the way they do?

Dmitry Naumenko
+1 from me on this question. If there are any constraints on restoring a checkpoint for Structured Streaming, they should be documented.


On Thu, Aug 31, 2017 at 9:20 AM (GMT+03:00), 张万新 <[hidden email]> wrote:
So are there any documents explaining under what conditions my application can recover from the same checkpoint and under what conditions it cannot?

On Wed, Aug 30, 2017 at 1:20 PM, Tathagata Das <[hidden email]> wrote:
Hello, 

This is an unfortunate design choice on my part from when I was building DStreams :)

Fortunately, we learnt from our mistakes and built Structured Streaming the correct way. Checkpointing in Structured Streaming stores only the progress information (offsets, etc.), and the user can change their application code (within certain constraints, of course) and still restart from checkpoints (unlike DStreams). If you are just building out your streaming applications, then I highly recommend that you try out Structured Streaming instead of DStreams (which is effectively in maintenance mode).
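For illustration, here is a minimal Structured Streaming sketch of that restart-from-checkpoint setup (not from this thread; the Kafka topic, bootstrap servers, and paths are placeholders):

```scala
// Minimal sketch, assuming Spark 2.2+ with the spark-sql-kafka-0-10 connector.
// Topic, servers, and paths below are placeholders.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("checkpoint-demo").getOrCreate()

val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()

val counts = events
  .selectExpr("CAST(value AS STRING) AS value")
  .groupBy("value")
  .count()

// The checkpoint directory holds offsets and progress metadata rather than
// serialized application code, so the query logic above can be changed
// (within the documented constraints) and restarted from the same location.
val query = counts.writeStream
  .outputMode("complete")
  .format("console")
  .option("checkpointLocation", "/tmp/structured-checkpoint")
  .start()

query.awaitTermination()
```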


On Fri, Aug 25, 2017 at 7:41 PM, Hugo Reinwald <[hidden email]> wrote:
Hello,

I am implementing a Spark Streaming solution with Kafka and have read that checkpoints cannot be used across application code changes - here

I tested changes in the application code and got the error message below -

17/08/25 15:10:47 WARN CheckpointReader: Error reading checkpoint from file file:/tmp/checkpoint/checkpoint-1503641160000.bk
java.io.InvalidClassException: scala.collection.mutable.ArrayBuffer; local class incompatible: stream classdesc serialVersionUID = -2927962711774871866, local class serialVersionUID = 1529165946227428979 

While I understand that this is by design, can I ask why checkpointing works the way it does, verifying class signatures? Would it not be easier to let the developer decide whether to reuse the old checkpoints, depending on what changed in the application logic, e.g. changes unrelated to Spark/Kafka such as logging or configuration changes?
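For reference, a minimal sketch of the DStream checkpoint pattern under discussion, assuming the standard StreamingContext.getOrCreate idiom with the Kafka 0.10 direct stream (topic, paths, and batch interval are placeholders). The checkpoint files contain the Java-serialized DStream graph, including the user functions, which is why a recompiled application can hit the InvalidClassException shown above:

```scala
// Minimal sketch of the DStream checkpoint pattern, assuming Spark 2.x with
// the spark-streaming-kafka-0-10 connector. All names are placeholders.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val checkpointDir = "/tmp/checkpoint"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("dstream-checkpoint-demo")
  val ssc = new StreamingContext(conf, Seconds(60))
  ssc.checkpoint(checkpointDir)

  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "localhost:9092",
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "demo",
    "auto.offset.reset" -> "latest",
    "enable.auto.commit" -> (false: java.lang.Boolean)
  )

  val stream = KafkaUtils.createDirectStream[String, String](
    ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

  // These transformations (and the closures they capture) are part of the
  // DStream graph that is serialized into the checkpoint files, so changing
  // this code can make old checkpoints unreadable.
  stream.map(record => record.value).count().print()

  ssc
}

// On restart, Spark tries to deserialize the whole graph from the checkpoint
// directory; a serialVersionUID mismatch at that step produces the
// InvalidClassException seen in the log above.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```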

This is my first post in the group. Apologies if I am asking a question that has already been answered; I did a Nabble search and it didn't turn up the answer.

Thanks for the help.
Hugo



Re: Why do checkpoints work the way they do?

Hugo Reinwald
Thanks, Tathagata, for the clarification. +1 on documenting the limitations of checkpoints in Structured Streaming.

Hugo
