Apache Spark - Structured Streaming from file - checkpointing


M Singh
Hi:

I am using Spark Structured Streaming (v2.2.0) to read data from files, and I have configured a checkpoint location. After stopping and restarting the application, it appears to re-read files that were already ingested. Is that the expected behavior?

Is there any way to prevent reading files that have already been ingested?
If a file was only partially ingested, can we resume reading it on restart from the previously checkpointed offset?

Thanks
Re: Apache Spark - Structured Streaming from file - checkpointing

Diogo Munaro Vieira
Can you please post your code here?
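
For comparison, here is a minimal PySpark sketch of the kind of setup described in the question: a file source read as a stream, with the checkpoint location set on the write side. It is only an illustration under assumptions, not the original poster's code. The function name, the choice of the text source and Parquet sink, and all paths are hypothetical. The point to check against your own code is that `checkpointLocation` is set as an option on `writeStream` and points to the same directory on every run.

```python
def start_file_stream(spark, input_dir, output_dir, checkpoint_dir):
    """Start a streaming query that copies text lines from input_dir to Parquet.

    Illustrative only: `spark` is assumed to be an existing SparkSession,
    and the three directory arguments are hypothetical paths.
    """
    # The text file source has a fixed schema (one string column, "value"),
    # so no explicit schema has to be supplied here.
    lines = spark.readStream.text(input_dir)

    return (
        lines.writeStream
        .format("parquet")
        .option("path", output_dir)
        # Progress is tracked under this directory; it has to be the same
        # path across restarts for the query to resume from it.
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )


# Example usage (requires a Spark installation; paths are placeholders):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("file-stream").getOrCreate()
# query = start_file_stream(spark, "/data/in", "/data/out", "/data/checkpoint")
# query.awaitTermination()
```

If your code differs from this shape, for example the checkpoint directory changes between runs or is deleted on startup, that would be the first thing to look at.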

On Dec 25, 2017, 19:24, "M Singh" <[hidden email]> wrote:
Hi:

I am using Spark Structured Streaming (v2.2.0) to read data from files, and I have configured a checkpoint location. After stopping and restarting the application, it appears to re-read files that were already ingested. Is that the expected behavior?

Is there any way to prevent reading files that have already been ingested?
If a file was only partially ingested, can we resume reading it on restart from the previously checkpointed offset?

Thanks