Regarding structured streaming windows on older data


Hemant Bhanawat
For demonstration purposes, I was using data with older timestamps in Structured Streaming. The data was from 2018, the window was 24 hours, and the watermark was 0 seconds. A few things that I saw and could not explain:
1. The initial streaming batch had around 60 windows. It processed all but the last one.
2. The data for a window is not sent to the writer immediately.
3. If I ingest data for 2019 midway through the run, it is not processed. In fact, Spark didn't output the 2019 data at all.
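
To make observation 1 concrete, here is a rough sketch in plain Python. It is a simplified model, not Spark's implementation. It assumes the watermark is the maximum event time seen minus the delay, and that a window is finalized only once the watermark reaches the window's end. The helpers `window_start` and `closed_windows` and the one-event-per-day sample data are hypothetical, made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=24)            # tumbling window length from the post
WATERMARK_DELAY = timedelta(seconds=0)  # watermark delay from the post
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts):
    """Start of the 24-hour tumbling window that contains ts (hypothetical helper)."""
    return ts - (ts - EPOCH) % WINDOW


def closed_windows(event_times):
    """Window starts that this simplified model would treat as finalized.

    Assumption: watermark = max event time - delay, and a window is closed
    only when its end is less than or equal to the watermark.
    """
    watermark = max(event_times) - WATERMARK_DELAY
    starts = sorted({window_start(ts) for ts in event_times})
    return [s for s in starts if s + WINDOW <= watermark]


# Made-up sample: one event at noon UTC on each of 60 consecutive days in 2018.
first = datetime(2018, 1, 1, 12, tzinfo=timezone.utc)
events = [first + timedelta(days=i) for i in range(60)]

closed = closed_windows(events)
print(len(closed))  # 59 of the 60 windows
```

Under these assumptions, the window holding the newest event stays open, because the watermark (noon on the last day) has not yet reached that window's end (midnight of the following day). Whether this is exactly what happens inside Spark is what I would like to confirm.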

Can someone point me to documentation or an explanation of how Structured Streaming handles data with non-current timestamps?

Thanks in advance,
Hemant