Latency between Batch Completion and triggering of onBatchCompleted() event

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Latency between Batch Completion and triggering of onBatchCompleted() event

rahul patwari
Hi,

Context:
We are using the Spark Streaming Library.

We have created a StreamingListener to implement some logic when onBatchCompleted() event is triggered. This StreamingListener is registered with the StreamingContext.

We are using Spark on Kubernetes. The Spark version is 2.4.2. batchDuration is 500ms. Checkpointing is enabled.

Issue:
We are observing 3 to 5 seconds latency between the time when a batch is completed and the time when onBatchCompleted() event is triggered. This is happening for every batch.

We have measured the batch completion time from Spark Driver UI and also from the Driver logs.

An example with 5 second latency:
if a batch is completed at 10:25:30(HH:MM:SS), the approximate time at which onBatchCompleted() event for this batch gets triggered is 10:25:35

We were expecting a sub-second latency between the time when a batch gets completed and the time when onBatchCompleted() event for the batch gets triggered.

Did anyone face this issue before? 

What factors can contribute to this latency?  

Thanks for any pointers for debugging the issue.

Regards,
Rahul
Reply | Threaded
Open this post in threaded view
|

Re: Latency between Batch Completion and triggering of onBatchCompleted() event

rahul patwari
Update:

As the Pipeline progresses, the delay between the time when the batch is completed and the time when onBatchCompleted() event is getting triggered for the batch is increasing i.e. we are not observing a constant delay of 3 to 5 seconds for every batch.

Trimmed Logs which show the issue in action:
11:37:06: INFO: scheduler.JobScheduler: "Finished job streaming job 1584099237500 ms.4 from job set of time 1584099237500 ms"
11:37:06: INFO: scheduler.JobScheduler: "Total delay: 189.352 s for time 1584099237500 ms (execution: 0.942 s)"
11:40:30: INFO: util.GlobalWatermarkHolder$WatermarkAdvancingStreamingListener: "Batch with timestamp: 1584099237500 has completed, watermarks have been updated."

util.GlobalWatermarkHolder$WatermarkAdvancingStreamingListener is the listener which is registered with Streaming Context.
The log: "Batch with timestamp: 1584099237500 has completed, watermarks have been updated." is present inside onbatchCompleted() method.

As you can see from the above logs, there is 3 minutes 24 seconds delay between the time when the batch is completed and the time when onBatchCompleted() event for the batch is getting triggered.


On Thu, Mar 19, 2020 at 7:44 PM rahul patwari <[hidden email]> wrote:
Hi,

Context:
We are using the Spark Streaming Library.

We have created a StreamingListener to implement some logic when onBatchCompleted() event is triggered. This StreamingListener is registered with the StreamingContext.

We are using Spark on Kubernetes. The Spark version is 2.4.2. batchDuration is 500ms. Checkpointing is enabled.

Issue:
We are observing 3 to 5 seconds latency between the time when a batch is completed and the time when onBatchCompleted() event is triggered. This is happening for every batch.

We have measured the batch completion time from Spark Driver UI and also from the Driver logs.

An example with 5 second latency:
if a batch is completed at 10:25:30(HH:MM:SS), the approximate time at which onBatchCompleted() event for this batch gets triggered is 10:25:35

We were expecting a sub-second latency between the time when a batch gets completed and the time when onBatchCompleted() event for the batch gets triggered.

Did anyone face this issue before? 

What factors can contribute to this latency?  

Thanks for any pointers for debugging the issue.

Regards,
Rahul