Past batch time in Spark Streaming

When my Spark Streaming job starts, it isn't able to process all the data within the batch duration, so back-pressure kicks in and reduces the batch size.


This is not a problem. Even after a couple of hours, when it has processed all the data from the Kafka stream (and suppose no further data is being produced to Kafka), the batch time still shows a past time.

But if any data comes in, it is processed then and there. For example, I published one event to Kafka around 3 PM, but it was processed in a batch with Batch Time: 2020/02/20 13:37:30.

My question is: what is the "Batch Time" in the Spark UI? Why does it show a past time when the event was produced just now? And how is it different from the Submitted Time?

Spark config:

"spark.shuffle.service.enabled", "true"
"spark.streaming.receiver.maxRate", "10000"
"spark.streaming.kafka.maxRatePerPartition", "600"
"spark.streaming.backpressure.enabled", "true"
"spark.streaming.backpressure.initialRate", "10000"
"spark.streaming.blockInterval", "100ms"
"spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC"
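For reference, the settings above would typically be applied through a SparkConf when building the StreamingContext. This is only a minimal sketch; the app name and the 10-second batch duration are assumptions, not taken from the original post:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch only: app name and batch duration below are assumed for illustration.
val conf = new SparkConf()
  .setAppName("kafka-stream-job") // hypothetical name
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.streaming.receiver.maxRate", "10000")
  .set("spark.streaming.kafka.maxRatePerPartition", "600")
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.backpressure.initialRate", "10000")
  .set("spark.streaming.blockInterval", "100ms")
  .set("spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC")

// The batch duration (assumed 10s here) is what each "Batch Time" slot
// in the Spark UI is derived from: batches are scheduled at fixed
// multiples of this interval from the job's start time.
val ssc = new StreamingContext(conf, Seconds(10))
```

Note that with the direct Kafka API, `spark.streaming.kafka.maxRatePerPartition` caps intake per partition per second, so the effective per-batch cap is that rate times the number of partitions times the batch duration.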