Kafka backlog - spark structured streaming


Kafka backlog - spark structured streaming

Kailash Kalahasti
Is there any way to find out the backlog on a Kafka topic while using Spark Structured Streaming? I checked a few consumer APIs, but those require enabling a group ID for the stream, which does not seem to be allowed.

Basically, I want to know the number of records waiting to be processed.

Any suggestions?
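For what it's worth, reading the broker-side end offsets does not require joining a consumer group at all. A minimal sketch using the Kafka Java client directly, assuming it is on the classpath (the bootstrap servers and topic name are placeholders):

```scala
import java.util.Properties
import scala.collection.JavaConverters._

import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.ByteArrayDeserializer

// A standalone consumer that never calls subscribe(), so it does not join a
// consumer group; fetching partition end offsets works without a group.id.
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")  // placeholder
props.put("key.deserializer", classOf[ByteArrayDeserializer].getName)
props.put("value.deserializer", classOf[ByteArrayDeserializer].getName)

val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
try {
  val topic = "events"  // placeholder topic name
  val partitions = consumer.partitionsFor(topic).asScala
    .map(p => new TopicPartition(p.topic, p.partition))

  // endOffsets returns, per partition, the offset the next produced record
  // would get, i.e. the current head of the partition.
  val endOffsets = consumer.endOffsets(partitions.asJava).asScala
  endOffsets.foreach { case (tp, offset) => println(s"$tp -> $offset") }
} finally {
  consumer.close()
}
```

These numbers are only the producer side of the picture; the backlog is the difference between them and the offsets the streaming query has actually reached.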

Re: Kafka backlog - spark structured streaming

Burak Yavuz-2
If you don't set rate limiting through `maxOffsetsPerTrigger`, Structured Streaming will always process until the end of the stream. So the number of records waiting to be processed should be 0 at the start of each trigger.
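For reference, a minimal sketch of what that rate limit looks like on the Kafka source (the bootstrap servers and topic are placeholders); with the cap in place each micro-batch reads at most that many offsets, so unread records can actually accumulate on the brokers between triggers:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-backlog").getOrCreate()

// Cap each micro-batch at 10,000 offsets across all subscribed partitions.
// Without this option, every batch reads everything available up to the
// partitions' current end offsets.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder
  .option("subscribe", "events")                        // placeholder topic
  .option("maxOffsetsPerTrigger", "10000")              // per-batch rate limit
  .load()
```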


Re: Kafka backlog - spark structured streaming

Arun Mahadevan
Here's a proposal to add this: https://github.com/apache/spark/pull/21819

It's always good to set "maxOffsetsPerTrigger" unless you want Spark to process to the end of the stream in each micro-batch. Even without "maxOffsetsPerTrigger", the lag can be non-zero by the time the micro-batch completes.
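Until something like that is available, one rough way to estimate the lag from the driver is to compare the end offsets reported in `query.lastProgress` with the brokers' current end offsets. A sketch, assuming a single Kafka source, json4s on the classpath (Spark already depends on it), and a hypothetical `brokerEndOffsets` map fetched separately with a standalone consumer (e.g. via `KafkaConsumer.endOffsets`):

```scala
import org.apache.spark.sql.streaming.StreamingQuery
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Rough backlog estimate for a query with a single Kafka source. The Kafka
// source reports its offsets as JSON shaped like {"topic":{"0":123,"1":456}}.
// `brokerEndOffsets` maps (topic, partition) to the broker's current end
// offset and is assumed to be fetched separately (hypothetical input).
def estimatedBacklog(query: StreamingQuery,
                     brokerEndOffsets: Map[(String, Int), Long]): Long = {
  implicit val formats: DefaultFormats.type = DefaultFormats

  // Offsets the last completed micro-batch reached, per (topic, partition).
  val processed: Map[(String, Int), Long] =
    Option(query.lastProgress).flatMap(_.sources.headOption).map { src =>
      parse(src.endOffset)
        .extract[Map[String, Map[String, Long]]]
        .flatMap { case (topic, parts) =>
          parts.map { case (part, off) => (topic, part.toInt) -> off }
        }
        .toMap
    }.getOrElse(Map.empty)

  // For each partition the query reports, count how far the broker is ahead.
  processed.iterator.map { case (tp, done) =>
    math.max(0L, brokerEndOffsets.getOrElse(tp, done) - done)
  }.sum
}
```

The same calculation could also run inside a `StreamingQueryListener.onQueryProgress` callback so the estimate is refreshed on every trigger.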
