Structured Streaming Kafka change maxOffsetsPerTrigger won't apply


Roland Johann
Hi All,

Changing maxOffsetsPerTrigger and restarting the job does not change the batch size. This is a problem for us: we currently use a trigger interval of 5 minutes, and each micro-batch consumes only about 100k messages while the offset lag is in the billions. Decreasing the trigger interval also affects the micro-batch size, but then each batch contains only a few hundred messages. The Spark version in use is 2.4.4.

I assume that Spark uses the sizes and runtimes of previous micro-batches to calculate the current batch size from the trigger interval. AFAIK Structured Streaming isn't backpressure-aware, so this behavior is strange on multiple levels.

Any help appreciated.

Kind Regards
Roland

Re: Structured Streaming Kafka change maxOffsetsPerTrigger won't apply

Gabor Somogyi
Hi Roland,

You haven't shared much apart from the fact that it's not working. The latest partition offset is used when the computed size of a TopicPartition is negative.
You can check this by looking for the following debug entry in the logs:
logDebug(s"rateLimit $tp size is $size")
If you've double-checked and still think it's an issue, please file a JIRA and attach the Spark configuration and logs.
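
To illustrate what that log entry is reporting, here is a rough sketch of how a per-trigger offset limit can be prorated across partitions. It is only loosely modelled on the rate-limiting logic in Spark's Kafka source and is simplified on purpose: the object name RateLimitSketch is made up, partitions are plain strings rather than TopicPartition, and partitions without a known start offset are simply passed through. Treat it as an assumption-laden illustration, not the actual Spark implementation.

```scala
object RateLimitSketch {
  // limit: maximum number of offsets to consume in one trigger (maxOffsetsPerTrigger)
  // from:  start offset per partition; until: latest available offset per partition
  // Returns the end offset per partition for the next micro-batch.
  def rateLimit(
      limit: Long,
      from: Map[String, Long],
      until: Map[String, Long]): Map[String, Long] = {
    // Backlog ("size") per partition. Partitions whose size is zero or
    // negative are left out and therefore never rate limited below.
    val sizes: Map[String, Long] = until.collect {
      case (tp, end) if from.get(tp).exists(begin => end - begin > 0) =>
        tp -> (end - from(tp))
    }
    val total = sizes.values.sum.toDouble
    if (total < 1) {
      // Nothing to limit: use the latest offsets as they are.
      until
    } else {
      until.map { case (tp, end) =>
        val limited = sizes.get(tp).map { size =>
          // Share of the limit proportional to this partition's backlog.
          val prorate = limit * (size / total)
          // Round up tiny shares so small partitions are not starved.
          val share =
            (if (prorate < 1) math.ceil(prorate) else math.floor(prorate)).toLong
          // Never go past the latest available offset.
          math.min(end, from(tp) + share)
        }
        // No positive size recorded: fall back to the latest offset.
        tp -> limited.getOrElse(end)
      }
    }
  }
}
```

With a limit of 100000 and two partitions lagging by 3 billion and 1 billion offsets, this sketch hands out 75000 and 25000 offsets respectively. A partition whose stored start offset is higher than its latest offset (negative size) is not limited at all and just gets its latest offset, which is the case the debug line helps to spot.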

BR,
G


On Wed, Nov 20, 2019 at 9:33 AM Roland Johann <[hidden email]> wrote:
[...]