Load distribution in Structured Streaming

Eric Beabes
In my Structured Streaming job, I've noticed that a LOT of the data keeps going to one executor, while the other executors process much less. As a result, tasks on that executor take much longer to complete. In other words, the load distribution is skewed.

I believe that in Structured Streaming the partitions of the input Kafka topic are distributed evenly among the executors, right? As far as I can tell, the data in our input Kafka topic is fairly evenly distributed across its partitions. Is there any reason for this skew? Is there a way to fix it, for example by using a Partitioner? Please let me know.
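To make the expectation above concrete, here is a toy model in plain Python (not Spark code). It assumes, hypothetically, that each Kafka partition becomes one task and that tasks are spread round-robin across executors; Spark's real scheduler does not necessarily assign tasks this way. Under that assumption, balanced partitions give balanced executor load, while a single hot partition is enough to overload one executor. The function name `executor_load` and the partition sizes are illustrative only.

```python
from collections import defaultdict


def executor_load(partition_sizes, num_executors):
    """Toy model: send partition i to executor i % num_executors and sum its records.

    This round-robin assignment is a simplifying assumption, not Spark's scheduler.
    """
    load = defaultdict(int)
    for i, size in enumerate(partition_sizes):
        load[i % num_executors] += size
    # Executors that received no partitions report a load of 0.
    return [load[e] for e in range(num_executors)]


# Hypothetical topic: 12 partitions of 1,000 records each, spread over 4 executors.
balanced = [1000] * 12
print(executor_load(balanced, 4))  # [3000, 3000, 3000, 3000]

# Same setup, but one hypothetical hot partition holds most of the data.
skewed = [1000] * 11 + [20000]
print(executor_load(skewed, 4))  # [3000, 3000, 3000, 22000]
```

In this model, per-executor load can only be as even as the per-partition record counts, so checking how many records actually land in each Kafka partition is a reasonable first step.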

Thanks in advance for the help.