In my Structured Streaming job, I've noticed that a lot of data keeps going to one executor, while the other executors process far less. As a result, tasks on that executor take much longer to complete. In other words, the distribution is skewed.
I believe that in Structured Streaming the partitions of the input Kafka topic get distributed evenly among executors, right? In our input Kafka topic the data is fairly evenly distributed across partitions, or so I would think. Is there any reason for this skew? Is there a way to fix it by using a Partitioner or something similar? Please let me know.