How many tasks are in stage 2? How long do they take? If there are 200 tasks taking 1 second each (that's many "rounds" of tasks on the available cores, adding up to 13 seconds), you can reduce the number of tasks by setting the SQL conf spark.sql.shuffle.partitions (defaults to 200). Given the number of cores in your cluster, you probably want 1-3 rounds of tasks, not more.
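To sketch the arithmetic behind that advice (the 16-core cluster size here is an assumption for illustration, not from the original thread): tasks run in "waves" across the available cores, so 200 one-second tasks on 16 cores need ceil(200/16) = 13 waves, roughly matching the 13-second stage time. Fewer shuffle partitions means fewer waves:

```python
import math

def approx_stage_time(num_tasks, task_secs, total_cores):
    """Rough lower bound on stage time: tasks run in waves across cores."""
    waves = math.ceil(num_tasks / total_cores)
    return waves * task_secs

cores = 16  # assumed cluster size, for illustration only

# Default spark.sql.shuffle.partitions=200: 13 waves -> ~13 s
print(approx_stage_time(200, 1.0, cores))

# With spark.sql.shuffle.partitions=32: 2 waves -> ~2 s
print(approx_stage_time(32, 1.0, cores))
```

In a real job you would set the conf before the join runs, e.g. spark.conf.set("spark.sql.shuffle.partitions", "32") in the session, or pass --conf spark.sql.shuffle.partitions=32 on spark-submit.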
On Wed, Nov 28, 2018 at 2:28 PM Abhijeet Kumar <[hidden email]> wrote:
I’m doing a simple join. I’m running Spark on YARN and performing a simple join of two streams.
DAG of my job
So, it’s taking around 13 secs to complete stage 2.