I'm using barrier execution in my Spark job, but I occasionally see
deadlocks where the task scheduler is unable to place all of the barrier
tasks at once. The failure is logged, but the job then hangs indefinitely
rather than failing. I have 2 executors with 16 cores each, running in
standalone mode (I think; this is on Databricks). The dataset has 31
partitions.
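For context, my understanding is that barrier mode only launches a stage when every task can be scheduled in the same wave, so whether it fits comes down to slot arithmetic. A rough sketch of that check (the function is mine, not a Spark API; `task_cpus` corresponds to `spark.task.cpus`, which defaults to 1):

```python
def barrier_fits(num_executors, cores_per_executor, num_partitions, task_cpus=1):
    """Return True if a barrier stage can launch in a single wave.

    Slots per executor = cores // task_cpus; barrier mode requires
    num_partitions <= total slots, or the stage cannot be scheduled.
    """
    total_slots = num_executors * (cores_per_executor // task_cpus)
    return num_partitions <= total_slots

# My setup: 2 executors x 16 cores, 31 partitions -> 31 <= 32, so it
# should fit, which is why the intermittent deadlock surprises me.
print(barrier_fits(2, 16, 31))
```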
One thing I've noticed when this occurs is that the "Active Tasks" count
in the Spark UI exceeds the number of cores on one of the executors. How
can an executor start more tasks than it has cores? spark.executor.cores
is not set, so I'd expect each executor to be capped at 16 concurrent
tasks.