Dynamic Allocation not removing executors

Maximiliano Patricio Méndez
Hi,

I found an issue when using dynamic allocation in Spark 2.3.1: under some circumstances the driver does not remove idle executors.

In the first case I found, it seems that a change introduced in 2.2.1/2.3.0 (SPARK-21656) added a check to ExecutorAllocationManager that causes the first remove request to be ignored when there are no pending tasks and spark.dynamicAllocation.initialExecutors is set to a non-zero value. While the initializing flag is set, numExecutorsTarget cannot be changed, so it is still equal to initialExecutors when the first idle timeout fires, and the removal is refused because it would drop the executor count below that target.

My dynamic allocation conf:
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.initialExecutors 4
spark.dynamicAllocation.minExecutors 0
spark.dynamicAllocation.maxExecutors 100

This corrects itself once the first job is submitted, but if no job is ever submitted it can leave up to 4 executors (in our case) idle without ever being removed.
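To make the interaction easier to follow, here is a toy Python model of the sequence described above. It is a simplified sketch, not Spark's actual ExecutorAllocationManager: the class AllocationModel, its methods and the one-second scheduling ticks are hypothetical. It keeps only the two pieces the post points at: the initializing flag that freezes the target, and the check that refuses a removal that would take the executor count below numExecutorsTarget. It also assumes, as the logs suggest, that an executor whose idle timer has already fired does not get a new timer when its removal is refused, and it only clears the initializing flag on an idle timeout (job submission is not modelled).

```python
# Toy model (hypothetical, not Spark source) of the interaction described above:
# the "initializing" flag freezes the target, and removals that would take the
# executor count below the target are refused.

class AllocationModel:
    def __init__(self, initial_executors, min_executors=0, idle_timeout=60):
        self.min_executors = min_executors
        self.idle_timeout = idle_timeout
        self.target = initial_executors   # plays the role of numExecutorsTarget
        self.initializing = True          # cleared here when the first idle timer fires
        self.alive = set()
        self.idle_deadline = {}           # executor id -> time its idle timer expires

    def register(self, executor_id, now):
        # A new executor with nothing to run starts its idle timer immediately.
        self.alive.add(executor_id)
        self.idle_deadline[executor_id] = now + self.idle_timeout

    def schedule(self, now, max_needed=0):
        """One scheduling tick; returns the executors actually removed."""
        # 1. Lower the target to what is needed, unless still initializing.
        if not self.initializing and max_needed < self.target:
            self.target = max(max_needed, self.min_executors)

        # 2. Pop expired idle timers; they are dropped even if removal is refused.
        expired = sorted(e for e, t in self.idle_deadline.items() if now >= t)
        for e in expired:
            del self.idle_deadline[e]
        if expired:
            self.initializing = False

        # 3. Refuse any removal that would go below the minimum or the target.
        total = len(self.alive)
        removed = []
        for e in expired:
            if total - 1 < self.min_executors or total - 1 < self.target:
                continue  # ignored, and no new idle timer is started
            removed.append(e)
            total -= 1
        self.alive -= set(removed)
        return removed


def replay_log_timeline():
    """Registration times from the log, in seconds after 13:08:00."""
    model = AllocationModel(initial_executors=4, min_executors=0)
    for executor_id, t in [(3, 44), (0, 45), (1, 45), (2, 46)]:
        model.register(executor_id, t)
    removals = [model.schedule(now) for now in (104, 105, 106, 200)]
    return removals, model


removals, model = replay_log_timeline()
print(removals)             # [[], [0, 1], [2], []]
print(sorted(model.alive))  # [3]
```

Replaying the registration times from the logs with a 60-second idle timeout gives the same outcome as the logs: executor 3 is refused on the first tick, the target then drops to 0, and executors 0, 1 and 2 are removed. If all four idle timers expired in the same tick, the model would refuse all four, which corresponds to the "up to 4 executors" case.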

Logs:
18/08/15 13:08:44 DEBUG ExecutorAllocationManager: Starting idle timer for 3 because there are no more tasks scheduled to run on the executor (to expire in 60 seconds)
18/08/15 13:08:44 INFO ExecutorAllocationManager: New executor 3 has registered (new total is 1)
18/08/15 13:08:45 DEBUG ExecutorAllocationManager: Starting idle timer for 0 because there are no more tasks scheduled to run on the executor (to expire in 60 seconds)
18/08/15 13:08:45 INFO ExecutorAllocationManager: New executor 0 has registered (new total is 2)
18/08/15 13:08:45 DEBUG ExecutorAllocationManager: Starting idle timer for 1 because there are no more tasks scheduled to run on the executor (to expire in 60 seconds)
18/08/15 13:08:45 INFO ExecutorAllocationManager: New executor 1 has registered (new total is 3)
18/08/15 13:08:46 DEBUG ExecutorAllocationManager: Starting idle timer for 2 because there are no more tasks scheduled to run on the executor (to expire in 60 seconds)
18/08/15 13:08:46 INFO ExecutorAllocationManager: New executor 2 has registered (new total is 4)
18/08/15 13:09:44 INFO ExecutorAllocationManager: Request to remove executorIds: 3
18/08/15 13:09:44 DEBUG ExecutorAllocationManager: Not removing idle executor 3 because there are only 4 executor(s) left (number of executor target 4)
18/08/15 13:09:45 DEBUG ExecutorAllocationManager: Lowering target number of executors to 0 (previously 4) because not all requested executors are actually needed
18/08/15 13:09:45 INFO ExecutorAllocationManager: Request to remove executorIds: 0
18/08/15 13:09:45 INFO ExecutorAllocationManager: Removing executor 0 because it has been idle for 60 seconds (new desired total will be 3)
18/08/15 13:09:45 INFO ExecutorAllocationManager: Request to remove executorIds: 1
18/08/15 13:09:45 INFO ExecutorAllocationManager: Removing executor 1 because it has been idle for 60 seconds (new desired total will be 2)
18/08/15 13:09:46 INFO ExecutorAllocationManager: Existing executor 0 has been removed (new total is 3)
18/08/15 13:09:46 DEBUG ExecutorAllocationManager: Executor 0 is no longer pending to be removed (1 left)
18/08/15 13:09:46 INFO ExecutorAllocationManager: Request to remove executorIds: 2
18/08/15 13:09:46 INFO ExecutorAllocationManager: Removing executor 2 because it has been idle for 60 seconds (new desired total will be 1)
18/08/15 13:09:46 INFO ExecutorAllocationManager: Existing executor 1 has been removed (new total is 2)
18/08/15 13:09:46 DEBUG ExecutorAllocationManager: Executor 1 is no longer pending to be removed (1 left)
18/08/15 13:09:46 INFO ExecutorAllocationManager: Existing executor 2 has been removed (new total is 1)
18/08/15 13:09:46 DEBUG ExecutorAllocationManager: Executor 2 is no longer pending to be removed (0 left)
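One way to confirm which executor is left behind is to match the "New executor ... has registered" and "Existing executor ... has been removed" messages. The sketch below uses a hypothetical helper, leftover_executors, with plain regular expressions over the INFO lines copied from the log above. It reports executor 3, the one whose removal request at 13:09:44 was ignored.

```python
import re

# INFO lines copied from the log above (DEBUG lines omitted).
LOG = """\
18/08/15 13:08:44 INFO ExecutorAllocationManager: New executor 3 has registered (new total is 1)
18/08/15 13:08:45 INFO ExecutorAllocationManager: New executor 0 has registered (new total is 2)
18/08/15 13:08:45 INFO ExecutorAllocationManager: New executor 1 has registered (new total is 3)
18/08/15 13:08:46 INFO ExecutorAllocationManager: New executor 2 has registered (new total is 4)
18/08/15 13:09:46 INFO ExecutorAllocationManager: Existing executor 0 has been removed (new total is 3)
18/08/15 13:09:46 INFO ExecutorAllocationManager: Existing executor 1 has been removed (new total is 2)
18/08/15 13:09:46 INFO ExecutorAllocationManager: Existing executor 2 has been removed (new total is 1)
"""

REGISTERED = re.compile(r"New executor (\S+) has registered")
REMOVED = re.compile(r"Existing executor (\S+) has been removed")


def leftover_executors(log_text):
    """Return executors that registered but were never removed, in registration order."""
    registered, removed = [], set()
    for line in log_text.splitlines():
        match = REGISTERED.search(line)
        if match:
            registered.append(match.group(1))
            continue
        match = REMOVED.search(line)
        if match:
            removed.add(match.group(1))
    return [e for e in registered if e not in removed]


print(leftover_executors(LOG))  # ['3']
```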