[Spark on YARN] Asynchronously launching containers in YARN
I was recently doing some research into Spark on YARN's startup time and
observed slow, synchronous allocation of containers/executors. I am testing
on a 4-node bare-metal cluster with 48 cores and 128GB memory per node. YARN
was only allocating about 3 containers per second. Moreover, when starting 3
Spark applications at the same time, each requesting 44 containers, the
first application would get all 44 requested containers before the next
application started getting any containers, and so on.
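For a rough sense of what this costs at startup, here is a back-of-the-envelope
sketch in Python. It assumes a constant rate of 3 containers per second and
strictly one-application-at-a-time allocation, which simplifies the behavior
described above. The function name is only for illustration and is not part of
Spark or YARN.

```python
def serialized_grant_times(apps, containers_per_app, rate_per_sec):
    """Seconds until each app has received its last container.

    Assumption: containers are granted one after another at a fixed rate,
    and each app is fully served before the next one gets anything.
    """
    finish_times = []
    granted = 0
    for _ in range(apps):
        granted += containers_per_app
        finish_times.append(granted / rate_per_sec)
    return finish_times


# Numbers from the test described above: 3 apps, 44 containers each, ~3/sec.
times = serialized_grant_times(apps=3, containers_per_app=44, rate_per_sec=3.0)
for i, t in enumerate(times, start=1):
    print(f"app {i}: all containers granted after ~{t:.1f}s")
```

Under those assumptions the first application waits roughly 15 seconds for its
executors and the third waits roughly 44 seconds, even though the cluster has
plenty of free capacity the whole time.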
From looking at the code, it appears this is by design. There is an
undocumented configuration variable that will enable asynchronous allocation
of containers. I'm sure I'm missing something, but why is this not the
default? Is there a bug or race condition in this code path? I've done some
testing with it and it's been working and is significantly faster.
Here's the config:
I created a JIRA ticket in YARN's project, but I am curious whether anyone
else has experienced similar issues or has tested this configuration
extensively.