Fwd: Spark on EMR suddenly stalling

Jeroen Miller

Just a quick update, as I have not made much progress yet.

On 28 Dec 2017, at 21:09, Gourav Sengupta <[hidden email]> wrote:
> can you then try EMR version 5.10 or EMR version 5.11 instead?

Same issue with EMR 5.11.0. Task 0 in one stage never finishes.

> can you please try selecting a subnet which is in a different availability zone?

I have not tried this yet. But why should that make a difference?

> if possible just try to increase the number of task instances and see the difference?

I tried with 512 partitions -- no difference.

> also in case you are using caching,

No caching used.

> Also can you please report the number of containers that your job is creating by looking at the metrics in the EMR console?

8 containers if I trust the directories in j-xxx/containers/application_xxx/.

> Also, if you look at the Spark UI you can easily see which particular step is taking the longest: you just have to drill in a bit. Generally, if shuffling is the issue, it shows up in the Spark UI once I drill into the steps and see which particular one is taking the longest.

I always have issues with the Spark UI on EC2 -- it never seems to be up to date.
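
One possible way around a stale UI page is to read the same stage information from Spark's monitoring REST API, which is served under /api/v1 on the UI port. The following is only a minimal sketch: the base URL and application ID are placeholders I am assuming, on YARN the UI is normally reached through the ResourceManager proxy or the history server, and the sample records used to exercise it are made up. It lists the stages that are still ACTIVE along with their active and completed task counts.

```python
import json
from urllib.request import urlopen


def fetch_stages(base_url, app_id):
    """Fetch the stage list for one application from the Spark REST API.

    base_url and app_id are placeholders (assumptions), for example the
    driver UI address and the YARN application ID.
    """
    url = f"{base_url}/api/v1/applications/{app_id}/stages"
    with urlopen(url) as resp:
        return json.load(resp)


def active_stages(stages):
    """Keep only stages whose status is ACTIVE.

    Returns tuples of (stageId, attemptId, numActiveTasks,
    numCompleteTasks, name), so a stage stuck on one task stands out.
    """
    return [
        (s["stageId"], s["attemptId"], s["numActiveTasks"],
         s["numCompleteTasks"], s["name"])
        for s in stages
        if s.get("status") == "ACTIVE"
    ]


if __name__ == "__main__":
    # Hypothetical records shaped like the REST API's stage entries.
    sample = [
        {"stageId": 2, "attemptId": 0, "status": "COMPLETE",
         "numActiveTasks": 0, "numCompleteTasks": 512, "name": "stage A"},
        {"stageId": 3, "attemptId": 0, "status": "ACTIVE",
         "numActiveTasks": 1, "numCompleteTasks": 511, "name": "stage B"},
    ]
    for row in active_stages(sample):
        print(row)
```

Against a live cluster this would be called as active_stages(fetch_stages(base_url, app_id)), with both arguments filled in for the cluster at hand.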


