Yarn containers getting killed, error 52, multiple joins

I have a Spark 1.6.2 app (previously tested on 2.0.0 as well). It requires a huge amount of memory (1.5 TB) for a small dataset (~500 MB). Memory usage seems to jump when I loop through a series of inner joins that make the dataset 12 times as wide. The app dies during or after this loop, when I try to run a logistic regression on the resulting DataFrame. I'm using the Scala API (2.10). Dynamic resource allocation is configured. Here are the parameters I'm using:

--master yarn-client --queue analyst --executor-cores 5 --executor-memory 40G --driver-memory 30G --conf spark.memory.fraction=0.75 --conf spark.yarn.executor.memoryOverhead=5120
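
For reference, here is a rough back-of-the-envelope sketch of what those settings ask for per executor. It is only an approximation: it assumes the 300 MB reserved region used by the unified memory manager in Spark 1.6, treats the usable heap as exactly 40 GB (the JVM usually reports slightly less than -Xmx), and assumes the unified region is shared evenly among concurrently running tasks. The object and value names are just for illustration.

```scala
// Rough per-executor memory arithmetic for the submit parameters above.
// All names here are illustrative; the numbers are approximations.
object ExecutorMemory {
  val executorMemoryMb: Long = 40L * 1024 // --executor-memory 40G
  val memoryOverheadMb: Long = 5120L      // spark.yarn.executor.memoryOverhead
  val executorCores: Int     = 5          // --executor-cores 5
  val memoryFraction: Double = 0.75       // spark.memory.fraction

  // Assumption: 300 MB is set aside before the fraction is applied (Spark 1.6 unified memory).
  val reservedMb: Long = 300L

  // What YARN is asked for per container: heap plus off-heap overhead.
  val containerMb: Long = executorMemoryMb + memoryOverheadMb

  // Heap shared by execution (joins, shuffles) and storage (cache).
  val unifiedMb: Long = ((executorMemoryMb - reservedMb) * memoryFraction).toLong

  // Assumption: an even split across the tasks that can run at once.
  val perTaskMb: Long = unifiedMb / executorCores

  def main(args: Array[String]): Unit = {
    println("YARN container request per executor: " + containerMb + " MB")
    println("Unified execution/storage region:    " + unifiedMb + " MB")
    println("Approximate share per running task:  " + perTaskMb + " MB")
  }
}
```

Under these assumptions each executor requests a 45 GB container from YARN and leaves roughly 6 GB of execution/storage memory per concurrently running task, so reaching 1.5 TB in total would take on the order of thirty such executors.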

Has anyone seen this, or does anyone have an idea how to tune it? A dataset this size should not need anywhere near this much memory.