I have a Spark 1.6.2 app (previously tested on 2.0.0 as well) that requires an enormous amount of memory (1.5 TB) for a small dataset (~500 MB). Memory usage jumps when I loop through a series of inner joins that make the dataset 12 times as wide. The app goes down during or after this loop, when I try to run a logistic regression on the resulting DataFrame. I'm using the Scala API (2.10), and dynamic resource allocation is configured. Here are the parameters I'm using.
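For context, the join loop looks roughly like this (a minimal sketch, not my actual code; the key and slice names are hypothetical). Each iteration inner-joins the accumulated DataFrame against another slice, and every join extends the logical plan/lineage, which I suspect is where the memory blow-up comes from:

```scala
import org.apache.spark.sql.DataFrame

// Fold over ~12 slices, inner-joining each onto the accumulated frame
// so the result ends up 12x as wide as the base.
def widen(base: DataFrame, slices: Seq[DataFrame], key: String): DataFrame =
  slices.foldLeft(base) { (acc, slice) =>
    // Each join grows the lineage/plan; persisting or checkpointing acc here
    // might truncate it between iterations, but I haven't confirmed that helps.
    acc.join(slice, Seq(key), "inner")
  }
```

The logistic regression is then run on the DataFrame returned by this loop.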