I am working to move our system from Spark 2.1.0 to Spark 2.3.0. Our system runs Spark managed via YARN. During the move I mirrored the settings to the new cluster. However, on the Spark 2.3.0 cluster, with the same resource allocation, I am seeing a number of executors die due to OOM:
18/07/16 17:23:06 ERROR YarnClusterScheduler: Lost executor 5 on wn80: Container killed by YARN for exceeding memory limits. 22.0 GB of 22 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
I increased spark.driver.memoryOverhead and spark.executor.memoryOverhead from the default (384 MB) to 2048 MB. I also went ahead and disabled the YARN vmem and pmem checks on the cluster.
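For reference, the relevant settings now look roughly like this (the 20 GB executor heap is inferred from the GC log further down; the overhead values are as stated above, and the rest of the command is illustrative):

    spark-submit \
      --conf spark.executor.memory=20g \
      --conf spark.executor.memoryOverhead=2048 \
      --conf spark.driver.memoryOverhead=2048 \
      ...

    # yarn-site.xml, to disable the container memory checks:
    <property><name>yarn.nodemanager.pmem-check-enabled</name><value>false</value></property>
    <property><name>yarn.nodemanager.vmem-check-enabled</name><value>false</value></property>

For what it's worth, a 20 GB heap plus 2048 MB of overhead gives the 22 GB container size seen in the YARN message above. With the vmem/pmem checks disabled, I instead see the following error: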
Caused by: java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>
Looking at GC:
[Eden: 16.0M(8512.0M)->0.0B(8484.0M) Survivors: 4096.0K->4096.0K Heap: 8996.7M(20.0G)->8650.3M(20.0G)
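(For reference, the executors run with G1 and GC logging enabled via something like the following; the exact option string here is illustrative:)

    --conf spark.executor.extraJavaOptions="-XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"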
This appears to show roughly 8.6 GB of the 20 GB heap in use after collection (well under half). However, Spark is throwing a Java OOM (out of heap space) error. I validated that we're not using legacy memory management mode (spark.memory.useLegacyMode is at its default of false). I've tried this against a few applications (some batch, some streaming). Has something changed with memory allocation in Spark 2.3.0 that would cause these issues?
Thank you for any help you can provide.
Perhaps this is https://issues.apache.org/jira/browse/SPARK-24578?
That was reported as a performance issue, not an OOM, but it's in the exact same part of the code, and the change was to reduce memory pressure significantly.
On Mon, Jul 16, 2018 at 1:43 PM, Bryan Jeffrey <[hidden email]> wrote: