I encountered the problem of "insufficient memory". The error is logged in the file with a name " hs_err_pid86252.log"(attached in the end of this email).
I launched the spark job by " spark-submit --driver-memory 40g --master yarn --deploy-mode client". The spark session was created with 10 executors each with 60g memory. The data access pattern is pretty simple, I keep reading some spark dataframe from hdfs one by one, filter, join with another dataframe, and then append the results to an dataframe:
for i= 1,2,3....
df1 = spark.read.parquet(file_i)
df_r = df1.filter(...). join(df2)
df_all = df_all.union(df_r)
each file_i is quite small, only a few GB, but there are a lot of such files. after filtering and join, each df_r is also quite small. When the program failed, df_all had only 10k rows which should be around 10GB. Each machine in the cluster has round 80GB memory and 1TB disk space and only one user was using the cluster when it failed due to insufficient memory. My questions are:
i). The log file showed that it failed to allocate 8G committing memory. But how could that happen when the driver and executors have more than 40g free memory. In fact, only transformations but no actions had run when the program failed. As I understand, only DAG and book-keeping work is done during dataframe transformation, no data is brought into the memory. Why spark still tries to allocate such large memory?
ii). Could manually running garbage collection help?
iii). Did I mis-specify some runtime parameter for jvm, yarn, or spark?
Any help or references are appreciated!
The content of hs_err_pid86252,log:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 8663334912 bytes(~8G) for committing reserved memory.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
# Out of Memory Error (os_linux.cpp:2643), pid=86252, tid=0x00007fd69e683700
# JRE version: OpenJDK Runtime Environment (8.0_151-b12) (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
# Java VM: OpenJDK 64-Bit Server VM (25.151-b12 mixed mode linux-amd64 )
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
--------------- T H R E A D ---------------
Current thread (0x00007fe0bc08c000): VMThread [stack: 0x00007fd69e583000,0x00007fd69e684000] [id=86295]
Try specifying executor memory.
On Tue, Mar 13, 2018 at 5:15 PM, Shiyuan <[hidden email]> wrote:
|Free forum by Nabble||Edit this page|