Boosting executorMemory vs. executorMemoryOverhead
Dear Spark Users,
Hope you are staying safe in these times of COVID.
I’m writing today to ask a question that has been danced around quite a bit on a combination of StackOverflow, random blog posts, and other technical forums.
When should one boost executorMemory vs. executorMemoryOverhead in Java and Python?
We would be interested in any detailed resources you may have that identify what goes into overhead vs. main memory. We’re aware of the current documentation, but some of our observations indicate that more may be going into overhead than we’re led to believe.
For the Python case specifically, what would go into spark.executor.pyspark.memory vs. spark.executor.memoryOverhead?
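For reference, these are the three per-executor settings we keep juggling. The snippet below is only a minimal sketch of how we pass them: the sizes are placeholder values rather than a recommendation, and `to_spark_submit_args` and `job.py` are hypothetical names used for illustration.

```python
# The three per-executor memory settings in question.
# Sizes are placeholders for illustration, not a recommendation.
executor_memory_settings = {
    "spark.executor.memory": "6g",          # JVM heap ("main memory")
    "spark.executor.memoryOverhead": "8g",  # extra non-heap memory per executor
    "spark.executor.pyspark.memory": "2g",  # memory set aside for Python workers
}


def to_spark_submit_args(settings):
    """Render settings as repeated `--conf key=value` arguments (hypothetical helper)."""
    args = []
    for key, value in settings.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


# `job.py` is a hypothetical application file.
command = ["spark-submit"] + to_spark_submit_args(executor_memory_settings) + ["job.py"]
print(" ".join(command))
```

The question is which of these three values should grow when an executor runs out of memory.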
We’re happy to refer to any guidance you may have on hand, but after searching over the Spark JIRA as well as StackOverflow posts and other resources, this remains a bit of a mystery given our observations.
I can think of many occasions where we observed a FetchFailedException and weren’t certain whether to bump memoryOverhead, main memory, or PySpark memory, specifically in the case of Python.
// Specific evidence, if curious:
We have a job that fails when it parses very long string columns of a DataFrame even when main memory is boosted to 27GB and overhead is boosted to 3GB. However, when executorMemoryOverhead is boosted to 8GB and main memory remains at 6GB, the job succeeds.
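To make the comparison concrete, here is a rough sketch of the per-executor memory request for the two configurations above. It rests on several assumptions: the quoted sizes are treated as GiB, no off-heap or PySpark memory is configured, the request is taken to be the plain sum of the settings, and the default overhead is taken to be 10% of executor memory with a 384 MiB floor, as we understand the documentation. The function names are hypothetical.

```python
MIB_PER_GIB = 1024


def default_overhead_mib(executor_memory_mib, factor=0.10, minimum_mib=384):
    """Overhead used when memoryOverhead is not set (assumed: 10% with a 384 MiB floor)."""
    return max(int(executor_memory_mib * factor), minimum_mib)


def container_request_mib(executor_memory_mib, overhead_mib=None,
                          offheap_mib=0, pyspark_mib=0):
    """Approximate per-executor request as the sum of the memory settings (assumption)."""
    if overhead_mib is None:
        overhead_mib = default_overhead_mib(executor_memory_mib)
    return executor_memory_mib + overhead_mib + offheap_mib + pyspark_mib


# Failing run: 27 GB main memory, 3 GB overhead.
failing = container_request_mib(27 * MIB_PER_GIB, 3 * MIB_PER_GIB)
# Succeeding run: 6 GB main memory, 8 GB overhead.
succeeding = container_request_mib(6 * MIB_PER_GIB, 8 * MIB_PER_GIB)

print(failing // MIB_PER_GIB, succeeding // MIB_PER_GIB)  # 30 14
```

Under these assumptions, the run that succeeded requested less than half the total memory of the run that failed, which is why the split between heap and overhead looks like the deciding factor rather than the total.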
My own understanding of Java in Spark is that its UDFs and standard SQL-like operations request memory from main memory, not from memoryOverhead, but that doesn’t seem to be the case according to what the user reported. The specific code they ran had no UDFs and used only SQL functions from Java. Again, this job was only fixed by bumping overhead.
We’re aware of the need to boost PySpark memory / memoryOverhead when utilizing Python, but I wasn’t aware of this being the case in Java, especially given the bare-bones SQL nature of their job.
My current leaning is that anything not operating directly in Catalyst falls back to memoryOverhead, in which case we would need to budget accordingly, but it is worrisome that even standard operations appear to need a boosted memoryOverhead.