Spark event logging with s3a



David Hesson

We are trying to use spark event logging with s3a as a destination for event data.


We added these settings to the spark submits:


spark.eventLog.dir s3a://ourbucket/sparkHistoryServer/eventLogs

spark.eventLog.enabled true
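For context, we set these either in spark-defaults.conf or as --conf flags on spark-submit. A sketch of the latter (the main class and jar name here are placeholders, not our actual job; the bucket path is the one above):

```shell
# Hypothetical invocation; --class and the jar are placeholders.
spark-submit \
  --class com.example.OurJob \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=s3a://ourbucket/sparkHistoryServer/eventLogs \
  our-job.jar
```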


Everything works fine with smaller jobs, and we can see the history data in the history server, which also reads from s3a. However, when we ran a job with a few hundred gigabytes of data that goes through multiple stages, it died with an OutOfMemoryError (the same job works fine with spark.eventLog.enabled false):


18/10/22 23:07:22 ERROR util.Utils: uncaught error in thread SparkListenerBus, stopping SparkContext

java.lang.OutOfMemoryError

    at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)

    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)

    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)


Full stack trace: https://gist.github.com/davidhesson/bd64a25f04c6bb241ec398f5383d671c


Does anyone have insight into, or experience with, using the Spark history server with s3a? Could this problem be caused by something else in our configs? Any help would be appreciated.