Regression in external shuffle service: Spark 2.3 vs. Spark 2.2

igor.berman
Hi,
Any input regarding the issue below would be welcome.
We are running with the external shuffle service on a Mesos cluster (1.5.1).

After upgrading our production workload to Spark 2.3, we started to see OOM
failures of the external shuffle services (one runs on each node).

Has anybody experienced the same problem?
Any pointers to the relevant code would be helpful. I know there was work done
on the external shuffle service in 2.3, but from reading the PRs I can't
pinpoint which change is causing these OOMs.

Unfortunately, we have no test case that reproduces this, and even on 2.3 the
OOM failures only start after 2+ days of production load.

Igor