[YarnShuffleService] Consistent OOMs when enabling Spark transport encryption

Anton Ippolitov

I have been experimenting with Spark 2.4.4 transport encryption and have
run into an issue with a couple of our jobs: they consistently cause the
YarnShuffleService to die with OOM errors. Memory appears to be full of
/io.netty.channel.ChannelOutboundBuffer$Entry/ objects, each containing
/org.apache.spark.network.crypto.TransportCipher$EncryptedMessage/ objects.
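For readers unfamiliar with the feature, here is a minimal sketch of the settings usually involved in turning on AES-based transport encryption in Spark 2.x. It is an illustration only, not the exact configuration of the jobs described above: the two property names are standard Spark settings, while the helper functions are hypothetical and exist only to render them as spark-submit arguments.

```python
# Hypothetical helpers for illustration; only the property names come from Spark.

def transport_encryption_conf():
    """Return the properties commonly set to enable AES-based transport encryption."""
    return {
        # Authentication must be on for transport encryption to be negotiated.
        "spark.authenticate": "true",
        # Enables AES-based encryption of RPC and shuffle traffic.
        "spark.network.crypto.enabled": "true",
    }


def to_submit_args(conf):
    """Render a dict of properties as a flat list of spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(transport_encryption_conf())))
```

On YARN, the external shuffle service runs inside the NodeManager, so the corresponding settings generally also have to be visible on that side, not just in the application's own configuration.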

Someone else created a Jira ticket (SPARK-28743) for this a couple of
months ago, but it has not received an answer. I have shared more details in
the ticket itself: https://issues.apache.org/jira/browse/SPARK-28743

- Has anyone else experienced anything like this?
- Is there anyone with knowledge of the YarnShuffleService, Netty, and/or
Spark encryption who would be able to lend a hand in debugging this?

