I am running a Spark application on a YARN cluster in yarn-client mode, and I have noticed a huge memory overhead.
For example, I request 1g of master memory and 2 workers with 1g of worker memory each. I would expect the memory footprint in YARN to be about 3g, but the ResourceManager shows 6g in use! Furthermore, the Spark web UI shows only around 1.8g of available memory, so in the end I get 1.8g of usable cache for 6g of YARN memory, which is unacceptable.
After some experimenting, it seems that each YARN container carries about 1g of overhead, which means 1g of overhead per worker.
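To make the accounting explicit, here is a small sketch of the arithmetic behind these numbers. The first part uses only the figures above. The second part is a guess at one possible explanation, not something I have confirmed on this cluster: it assumes Spark adds a fixed per-container overhead (384 MB here) and that YARN rounds each request up to a multiple of its minimum allocation (1024 MB here). Both values and the helper `rounded_container_mb` are assumptions for illustration.

```python
import math

MB_PER_GB = 1024


def rounded_container_mb(requested_mb, overhead_mb, increment_mb):
    """Round (requested + overhead) up to the next multiple of increment_mb."""
    return math.ceil((requested_mb + overhead_mb) / increment_mb) * increment_mb


# Figures taken from the description above.
containers = 3                      # 1 master + 2 workers
requested_mb = 1 * MB_PER_GB        # 1g requested for each
observed_total_mb = 6 * MB_PER_GB   # what the ResourceManager reports
cache_mb = 1.8 * MB_PER_GB          # what the Spark web UI reports

expected_total_mb = containers * requested_mb
overhead_per_container_mb = (observed_total_mb - expected_total_mb) / containers
cache_fraction = cache_mb / observed_total_mb

# Hypothetical model: the two constants below are ASSUMED, not measured.
ASSUMED_OVERHEAD_MB = 384     # assumed per-container overhead added by Spark
ASSUMED_INCREMENT_MB = 1024   # assumed YARN allocation increment
modeled_total_mb = containers * rounded_container_mb(
    requested_mb, ASSUMED_OVERHEAD_MB, ASSUMED_INCREMENT_MB
)

print("expected total (MB):", expected_total_mb)
print("observed total (MB):", observed_total_mb)
print("overhead per container (MB):", overhead_per_container_mb)
print("cache / YARN memory:", round(cache_fraction, 2))
print("modeled total under assumptions (MB):", modeled_total_mb)
```

If the assumed overhead and increment are close to what the cluster actually uses, the model would account for the full 6g, but I have not verified either setting.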
Is there a problem with my configuration? What could cause this?