Compression during shuffle writes



Bahubali Jain
Hi,
I have compressed data of size 500 GB. I am repartitioning this data because the underlying data is heavily skewed and is causing a lot of issues for the downstream jobs.
During repartitioning, the shuffle writes are not getting compressed, and because of this I am running into disk space issues. Below is a screenshot that clearly depicts the issue (see the Input and Shuffle Write columns).
I have proactively set the parameters below to true, but the intermediate shuffled data is still not compressed:

spark.shuffle.compress
spark.shuffle.spill.compress
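
For reference, a minimal sketch of how these two properties could be passed explicitly at submit time. It only builds the `--conf` flags for `spark-submit`; the helper name `to_submit_args` and the application file `repartition_job.py` are hypothetical and not taken from the job described here.

```python
# Properties named in this post, set to "true" as described.
SHUFFLE_COMPRESSION_CONF = {
    "spark.shuffle.compress": "true",        # compress shuffle map output files
    "spark.shuffle.spill.compress": "true",  # compress data spilled during shuffles
}


def to_submit_args(conf):
    """Return ['--conf', 'key=value', ...] for each property, in sorted key order.

    Hypothetical helper for illustration; it is not part of Spark.
    """
    args = []
    for key in sorted(conf):
        args.extend(["--conf", "{}={}".format(key, conf[key])])
    return args


if __name__ == "__main__":
    # "repartition_job.py" is a placeholder for the actual application.
    command = ["spark-submit"] + to_submit_args(SHUFFLE_COMPRESSION_CONF) + ["repartition_job.py"]
    print(" ".join(command))
```

Whether the values actually reached the running application can be checked on the Environment tab of the Spark web UI, which lists the effective Spark properties.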

[Inline image 1: screenshot showing the Input and Shuffle Write columns]

I am using Spark 1.5 (for various unavoidable reasons!!)
Any suggestions would be greatly appreciated.

Thanks,
Baahu