gzipped vs unzipped from s3 bucket



Loading a series of gz files from an S3 bucket

I am experimenting with loading 30-40 gzipped files, 2-3 GB in size, from S3, and I am finding it considerably slower
than loading the same files as uncompressed text from S3. According to the 2.1.0 documentation, sc.textFile should be able to handle a set of compressed files. I have seen a few postings on addressing this, some of them pretty awkward. Does the driver program have to unzip the files sequentially
before it proceeds? Are there any elegant solutions to this, or is uncompressed the best way to go?
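
For reference, here is a minimal PySpark sketch of the kind of load I mean, under some assumptions. As far as I understand, gzip is not a splittable format, so sc.textFile typically produces one partition per .gz file and each file is decompressed by a single task, which would explain part of the slowdown. The helper name load_gzipped_text, the bucket path in the comment, and the target partition count are all hypothetical and only for illustration; sc is assumed to be an existing SparkContext.

```python
def load_gzipped_text(sc, path, target_partitions):
    """Read gzipped text files and spread the lines over more partitions.

    sc is an existing SparkContext (assumed to be created elsewhere).
    path and target_partitions are hypothetical values for illustration.
    """
    # textFile decompresses .gz files transparently, but a gzip stream
    # cannot be split, so each file is typically read by a single task.
    rdd = sc.textFile(path)
    partitions_after_read = rdd.getNumPartitions()

    # Redistribute the decompressed lines so that later stages can use
    # more cores than there are input files. This costs one shuffle.
    if partitions_after_read < target_partitions:
        rdd = rdd.repartition(target_partitions)
    return rdd, partitions_after_read


# Hypothetical usage (the path and count are placeholders):
# rdd, n_files = load_gzipped_text(sc, "s3a://my-bucket/data/*.gz", 200)
```

The repartition only helps the stages after the read; the initial decompression is still limited to one task per file, so smaller gzip files or a splittable codec may be worth trying as well.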

Thanks in advance.