Loading a series of .gz files from an S3 bucket
I am experimenting with loading 30-40 gzipped files, each 2-3 GB in size, from S3, and I am finding it considerably slower than loading the same files uncompressed from S3. According to the 2.1.0 documentation, using sc.textFile for the load should be able to handle the set of compressed files. I have seen a few postings addressing this, some pretty awkward. Does the driver program have to unzip the files sequentially before it progresses? Are there any elegant solutions to this, or is uncompressed the best way to go?
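My rough guess as to why this is slow: a gzip stream cannot be decompressed starting from an arbitrary offset, so each multi-GB .gz file can presumably only be read by a single sequential reader rather than being split across tasks. A quick stdlib check (plain Python, not Spark code) seems to confirm the sequential-only nature of gzip:

```python
import gzip
import zlib

data = b"line\n" * 100_000
blob = gzip.compress(data)

# Decompressing from the start of the stream works fine.
assert gzip.decompress(blob) == data

# But decompression cannot begin mid-stream: a slice from the middle
# lacks the gzip header and the accumulated decoder state, so it fails.
# wbits=31 tells zlib to expect gzip framing.
try:
    zlib.decompress(blob[len(blob) // 2:], wbits=31)
    could_split = True
except zlib.error:
    could_split = False

assert not could_split
```

If that is the real cause, then each 2-3 GB file is effectively one partition at read time, which would explain the gap versus uncompressed text files.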