Hive on S3 EMR moving temp files really slow

I have a Spark application on EMR 5.4.
It reads from an HDFS location and writes into a Hive table stored on S3.

A set of temporary files is first written into .hive-staging, and then those small files are moved to their final destination.

Writing the small files into the .hive-staging directory is fairly fast; however, moving the temporary files to their final destination is extremely slow. Painfully slow.
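To illustrate why the move step might dominate, here is a toy sketch in Python. It assumes the usual object-store behavior: S3 has no native rename operation, so a "move" is typically carried out as a copy followed by a delete for every object. Whether that is the actual cause of the slowdown in this job is an assumption. ToyObjectStore, rename_prefix and simulate_move are hypothetical names for this illustration only; this is not the EMRFS or S3 API, and the key names are made up.

```python
# Toy in-memory model of an object store. Hypothetical illustration only:
# this is not the EMRFS or S3 API.
class ToyObjectStore:
    def __init__(self):
        self.objects = {}   # key -> bytes
        self.requests = 0   # count of simulated put/copy/delete requests

    def put(self, key, data):
        self.requests += 1
        self.objects[key] = data

    def copy(self, src, dst):
        self.requests += 1
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        self.requests += 1
        del self.objects[key]

    def rename_prefix(self, src_prefix, dst_prefix):
        """Emulate a directory move: one copy plus one delete per object."""
        keys = [k for k in self.objects if k.startswith(src_prefix)]
        for key in keys:
            new_key = dst_prefix + key[len(src_prefix):]
            self.copy(key, new_key)
            self.delete(key)
        return len(keys)


def simulate_move(num_files):
    """Write num_files small staging objects, then move them to the table root."""
    store = ToyObjectStore()
    for i in range(num_files):
        store.put(f"table/.hive-staging/part-{i:05d}", b"x")
    before = store.requests
    moved = store.rename_prefix("table/.hive-staging/", "table/")
    # Requests spent on the move alone (listing calls are ignored here).
    return moved, store.requests - before, store
```

In this model the move costs two requests per file, so the cost grows with the number of small files rather than with the total data size. That would be consistent with a fast write into .hive-staging followed by a slow move, but it is only a model.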

I am using the s3 scheme and not s3a because EMR recommends s3.

I know that Hive on S3 is a big use case, so how are people using it?