Hive on S3 EMR moving temp files really slow

tafranky
I have a Spark application on EMR 5.4.
I am reading from an HDFS location and writing into a Hive table on S3.

A set of temporary files is written into .hive-staging, and then the small files are moved to their final destination.

Dumping the small files into the .hive-staging directory is rather fast; however, moving the temporary files to their final destination is extremely slow. Painfully slow.
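One way to reason about why the move step is the slow one: S3 has no native rename, so "moving" an object means copying it to the new key and then deleting the original. The sketch below is a simplified, hypothetical in-memory model of a stage-then-move commit (the `ToyObjectStore` class, `stage_then_commit` helper, and key prefixes are illustrative assumptions, not EMRFS, Hive, or Spark APIs). It only counts requests and bytes copied; it does not reproduce real S3 latency.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    """Hypothetical in-memory object store: a flat key -> bytes map.

    Like S3, it has no native rename, so moving a key is modeled as a
    full copy of the object followed by a delete of the source key.
    """
    objects: dict = field(default_factory=dict)
    bytes_copied: int = 0
    requests: int = 0

    def put(self, key: str, data: bytes) -> None:
        # One PUT request per file written into the staging prefix.
        self.requests += 1
        self.objects[key] = data

    def rename(self, src: str, dst: str) -> None:
        # COPY request: the whole object is rewritten under the new key.
        data = self.objects[src]
        self.objects[dst] = data
        self.bytes_copied += len(data)
        self.requests += 1
        # DELETE request for the old (staging) key.
        del self.objects[src]
        self.requests += 1


def stage_then_commit(store, staging_prefix, final_prefix, files):
    """Write every file under the staging prefix, then move each one to the final prefix."""
    for name, data in files.items():
        store.put(f"{staging_prefix}/{name}", data)
    for name in files:
        store.rename(f"{staging_prefix}/{name}", f"{final_prefix}/{name}")


if __name__ == "__main__":
    store = ToyObjectStore()
    # 100 small 1 KiB "part" files, standing in for the many small output files.
    files = {f"part-{i:05d}": b"x" * 1024 for i in range(100)}
    stage_then_commit(store, "table/.hive-staging", "table", files)
    print(store.requests, store.bytes_copied)  # prints: 300 102400
```

In this model the write phase costs one request per file, while the move phase costs two more per file and rewrites every byte, so the commit gets more expensive as the number of files grows.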

I am using s3:// and not s3a:// because EMR recommends s3://.

I know that Hive on S3 is a big use case, so how are people using it?