Reading Hive tables Parallel in Spark

I am currently trying to parallelize reading multiple tables from Hive. As part of an archival framework, I need to convert a few hundred tables from text format to Parquet. For now I am calling Spark SQL inside a for loop for the conversion, but this executes sequentially and the entire process takes a long time to finish.

I tried submitting 4 different Spark jobs (each with its own set of tables to convert), which did give me some parallelism, but I would like to do this in a single Spark job due to a few limitations of our cluster and process.
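One common way to get this parallelism inside a single Spark job is to submit the per-table conversions from multiple driver-side threads sharing one SparkSession: each thread issues its own Spark action, and the Spark scheduler (especially with `spark.scheduler.mode=FAIR`) can run those jobs concurrently. Below is a minimal sketch of that pattern; the `archive_db` database and `/archive/parquet/` output path in the comment are hypothetical placeholders, and the conversion function is passed in as a parameter so the threading logic itself can be tried without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

def convert_all(tables, convert_one, max_workers=4):
    """Run convert_one(table) for each table name on a thread pool.

    With Spark, threads sharing one SparkSession may each submit
    jobs; the scheduler can then execute them concurrently instead
    of one table at a time.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in the returned results
        return list(pool.map(convert_one, tables))

# Inside the real job, convert_one would look roughly like this
# (database and output path are assumptions, not from the post):
#
# def convert_one(table):
#     spark.sql(f"SELECT * FROM archive_db.{table}") \
#          .write.mode("overwrite") \
#          .parquet(f"/archive/parquet/{table}")
#     return table
```

Driver-side threads only control how many Spark jobs are in flight at once; the heavy lifting still happens on the executors, so `max_workers` mainly needs to be large enough to keep the cluster busy.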

Any help will be greatly appreciated.