Bulk load to HBase


Bulk load to HBase

Pradeep-2
We are on Hortonworks 2.5 and will be upgrading to 2.6 very soon. Our Spark version is 1.6.2.

We have a large volume of data that we bulk load into HBase using ImportTsv. The MapReduce job is very slow, and we are looking into whether we can use Spark to improve performance. Please let me know if this can be optimized with Spark, and which packages or libraries can be used.
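[Editor's note] For reference, ImportTsv already has a two-phase bulk-load mode that writes HFiles directly instead of going through the regionserver write path, which is usually much faster than the default Put-based import. A minimal sketch of that flow is below; the table name, paths, and column mapping are placeholders, not values from this thread:

```shell
# Phase 1: run ImportTsv with bulk.output so the MapReduce job emits
# HFiles to a staging directory instead of issuing Puts.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
  -Dimporttsv.bulk.output=hdfs:///tmp/hfiles \
  mytable hdfs:///data/input.tsv

# Phase 2: atomically move the generated HFiles into the table's
# region directories (no write-path overhead at all).
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  hdfs:///tmp/hfiles mytable
```

If the current job is already using `-Dimporttsv.bulk.output`, the bottleneck is more likely parsing, compression, or region pre-splitting than the framework itself.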

PM

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Bulk load to HBase

Jörn Franke
Before you look at any new library or tool: what is the import process, and what are the original file format, file sizes, compression, etc.? Once you have investigated this, you can start improving it. Only then, as a last step, should a new framework be explored.
Feel free to share those details and we can help you better.
BTW, if you do need to use Spark, then go for 2.x - it is also available in HDP.
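[Editor's note] If Spark is ultimately chosen, the usual pattern is to have Spark generate sorted HFiles with `HFileOutputFormat2` and then bulk-load them, mirroring ImportTsv's bulk mode. The sketch below assumes Spark 2.x with the HBase client on the classpath; the paths, table name, and column layout are illustrative assumptions, not details from this thread:

```scala
// Hedged sketch: bulk-loading a TSV into HBase from Spark by writing
// HFiles, instead of per-record Puts. All names/paths are placeholders.
import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.SparkSession

object HBaseBulkLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hbase-bulkload").getOrCreate()
    val conf = HBaseConfiguration.create()
    val tableName = TableName.valueOf("mytable")

    // HFileOutputFormat2 requires cells sorted by row key, hence sortByKey.
    val cells = spark.sparkContext
      .textFile("hdfs:///data/input.tsv")
      .map(_.split("\t"))
      .map { fields =>
        val row = Bytes.toBytes(fields(0))
        (new ImmutableBytesWritable(row),
         new KeyValue(row, Bytes.toBytes("cf"), Bytes.toBytes("col1"),
                      Bytes.toBytes(fields(1))))
      }
      .sortByKey()

    // Configure the output format against the live table so HFile
    // boundaries line up with region boundaries.
    val conn = ConnectionFactory.createConnection(conf)
    val job = Job.getInstance(conf)
    HFileOutputFormat2.configureIncrementalLoad(
      job, conn.getTable(tableName), conn.getRegionLocator(tableName))

    cells.saveAsNewAPIHadoopFile(
      "hdfs:///tmp/hfiles",
      classOf[ImmutableBytesWritable], classOf[KeyValue],
      classOf[HFileOutputFormat2], job.getConfiguration)

    // Final step (shell): hbase org.apache.hadoop.hbase.mapreduce
    //   .LoadIncrementalHFiles hdfs:///tmp/hfiles mytable
    conn.close()
    spark.stop()
  }
}
```

Note this avoids the regionserver write path entirely, which is where most of ImportTsv's Put-mode slowness comes from; pre-splitting the table remains important either way.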

> On 22. Oct 2017, at 10:20, Pradeep <[hidden email]> wrote:
> [quoted message trimmed]
