How to use parallelize feature with newAPIHadoopRDD?

buremba
The author has deleted this message.

Re: How to use parallelize feature with newAPIHadoopRDD?

buremba
This post has NOT been accepted by the mailing list yet.
I tried mapping each of them to (key, value) pairs, using the value I want to join on as the key. Then I joined them and used foreach to get the query results. (Here is the gist: https://gist.github.com/buremba/9919584)
However, since I had to use map before joining the column families, I'm not sure whether this is an efficient way to do this operation. Do you have any suggestions?
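For context: in Spark, `join` is only defined on RDDs of key-value pairs, so some map (or keyBy) step that attaches the join key is needed before joining. That map runs per record without a shuffle; the join itself is what usually shuffles data. The sketch below is a local, plain-Python illustration of what "map to a key, then join" computes. It is not Spark code, and the row layout and the `user_id` join column are hypothetical.

```python
from collections import defaultdict

def key_by(rows, key_fn):
    """Attach a join key to each row, like rdd.map(lambda r: (key_fn(r), r))."""
    return [(key_fn(r), r) for r in rows]

def inner_join(left_pairs, right_pairs):
    """Inner join of two (key, value) lists, mirroring a pair-RDD join's output:
    one (key, (left_value, right_value)) tuple per matching combination."""
    index = defaultdict(list)
    for k, v in right_pairs:
        index[k].append(v)
    return [(k, (lv, rv)) for k, lv in left_pairs for rv in index.get(k, [])]

# Hypothetical rows read from two column families; "user_id" is an assumed join column.
users = [{"row": "r1", "user_id": 1, "name": "ada"},
         {"row": "r2", "user_id": 2, "name": "bob"}]
events = [{"row": "e1", "user_id": 1, "action": "login"},
          {"row": "e2", "user_id": 1, "action": "logout"},
          {"row": "e3", "user_id": 3, "action": "login"}]

joined = inner_join(key_by(users, lambda r: r["user_id"]),
                    key_by(events, lambda r: r["user_id"]))
for key, (user, event) in joined:
    print(key, user["name"], event["action"])
```

Only `user_id` 1 appears on both sides, so the result holds two pairs (one per matching event), while the unmatched rows on either side are dropped, as in an inner join.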