PIG to SPARK

suman bharadwaj
Hi,

How can I call a Pig script from Spark? Can I use rdd.pipe() here?

Could anyone also share a sample implementation of rdd.pipe() and explain how it works? It would really help.

Regards,
SB

Re: PIG to SPARK

Mayur Rustagi
The real question is why you want to run a Pig script using Spark.
Are you planning to use Spark as the underlying processing engine for Pig? That's not simple.
Are you planning to feed Pig data to Spark for further processing? Then you can write it to HDFS and trigger your Spark script.

rdd.pipe() is similar to Hadoop Streaming: it runs an external script on each partition of the RDD and returns the script's output as another RDD.
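
As a rough sketch of that behaviour, assuming the usual pipe semantics (each element of a partition is written to the external command's standard input as one line, and each line the command prints becomes an element of the result), the plain-Python snippet below mimics a pipe call without needing a cluster. The helper pipe_partition, the sample data, and the upper-casing command are all hypothetical names made up for illustration; in Spark itself the equivalent would be a single call such as rdd.pipe("./my_script.sh"), where my_script.sh is a placeholder.

```python
import subprocess
import sys


def pipe_partition(partition, command):
    """Mimic pipe() for ONE partition (illustrative helper, not a Spark API).

    Every element is sent to the command's stdin as its own line; every
    line the command writes to stdout becomes one output element.
    """
    stdin_text = "".join(f"{element}\n" for element in partition)
    completed = subprocess.run(
        command, input=stdin_text, capture_output=True, text=True, check=True
    )
    return completed.stdout.splitlines()


# Hypothetical data standing in for an RDD with two partitions.
partitions = [["pig", "hive"], ["spark"]]

# Hypothetical external "script": upper-case everything read from stdin.
UPPERCASE = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

# One external process is started per partition, not per element.
piped = [pipe_partition(part, UPPERCASE) for part in partitions]

# Flatten the per-partition outputs, as the resulting RDD would appear.
result = [line for part in piped for line in part]
print(result)  # ['PIG', 'HIVE', 'SPARK']
```

Because the external command only sees lines of text, anything structured has to be serialized to a line before the pipe and parsed again afterwards.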
Regards
Mayur


Mayur Rustagi
Ph: +1 (760) 203 3257



Re: PIG to SPARK

suman bharadwaj
Thanks, Mayur. I don't have a clear idea of how pipe() works and would like to understand it better. When do we use pipe(), and how does it work? Could you please share some sample code if you have any (even pseudo-code is fine)? It would really help.

Regards,
Suman Bharadwaj S

