using RDD result in another RDD


amoc

Hi

I’d like to use the result of one RDD, RDD1, in another, RDD2. Normally I would use something like a barrier to make the 2nd RDD wait until the computation of the 1st RDD is done, then include the result from RDD1 in the closure for RDD2.

Currently I create another RDD, RDD3, out of the result of RDD1, then do a Cartesian product of RDD2 and RDD3. NB: this operation is slow and expands the partition count from 270 to 1200.

 

This is a simplified example but I think it should help:

What I want to do (pseudocode):

   val a: Int = RDD1.reduce(..)
   RDD2.map(x => x * a)

 

What I use right now (pseudocode):

   val a: Int = RDD1.reduce(..)
   val RDD3 = sc.makeRDD(Seq(a))   // sc: the SparkContext
   RDD2.cartesian(RDD3)            // pairs every element of RDD2 with a

 

How can I structure this type of operation so that I don't need a barrier to block the computation of RDD2 until RDD1 is done?

 

-Adrian

 


Re: using RDD result in another RDD

sowen
You can't use RDDs inside of RDDs, so this won't work anyway. You could collect the result of RDD1 and broadcast it, perhaps. collect() blocks.
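
A minimal sketch of the collect-and-broadcast suggestion, under some assumptions not in the thread: the elided reduce function is replaced by a sum, both RDDs hold Ints, and the names `BroadcastReduceSketch`, `scaleBy`, and `aBc` are made up for illustration. Actions such as `reduce()` and `collect()` block on the driver until RDD1 has been computed, so no separate barrier should be needed before RDD2 is transformed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BroadcastReduceSketch {
  // Multiply every element of rdd2 by a scalar computed from rdd1.
  def scaleBy(sc: SparkContext, rdd1: RDD[Int], rdd2: RDD[Int]): RDD[Int] = {
    // reduce is an action: it blocks until rdd1 is fully computed and
    // returns a plain Int to the driver.
    // Assumption: summing stands in for the reduce function elided in the question.
    val a: Int = rdd1.reduce(_ + _)

    // Ship the driver-side value to the executors once as a broadcast variable.
    val aBc = sc.broadcast(a)

    // The closure reads the broadcast value; no second RDD or cartesian is involved.
    rdd2.map(x => x * aBc.value)
  }

  def main(args: Array[String]): Unit = {
    // Local master chosen only so the sketch can run on one machine.
    val conf = new SparkConf().setAppName("broadcast-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd1 = sc.parallelize(Seq(1, 2, 3))
      val rdd2 = sc.parallelize(Seq(10, 20, 30))
      println(scaleBy(sc, rdd1, rdd2).collect().mkString(", ")) // 60, 120, 180
    } finally {
      sc.stop()
    }
  }
}
```

For a single Int, capturing `a` directly in the closure, as in the "what I want to do" pseudocode, would probably work just as well; a broadcast variable tends to pay off when the collected result is large and is reused across many tasks.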

In reply to Adrian Mocanu's message of Wed, Nov 12, 2014 at 6:41 PM.