confusion on RDD usage in MatrixFactorizationModel (master branch)

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

confusion on RDD usage in MatrixFactorizationModel (master branch)

Nan Zhu
Hi, all

I’m reading the source code of master branch 

there is a new predict() function in MatrixFactorizationModel

/**
    * Predict the rating of many users for many products.
    * The output RDD has an element per each element in the input RDD (including all duplicates)
    * unless a user or product is missing in the training set.
    *
    * @param usersProducts  RDD of (user, product) pairs.
    * @return RDD of Ratings.
    */
  def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating] = {
    val users = userFeatures.join(usersProducts).map{
      case (user, (uFeatures, product)) => (product, (user, uFeatures))
    }
    users.join(productFeatures).map {
      case (product, ((user, uFeatures), pFeatures)) =>
        val userVector = new DoubleMatrix(uFeatures)
        val productVector = new DoubleMatrix(pFeatures)
        Rating(user, product, userVector.dot(productVector))
    }
  }

it seems that the author can directly call join with a RDD object? 

It’s a new feature in next version? I’m used to creating a PairRDDFunctions with the current RDD and then calls join, etc.

Did I misunderstand something?

Best,

-- 
Nan Zhu

Reply | Threaded
Open this post in threaded view
|

Re: confusion on RDD usage in MatrixFactorizationModel (master branch)

Nan Zhu
ignore that

These operations are
 * automatically available on any RDD of the right type (e.g. RDD[(Int, Int)] through implicit
 * conversions when you `import org.apache.spark.SparkContext._`.

-- 
Nan Zhu


On Wednesday, January 8, 2014 at 10:38 AM, Nan Zhu wrote:

Hi, all

I’m reading the source code of master branch 

there is a new predict() function in MatrixFactorizationModel

/**
    * Predict the rating of many users for many products.
    * The output RDD has an element per each element in the input RDD (including all duplicates)
    * unless a user or product is missing in the training set.
    *
    * @param usersProducts  RDD of (user, product) pairs.
    * @return RDD of Ratings.
    */
  def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating] = {
    val users = userFeatures.join(usersProducts).map{
      case (user, (uFeatures, product)) => (product, (user, uFeatures))
    }
    users.join(productFeatures).map {
      case (product, ((user, uFeatures), pFeatures)) =>
        val userVector = new DoubleMatrix(uFeatures)
        val productVector = new DoubleMatrix(pFeatures)
        Rating(user, product, userVector.dot(productVector))
    }
  }

it seems that the author can directly call join with a RDD object? 

It’s a new feature in next version? I’m used to creating a PairRDDFunctions with the current RDD and then calls join, etc.

Did I misunderstand something?

Best,

-- 
Nan Zhu