When my tuple type includes a generic type parameter, the pair RDD functions aren't available. Take for example the following (a join on two RDDs, taking the sum of the values):
def joinTest(rddA: RDD[(String, Int)], rddB: RDD[(String, Int)]): RDD[(String, Int)] = {
  rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
}
That works fine, but let's say I replace the type of the key with a generic type parameter:

def joinTest[K](rddA: RDD[(K, Int)], rddB: RDD[(K, Int)]): RDD[(K, Int)] = {
  rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
}

This latter function gets the compiler error "value join is not a member of org.apache.spark.rdd.RDD[(K, Int)]".
The reason is probably obvious, but I don't have much Scala experience. Can anyone explain what I'm doing wrong?

-- Daniel Siegmann, Software Developer
join and the other pair RDD functions are added through the implicit conversion rddToPairRDDFunctions (brought into scope by importing SparkContext._), and that conversion requires implicit ClassTags for the key and value types. A concrete key type like String supplies one automatically, but a bare type parameter K does not, so the conversion doesn't apply and the compiler can't find join. Adding a ClassTag context bound on K fixes it:

import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

def joinTest[K: ClassTag](rddA: RDD[(K, Int)], rddB: RDD[(K, Int)]): RDD[(K, Int)] = {
  rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
}
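
For reference, a minimal sketch of the fixed function in use; the local SparkContext setup and the sample data here are just for illustration, not from the original code:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("joinTest"))
val rddA = sc.parallelize(Seq(("a", 1), ("b", 2)))
val rddB = sc.parallelize(Seq(("a", 10), ("b", 20)))
// K is inferred as String, which has a ClassTag, so the context bound is satisfied.
joinTest(rddA, rddB).collect().foreach(println)  // prints (a,11) and (b,22)
sc.stop()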
That worked, thank you both! Thanks also to Aaron for the list of things I need to read up on - I hadn't heard of ClassTag before.