Spark MakeRDD preferred workers

Christopher Piggott

def makeRDD[T](seq: Seq[(T, Seq[String])])(implicit arg0: ClassTag[T]): RDD[T]
    list of tuples of data and location preferences (hostnames of Spark nodes)

Is that list a set of acceptable choices, from which Spark picks any one? Or is it an ordered list of preferences? I'm trying to work out how evenly the work will be distributed when the host lists of different partitions overlap heavily.

In my particular case, the input is a Seq of (filePath, hosts) tuples, where hosts are the nodes on which that file's blocks are local.
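For reference, a minimal sketch of building that kind of RDD with the location-preference overload of makeRDD. It assumes a local-mode SparkContext, and the file paths, hostnames, and the names MakeRddPreferredLocations and preferredHosts are hypothetical. It only prints the host list Spark records for each partition via RDD.preferredLocations. It does not show how the scheduler chooses among overlapping hosts, which is the open question here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MakeRddPreferredLocations {
  // Hypothetical helper: builds an RDD with one partition per file and
  // returns, for each partition, the hosts Spark records as its preferences.
  def preferredHosts(sc: SparkContext, files: Seq[(String, Seq[String])]): Seq[Seq[String]] = {
    // With a Seq of (element, hosts) pairs, this resolves to the
    // makeRDD[T](seq: Seq[(T, Seq[String])]) overload quoted above.
    val rdd: RDD[String] = sc.makeRDD(files)
    // Driver-side loop over partition metadata; no job is launched.
    rdd.partitions.toSeq.map(p => rdd.preferredLocations(p))
  }

  def main(args: Array[String]): Unit = {
    // Local mode for illustration only; on a cluster these names would have
    // to match the hostnames the executors register with.
    val sc = new SparkContext(new SparkConf().setAppName("makeRDD-prefs").setMaster("local[2]"))
    try {
      // Hypothetical (filePath, hosts) pairs with overlapping host lists.
      val files = Seq(
        ("/data/part-0000", Seq("node1", "node2", "node3")),
        ("/data/part-0001", Seq("node2", "node3", "node4")),
        ("/data/part-0002", Seq("node1", "node4"))
      )
      preferredHosts(sc, files).zipWithIndex.foreach { case (hosts, i) =>
        println(s"partition $i -> ${hosts.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Each input tuple becomes its own partition, so the element count, not the number of hosts, determines how many tasks there are.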