The documentation describes the argument as a "list of tuples of data and location preferences (hostnames of Spark nodes)".
Is that list of hostnames a set of acceptable choices, from which Spark will pick one? Or is it an ordered list of preferences? I'm trying to ascertain how well the work will be distributed when many partitions list the same nodes.
In my particular case, my RDD is built from a Seq of (filePath, hosts) pairs, where hosts are the nodes on which the file's blocks are local.
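For concreteness, here is a minimal sketch of the kind of setup I mean. It assumes the `SparkContext.makeRDD` overload that takes a `Seq[(T, Seq[String])]`. The file paths and hostnames are made up for illustration, and the object name is hypothetical. It only prints the preferences recorded for each partition; it does not show how the scheduler actually chooses among them, which is the part I'm asking about.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name; run with Spark on the classpath.
object LocationPrefsSketch {
  def main(args: Array[String]): Unit = {
    // local[2] is only for trying this out; on a real cluster the master comes from spark-submit.
    val conf = new SparkConf().setAppName("location-prefs-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up paths and hostnames: each file is paired with the nodes holding its blocks.
      // Note the overlap: node2 appears in every preference list.
      val filesWithHosts: Seq[(String, Seq[String])] = Seq(
        ("/data/file-a", Seq("node1", "node2")),
        ("/data/file-b", Seq("node2", "node3")),
        ("/data/file-c", Seq("node2", "node1"))
      )

      // This overload creates one partition per element and records its host list
      // as that partition's location preferences.
      val rdd = sc.makeRDD(filesWithHosts)

      // Inspect what Spark has recorded as preferred locations for each partition.
      rdd.partitions.foreach { p =>
        val prefs = rdd.preferredLocations(p).mkString(", ")
        println(s"partition ${p.index}: $prefs")
      }
    } finally {
      sc.stop()
    }
  }
}
```

In local mode the hostnames have no effect on placement, so this is only useful for checking that the preferences are attached to the partitions as expected.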