Round Robin Partitioner

Round Robin Partitioner

David Thomas
Is it possible to partition the RDD elements in a round robin fashion? Say I have 5 nodes in the cluster and 5 elements in the RDD. I need to ensure each element gets mapped to a distinct node in the cluster.
Re: Round Robin Partitioner

Patrick Wendell
In Spark 1.0 we've added better randomization to the scheduling of
tasks so they are distributed more evenly by default.

https://github.com/apache/spark/commit/556c56689bbc32c6cec0d07b57bd3ec73ceb243e

However, having a specific policy like that isn't really supported
unless you subclass RDD itself and override getPreferredLocations.
Keep in mind this is tricky because the set of executors might change
during the lifetime of a Spark job.
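What you can control within a job is which *partition* each element lands in, by keying every element with its index (e.g. via zipWithIndex) and partitioning by that key modulo the partition count. Here's a minimal sketch in plain Python that mirrors the shape of Spark's Partitioner interface (numPartitions / getPartition); the class and helper names are illustrative, not PySpark's actual API:

```python
# Sketch of round-robin partition assignment, modeled on Spark's
# Partitioner interface. Plain Python for illustration only; in Spark
# you would pair each element with its index (via zipWithIndex) and
# use partitionBy with a custom Partitioner like this one.

class RoundRobinPartitioner:
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def get_partition(self, key):
        # key is the element's index; round robin is just modulo
        return key % self.num_partitions


def round_robin_split(elements, num_partitions):
    """Group elements into partitions in round-robin order."""
    partitioner = RoundRobinPartitioner(num_partitions)
    partitions = [[] for _ in range(num_partitions)]
    for index, element in enumerate(elements):
        partitions[partitioner.get_partition(index)].append(element)
    return partitions


# With 5 elements and 5 partitions, each partition gets exactly one element.
print(round_robin_split(["a", "b", "c", "d", "e"], 5))
# -> [['a'], ['b'], ['c'], ['d'], ['e']]
```

Note this only spreads elements evenly across partitions; pinning each partition to a particular node still comes down to getPreferredLocations, with the caveat above about executors changing.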

- Patrick

On Thu, Mar 13, 2014 at 11:50 AM, David Thomas <[hidden email]> wrote:
> Is it possible to partition the RDD elements in a round robin fashion? Say I
> have 5 nodes in the cluster and 5 elements in the RDD. I need to ensure each
> element gets mapped to each node in the cluster.