This happened because the elements are all integers congruent to 0 mod 5, and we use the default hashCode implementation for integers, which returns the integer itself. Taking that hash modulo the 5 partitions maps every element to partition 0. There's no API method that will look at the resulting partition sizes and rebalance them, but you could use a different hash function.
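As a rough illustration of the arithmetic, here is a minimal sketch in plain Scala (no Spark required). The names PartitionDemo, defaultIndex and strideIndex are made up for this example, and strideIndex is only one possible alternative hash, assuming you know the values share a common step of 20:

```scala
object PartitionDemo {
  // Values from the question: the multiples of 20 below 100.
  val values: Array[Int] = Array.tabulate(100)(i => i).filter(_ % 20 == 0)
  val numPartitions = 5

  // Hash-style placement: an Int's hashCode is the Int itself, so
  // any multiple of 5 ends up at index 0 when there are 5 partitions.
  def defaultIndex(x: Int): Int = {
    val m = x.hashCode % numPartitions
    if (m < 0) m + numPartitions else m
  }

  // Hypothetical alternative: divide out the common step before
  // taking the modulus, so consecutive multiples get different indices.
  def strideIndex(x: Int, step: Int): Int = {
    val m = (x / step) % numPartitions
    if (m < 0) m + numPartitions else m
  }

  // Group the values by the partition index each function assigns.
  val byDefault: Map[Int, Array[Int]] = values.groupBy(defaultIndex)
  val byStride: Map[Int, Array[Int]] = values.groupBy(x => strideIndex(x, 20))
}
```

In Spark, a function like strideIndex could serve as the body of getPartition in a custom org.apache.spark.Partitioner subclass, applied with partitionBy on a keyed RDD instead of coalesce(5, true).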

Matei

On Mar 24, 2014, at 5:20 PM, Walrus theCat <[hidden email]> wrote:

> Hi,
>
> sc.parallelize(Array.tabulate(100)(i=>i)).filter( _ % 20 == 0 ).coalesce(5,true).glom.collect yields
>
> Array[Array[Int]] = Array(Array(0, 20, 40, 60, 80), Array(), Array(), Array(), Array())
>
> How do I get something more like:
>
> Array(Array(0), Array(20), Array(40), Array(60), Array(80))
>
> Thanks