If I applied a RangePartitioner to this dataset, say val rangePart = new RangePartitioner(4, myDataRDD), and then repartitioned the data with it, would I get back 4 evenly distributed partitions, with the Key=8 records split across several of them, or would all the Key=8 records end up in a single partition?
If this isn't possible, is there some other partitioner I could use to distribute this dataset evenly? I'd like the data to be evenly distributed because I am feeding this RDD into aggregateByKey() and I would like to reduce the data skew as the partitions are written out.
Also, does myDataRDD need to be sorted for the range partitioner to be created correctly? What I have read so far suggests this may be the case.
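
For reference, here is a rough sketch of the repartitioning step I have in mind, along with a way to count the records that land in each partition. The object name, the method name, and the key/value types (Int keys, String values) are placeholders I made up for illustration, and I'm assuming myDataRDD is a pair RDD. It needs Spark on the classpath.

```scala
import org.apache.spark.RangePartitioner
import org.apache.spark.rdd.RDD

object RangePartitionSketch {
  // Assumption: keys are Int and values are String; substitute the real types.
  def partitionSizes(myDataRDD: RDD[(Int, String)]): Array[(Int, Int)] = {
    // Build the partitioner from the RDD itself, as described in the question
    val rangePart = new RangePartitioner(4, myDataRDD)

    // partitionBy shuffles each record to the partition the partitioner picks for its key
    val repartitioned = myDataRDD.partitionBy(rangePart)

    // Emit (partition index, record count) for every partition to inspect the spread
    repartitioned
      .mapPartitionsWithIndex((idx, iter) => Iterator((idx, iter.size)))
      .collect()
  }
}
```

Calling RangePartitionSketch.partitionSizes(myDataRDD) should return one (index, count) pair per partition, which is how I would check whether the Key=8 records are spread out or not.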