No. The index passed to mapWith (and to mapPartitionsWithIndex and several other RDD methods) is the index of the RDD partition, not of the individual element. For example, in the typical case of an RDD read from a distributed filesystem where the input file occupies n blocks, the index values in mapWith range from 0 to n-1, because by default one RDD partition is created per file block. Every line in a given partition therefore sees the same index, so that index cannot serve as a per-line join key on its own.
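To make the distinction concrete, here is a minimal plain-Scala sketch that needs no Spark at all. It models an RDD as a sequence of partitions; the object name, the sample data, and both helper functions are hypothetical and only illustrate the idea. The first helper shows what a partition-indexed function sees, and the second shows one possible way to derive a global line index from per-partition sizes.

```scala
object PartitionIndexSketch {
  // Hypothetical stand-in for an RDD: each inner Seq models one partition
  // (for example, one block of the input file).
  val partitions: Seq[Seq[String]] = Seq(
    Seq("a", "b", "c"),
    Seq("d", "e"),
    Seq("f", "g", "h", "i")
  )

  // Models the index handed to a partition-indexed function:
  // every line in a partition is paired with that partition's index.
  def tagWithPartitionIndex(parts: Seq[Seq[String]]): Seq[(Int, String)] =
    parts.zipWithIndex.flatMap { case (lines, pIdx) =>
      lines.map(line => (pIdx, line))
    }

  // One way to get a global line index (line number - 1):
  // running totals of partition sizes give each partition's starting offset,
  // and the position within the partition is added to that offset.
  def tagWithLineIndex(parts: Seq[Seq[String]]): Seq[(Long, String)] = {
    val offsets = parts.map(_.size.toLong).scanLeft(0L)(_ + _)
    parts.zipWithIndex.flatMap { case (lines, pIdx) =>
      lines.zipWithIndex.map { case (line, i) => (offsets(pIdx) + i, line) }
    }
  }

  def main(args: Array[String]): Unit = {
    tagWithPartitionIndex(partitions).foreach(println)
    tagWithLineIndex(partitions).foreach(println)
  }
}
```

In real Spark code the same offset idea would need one pass to count the elements of each partition and a second pass, for instance with mapPartitionsWithIndex, to add the offsets. Later Spark releases (1.0 and up) also provide RDD.zipWithIndex, which yields such a global index directly.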
On Tue, Dec 24, 2013 at 7:14 AM, Aureliano Buendia <[hidden email]> wrote:
Given a distributed file, does mapWith provide a way to know the index of each line (line number - 1) across all worker nodes?
Can mapWith be used to treat the index as a key when joining two RDDs?