mapWith and array index as key

Aureliano Buendia
Hi,

Given a distributed file, does mapWith provide a way to know the index of each line (line number minus 1) across all worker nodes?

Can mapWith be used to treat that index as a key when joining two RDDs?

Re: mapWith and array index as key

Mark Hamstra
No. The index referred to in mapWith (as well as in mapPartitionsWithIndex and several other RDD methods) is the index of the RDD's partition, not of an individual line. For example, in a typical case of an RDD read in from a distributed filesystem where the input file occupies n blocks, the index values in mapWith will range from 0 to n-1, since the default is for one RDD partition to be created for each file block.
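
To make the distinction concrete, here is a minimal sketch in plain Python, not Spark code. It models an RDD as a list of partitions, shows what a per-partition index looks like, and shows one way a global line index could be derived: count the elements in each partition first, then number each partition's elements starting from its offset. The helper names (map_partitions_with_index, zip_with_global_index) and the sample data are assumptions made up for illustration; only mapPartitionsWithIndex is a real RDD method name, and the helper of similar name here merely imitates its shape.

```python
from itertools import accumulate


def map_partitions_with_index(partitions, f):
    """Hypothetical stand-in: call f(partition_index, iterator) once per partition."""
    return [list(f(i, iter(part))) for i, part in enumerate(partitions)]


def zip_with_global_index(partitions):
    """Pair every element with its position in the whole dataset."""
    # Pass 1: count the elements in each partition.
    counts = [len(part) for part in partitions]
    # A partition's offset is the total size of all partitions before it.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: each partition numbers its own elements starting at its offset.
    return map_partitions_with_index(
        partitions,
        lambda i, it: ((offsets[i] + k, line) for k, line in enumerate(it)),
    )


# Assumed sample: a six-line file stored as three blocks, so three partitions.
partitions = [["a", "b", "c"], ["d", "e"], ["f"]]

# The index handed to the function identifies the partition, not the line.
tagged = map_partitions_with_index(
    partitions, lambda i, it: ((i, line) for line in it)
)

# Global line indices (line number minus 1), independent of partitioning.
indexed = zip_with_global_index(partitions)

# A second dataset with a different partition layout can be joined on that
# index. The dicts are a local stand-in for a distributed join by key.
other = zip_with_global_index([["A", "B"], ["C", "D", "E", "F"]])
left = dict(pair for part in indexed for pair in part)
right = dict(pair for part in other for pair in part)
joined = {k: (left[k], right[k]) for k in sorted(left.keys() & right.keys())}
```

In tagged, every line of the first block carries index 0, every line of the second carries 1, and so on, which is why the partition index alone cannot serve as a per-line join key. The two-pass approach needs an extra pass over the data to obtain the counts. Later Spark releases provide RDD.zipWithIndex for this purpose, so check the API documentation of the version in use before hand-rolling it.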


On Tue, Dec 24, 2013 at 7:14 AM, Aureliano Buendia <[hidden email]> wrote:
Hi,

Given a distributed file, does mapWith provide a way to know the index of each line (line number minus 1) across all worker nodes?

Can mapWith be used to treat that index as a key when joining two RDDs?
