mapPartitions versus map overhead?

Huan Dao
Hi all, is there any overhead to mapPartitions compared with map if I implement an algorithm using map -> reduce versus mapPartitions -> reduce?
Thanks,
Huan Dao

RE: mapPartitions versus map overhead?

Shao, Saisai
Hi Huan Dao,

Actually, the overhead is the same for map and mapPartitions if you write transformations like these:
a.map(r => r * 2)
a.mapPartitions(iter => iter.map(r => r * 2))

Both are iterator-to-iterator transformations.
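
To make the equivalence concrete, here is a minimal sketch that uses plain Scala iterators as a stand-in for partitions, so it runs without Spark. The object name, the helper names `viaMap` and `perPartition`, and the sample data are all hypothetical; in Spark the second function is what would be passed to `a.mapPartitions(...)`.

```scala
object MapVsMapPartitions {
  // Assumption: a partition is modelled as a plain Iterator[Int], not a real RDD partition.
  // Roughly what a per-element map amounts to inside one partition.
  def viaMap(partition: Iterator[Int]): Iterator[Int] =
    partition.map(r => r * 2)

  // The kind of function one would hand to mapPartitions.
  val perPartition: Iterator[Int] => Iterator[Int] =
    iter => iter.map(r => r * 2)

  def main(args: Array[String]): Unit = {
    // Two hypothetical partitions.
    val partitions = Seq(Seq(1, 2, 3), Seq(4, 5))

    val a = partitions.flatMap(p => viaMap(p.iterator))
    val b = partitions.flatMap(p => perPartition(p.iterator))

    // Both paths wrap the same lazy iterator, so the results match.
    assert(a == b)
    println(a.mkString(", "))
  }
}
```

In both cases the elements are pulled through the iterator one at a time, which is why neither form should add work over the other in this model.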

But mapPartitions is more flexible than map: it accepts any transformation of type Iterator[A] => Iterator[B], where the resulting Iterator[B] can be anything iterable, so there is no one-to-one mapping constraint. In short, mapPartitions is close to a superset of map. You can check MappedRDD and MapPartitionsRDD to see the details.
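
As a rough illustration of the missing one-to-one constraint, the sketch below again models partitions as plain Scala iterators rather than using Spark itself. The names `partialSum` and `evensTwice` and the sample data are hypothetical; each function has the Iterator[A] => Iterator[B] shape that mapPartitions expects.

```scala
object PartitionFunctions {
  // Many-to-one: collapse a whole partition into a single partial sum.
  val partialSum: Iterator[Int] => Iterator[Int] =
    iter => Iterator(iter.sum)

  // Filter plus one-to-many: drop odd values and emit each even value twice.
  val evensTwice: Iterator[Int] => Iterator[Int] =
    iter => iter.filter(_ % 2 == 0).flatMap(r => Iterator(r, r))

  def main(args: Array[String]): Unit = {
    // Assumption: two hypothetical partitions standing in for an RDD.
    val partitions = Seq(Seq(1, 2, 3), Seq(4, 5))

    // One output element per partition, then a final reduce over the partial sums.
    val sums = partitions.flatMap(p => partialSum(p.iterator))
    val total = sums.reduce(_ + _)
    println(s"partial sums: ${sums.mkString(", ")}; total: $total")

    // The output size differs from the input size.
    val expanded = partitions.flatMap(p => evensTwice(p.iterator))
    println(expanded.mkString(", "))
  }
}
```

The first function is the pattern behind mapPartitions -> reduce: each partition is pre-aggregated to one value, so the final reduce only has to combine one element per partition.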

Thanks
Jerry

-----Original Message-----
From: Huan Dao [mailto:[hidden email]]
Sent: Tuesday, December 24, 2013 1:15 PM
To: [hidden email]
Subject: mapPartitions versus map overhead?

Hi all, is there any overhead to mapPartitions compared with map if I implement an algorithm using map -> reduce versus mapPartitions -> reduce?
Thanks,
Huan Dao