Avoiding collect but use foreach


Aakash Basu-2
Hi,

This:


to_list = [list(row) for row in df.collect()]


Gives:


[[5, 1, 1, 1, 2, 1, 3, 1, 1, 0], [5, 4, 4, 5, 7, 10, 3, 2, 1, 0], [3, 1, 1, 1, 2, 2, 3, 1, 1, 0], [6, 8, 8, 1, 3, 4, 3, 7, 1, 0], [4, 1, 1, 3, 2, 1, 3, 1, 1, 0]]


I want to avoid the collect operation but still convert the DataFrame to a Python list of lists, just as above, for downstream operations.


Is there a way to do this, perhaps with code that performs better than using collect?


Thanks,

Aakash.

Re: Avoiding collect but use foreach

刘虓
Hi,
I think you can turn your Python code into a UDF and call that UDF in foreachPartition.
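
A rough sketch of what that could look like, assuming the downstream work can run on the executors one partition at a time. The names rows_to_lists, process_partition, and downstream are made up for illustration, and plain tuples stand in for Spark Row objects so the snippet runs without a cluster. Note that foreachPartition returns nothing to the driver, so the list of lists only exists inside each partition's function call.

```python
def rows_to_lists(rows):
    """Convert an iterator of tuple-like rows into a list of lists."""
    return [list(row) for row in rows]


def downstream(batch):
    # Placeholder for the real downstream operation (an assumption);
    # here it only reports how many rows the partition held.
    print(f"processed {len(batch)} rows")


def process_partition(rows):
    """Hypothetical per-partition handler: builds the list of lists for
    one partition and passes it to the downstream logic."""
    downstream(rows_to_lists(rows))


# Local stand-in for a single partition: tuples instead of Spark Rows.
sample_partition = iter([
    (5, 1, 1, 1, 2, 1, 3, 1, 1, 0),
    (5, 4, 4, 5, 7, 10, 3, 2, 1, 0),
])
result = rows_to_lists(sample_partition)

# With a real DataFrame the call would be:
# df.foreachPartition(process_partition)
```

If the full list really is needed on the driver, this approach does not provide it; it only helps when the downstream step can be pushed out to the partitions.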

On Friday, February 1, 2019, at 3:37 PM, Aakash Basu <[hidden email]> wrote: