Alternatives for dataframe collectAsList()


After creating a dataset:

Dataset<Row> df = sqlContext.sql("query");

I need to read the result values, so I call collectAsList():

List<Row> list = df.collectAsList();

But this is very slow when I work with large datasets (20-30 million records). I understand that the result is not held in the driver application, and that this is why it takes so long: collectAsList() collects all the data from the worker nodes.

What is the right way to get the result values, then? Is there another way to iterate over the rows of the result dataset, or to read its values? Can anyone post a small working example?

Thanks & Regards,
Laszlo Szep