After I create a dataset:
Dataset<Row> df = sqlContext.sql("query");
I need the result values, so I call collectAsList():
List<Row> list = df.collectAsList();
But this is very slow when I work with large datasets (20-30 million records). I know that the result is not already held in the driver application, so the call takes a long time because collectAsList() has to collect all of the data from the worker nodes.
What is the right way to get the result values, then? Is there another way to iterate over the rows of a result dataset, or to read its values? Can anyone post a small working example?