Connected components using GraphFrames is significantly slower than GraphX?

kant kodali
Hi All,

I am trying to understand why the GraphFrames connected components algorithm runs much slower than the GraphX equivalent.

The GraphX code below creates 16 stages:

GraphFrame graphFrame = GraphFrame.fromEdges(edges);
Dataset<Row> connectedComponents = graphFrame.connectedComponents().setAlgorithm("graphx").run();

The GraphFrames code below creates 55 stages:

GraphFrame graphFrame = GraphFrame.fromEdges(edges);
Dataset<Row> connectedComponents = graphFrame.connectedComponents().run();

Any ideas on how to make GraphFrames faster? Also, what is the latest graph processing library or framework I should be using? It seems there isn't a lot of work going on in either GraphFrames or GraphX, so I am curious what I should use for the long term.
Thanks!