Spark - Reduce operation taking too long


I have implemented k-means in both Scala and Java on Spark.
Both versions implement the same algorithm and use the same set of Spark API methods,
but the Scala code takes far longer to execute.

Following is the link for the data set:

Following are the code links
Scala Code 

Java Code 

Java Code Helper Function 

Following are my parameters for the algorithm:
K = 2197
Number of iterations = 5
The data set contains 3 million 3-dimensional points.
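
For a sense of scale, here is a minimal Java sketch of the per-point assignment step and the amount of work these parameters imply. It assumes a brute-force nearest-centroid scan over plain double arrays, which may differ from the linked code; the class and method names (KMeansCost, closestCentroid, distanceEvaluations) are hypothetical and used only for illustration.

```java
public class KMeansCost {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the point (linear scan over all K centroids).
    static int closestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double dist = squaredDistance(point, centroids[k]);
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        return best;
    }

    // Distance evaluations for a brute-force run: points * K * iterations.
    static long distanceEvaluations(long points, long k, long iterations) {
        return points * k * iterations;
    }

    public static void main(String[] args) {
        double[][] centroids = { {0.0, 0.0, 0.0}, {10.0, 10.0, 10.0} };
        double[] point = {9.0, 8.0, 9.5};
        System.out.println(closestCentroid(point, centroids));          // prints 1
        System.out.println(distanceEvaluations(3_000_000L, 2197L, 5L)); // prints 32955000000
    }
}
```

Under this assumption the run performs roughly 33 billion distance evaluations, so any per-call overhead inside the inner loop (boxing, temporary collections, closures) is multiplied by that factor.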

The Java code takes about 200 seconds in total to complete.
The Scala code takes about 1200 seconds in total to complete.

The following links discuss a similar problem, but none of the solutions mentioned there have helped.

I am unable to figure out why the Scala code takes so much longer than the Java code.

Thanks & Regards
Parth Khatwani