Spark - Reduce operation taking too long


I have implemented k-means in both Scala and Java on Spark.
Both versions use the same algorithm and the same set of Spark API methods,
but the Scala code takes far longer to execute.

Following is the link to the data set:

Following are the links to the code:
Scala Code

Java Code

Java Code Helper Function

Following are my parameters for the algorithm:
K = 2197
Number of iterations = 5
The data set contains 3 million 3-dimensional points.
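
For readers without access to the linked code, here is a minimal sketch of what one k-means iteration computes and how much work the parameters above imply. It is plain Java with no Spark dependency, and it is not the linked code: the class name KMeansStepSketch and every method name are hypothetical. It assumes the usual formulation, where each point is mapped to its nearest centroid and the points are then reduced per cluster into a sum and a count.

```java
import java.util.Arrays;

// Illustrative sketch only; all names here are hypothetical.
public class KMeansStepSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // "Map" step: index of the centroid closest to the given point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = squaredDistance(point, centroids[c]);
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // One iteration: assign each point, accumulate a per-cluster sum and
    // count (the "reduce" step), then average to get the new centroids.
    static double[][] iterate(double[][] points, double[][] centroids) {
        int k = centroids.length;
        int dim = centroids[0].length;
        double[][] sums = new double[k][dim];
        long[] counts = new long[k];
        for (double[] p : points) {
            int c = nearest(p, centroids);
            for (int i = 0; i < dim; i++) {
                sums[c][i] += p[i];
            }
            counts[c]++;
        }
        double[][] next = new double[k][];
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                // Assumption: an empty cluster keeps its previous centroid.
                next[c] = centroids[c].clone();
                continue;
            }
            next[c] = new double[dim];
            for (int i = 0; i < dim; i++) {
                next[c][i] = sums[c][i] / counts[c];
            }
        }
        return next;
    }

    // Point-to-centroid distance evaluations for a brute-force assignment.
    static long distanceEvaluations(long points, long k, long iterations) {
        return points * k * iterations;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0, 0}, {1, 1, 1}, {9, 9, 9}, {10, 10, 10}};
        double[][] centroids = {{0, 0, 0}, {10, 10, 10}};
        System.out.println(Arrays.deepToString(iterate(points, centroids)));
        // 3 million points, K = 2197, 5 iterations
        System.out.println(distanceEvaluations(3_000_000L, 2197L, 5L));
    }
}
```

With brute-force assignment, these parameters come to 3,000,000 × 2197 × 5 = 32,955,000,000 distance evaluations, so any per-evaluation overhead in the map step, such as object allocation or boxing, is multiplied accordingly.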

The Java code takes 200 seconds in total to complete.
The Scala code takes 1200 seconds in total to complete.
The following links discuss a similar problem, but none of the solutions mentioned there have helped:
http://apache-spark-user-list.1001560.n3.nabble.com/KMeans-for-large-training-data-td9407.html
http://apache-spark-user-list.1001560.n3.nabble.com/extremely-slow-k-means-version-td4489.html

I am unable to figure out why the Scala code takes so much longer than the Java code.

Thanks & Regards
Parth Khatwani