Spark - Reduce operation taking too long

parth2691
I have implemented k-means on Spark in both Scala and Java.
Both versions implement the same algorithm and use the same set of Spark API methods,
but the Scala version takes far longer to execute.

Following is the link to the data set
https://drive.google.com/open?id=0Bxnnu_Ig2Et9QjZoM3dmY1V5WXM 

Following are the code links
Scala Code
https://drive.google.com/open?id=0Bxnnu_Ig2Et9czhhcnlKUXdWaHNVcmVhOVRXQmczVjZtNWZv 

Java Code
https://drive.google.com/open?id=0Bxnnu_Ig2Et9cDBXNkVVZFNfd1pHblZBLXRuemk5UC0xVjJn 

Java Code Helper Function
https://drive.google.com/open?id=0Bxnnu_Ig2Et9OUhIMGh4YkFPRkFkeU05OVljc0RQZUo3SWpV 

Following are my parameters for the algorithm:
K = 2197
Number of iterations = 5
The data set contains 3 million 3-dimensional points
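
For a sense of scale, here is a rough sketch of the workload these parameters imply. It assumes plain brute-force k-means in which every iteration compares each point with each centroid, and it treats "3 million" as exactly 3,000,000. The class and method names are only illustrative and are not taken from the linked code.

```java
// Hypothetical helper, not part of the linked code: estimates how many
// point-to-centroid distance evaluations brute-force k-means performs.
public class KMeansWorkload {

    // Assumption: each iteration compares every point with every centroid.
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    public static long distanceEvaluations(long points, long k, long iterations) {
        return Math.multiplyExact(Math.multiplyExact(points, k), iterations);
    }

    public static void main(String[] args) {
        long points = 3_000_000L; // approximate data set size
        long k = 2197L;           // number of clusters
        long iterations = 5L;

        System.out.println("per iteration: " + distanceEvaluations(points, k, 1L));
        System.out.println("total:         " + distanceEvaluations(points, k, iterations));
    }
}
```

Under these assumptions that is 6,591,000,000 distance evaluations per iteration and 32,955,000,000 over the 5 iterations, so even a small per-evaluation overhead in one version would add up to a large difference in total run time.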

The Java code takes 200 seconds in total to complete.
The Scala code takes 1200 seconds in total to complete.

The following threads discuss similar problems, but none of the solutions mentioned in them has helped.
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-KMeans-hangs-at-reduceByKey-collectAsMap-td16413.html
http://apache-spark-user-list.1001560.n3.nabble.com/KMeans-for-large-training-data-td9407.html
http://apache-spark-user-list.1001560.n3.nabble.com/extremely-slow-k-means-version-td4489.html

I am unable to figure out why the Scala code takes so much longer than the Java code.


Thanks & Regards
Parth Khatwani