Spark scala kmeans takes too long for large training data

parth2691
This post has NOT been accepted by the mailing list yet.
I have implemented k-means in both Scala and Java on Spark. Both versions use the same algorithm and the same set of Spark API methods, but the Scala version takes much longer to execute.

Data set: https://drive.google.com/open?id=0Bxnnu_Ig2Et9QjZoM3dmY1V5WXM

Code:
Scala code (Kmeans.scala): https://drive.google.com/open?id=0Bxnnu_Ig2Et9czhhcnlKUXdWaHNVcmVhOVRXQmczVjZtNWZv
Java code (Kmeans.java): https://drive.google.com/open?id=0Bxnnu_Ig2Et9cDBXNkVVZFNfd1pHblZBLXRuemk5UC0xVjJn
Java helper functions (JavaHelper): https://drive.google.com/open?id=0Bxnnu_Ig2Et9OUhIMGh4YkFPRkFkeU05OVljc0RQZUo3SWpV

Parameters:
K = 2197
Number of iterations = 5
Data set: 3 million three-dimensional points

Total running time:
Java: 200 seconds
Scala: 1200 seconds

The following threads discuss a similar problem, but none of the solutions mentioned in them have helped:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-KMeans-hangs-at-reduceByKey-collectAsMap-td16413.html
http://apache-spark-user-list.1001560.n3.nabble.com/KMeans-for-large-training-data-td9407.html
http://apache-spark-user-list.1001560.n3.nabble.com/extremely-slow-k-means-version-td4489.html

I am unable to figure out why the Scala code takes so much longer (about six times as long) than the Java code.

Thanks & Regards
Parth Khatwani
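For scale: with K = 2197 and 3 million points, each iteration evaluates 3,000,000 × 2,197 ≈ 6.6 × 10^9 point-to-centroid distances, or roughly 3.3 × 10^10 over 5 iterations, so the per-distance cost inside the assignment closure dominates the run. One frequent suspect when a Scala port is several times slower than equivalent Java (an assumption here, since the linked code is not reproduced in the post) is an inner distance loop written with collection combinators such as zip/map/sum, which allocate an intermediate array of boxed tuples and another result array on every point/centroid pair. The minimal sketch below is plain Scala with no Spark dependency, and its names (ClosestCentroid, closestSlow, closestFast) are hypothetical. It contrasts that style with a primitive while-loop version that returns the same index.

```scala
// Hypothetical sketch for illustration; not the code from the linked files.
object ClosestCentroid {

  // Idiomatic but allocation-heavy: zip builds an array of (Double, Double)
  // tuples and map builds another array for every point/centroid pair.
  def closestSlow(p: Array[Double], centers: Array[Array[Double]]): Int =
    centers.indices.minBy { j =>
      p.zip(centers(j)).map { case (a, b) => (a - b) * (a - b) }.sum
    }

  // Same result using primitive while loops: no per-pair allocations.
  // Strict '<' keeps the first minimum on ties, matching minBy above.
  def closestFast(p: Array[Double], centers: Array[Array[Double]]): Int = {
    var best = 0
    var bestDist = Double.PositiveInfinity
    var j = 0
    while (j < centers.length) {
      val c = centers(j)
      var d = 0.0
      var i = 0
      while (i < p.length) {
        val diff = p(i) - c(i)
        d += diff * diff
        i += 1
      }
      if (d < bestDist) { bestDist = d; best = j }
      j += 1
    }
    best
  }

  def main(args: Array[String]): Unit = {
    val centers = Array(Array(0.0, 0.0, 0.0), Array(10.0, 10.0, 10.0))
    val p = Array(9.0, 8.5, 11.0)
    println(closestSlow(p, centers)) // prints 1
    println(closestFast(p, centers)) // prints 1
  }
}
```

If the Scala closure passed to map or mapPartitions calls something shaped like closestSlow, swapping in a loop like closestFast and comparing stage times in the Spark UI is a cheap experiment; it is a hypothesis to test, not a confirmed diagnosis of this code.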