StreamingKmeans Spark doesn't work at all


Biplob Biswas


I implemented the StreamingKMeans example from the Spark website, but in Java.
The full implementation is here,

But I am not getting any output except occasional timestamps like the one below:

Time: 1466176935000 ms

Also, I have two directories:
"D:\spark\streaming example\Data Sets\training"
"D:\spark\streaming example\Data Sets\test"

Each directory contains one file, "samplegpsdata_train.txt" and "samplegpsdata_test.txt" respectively. The training file has 500 data points and the test file has 60.
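
One thing that may be worth checking with this setup, assuming the Java port reads both directories with `textFileStream` as the Spark example does: the Spark Streaming programming guide describes file streams as picking up only files that appear in the monitored directory after the stream has started, and recommends moving or renaming files into it atomically. Files that were already in the directories when the job launched would then produce only the "Time:" headers. The sketch below is one hedged way to drop a fresh copy of a data file into a watched directory while the job is running. The class name, method name, and staging directory are hypothetical and not part of Spark.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper (not part of Spark) for feeding a file-based stream. */
final class WatchedDirDrop {

    /**
     * Copies source into stagingDir, then moves the copy into watchedDir.
     * Assumption: stagingDir and watchedDir are on the same file system,
     * otherwise ATOMIC_MOVE may throw AtomicMoveNotSupportedException.
     */
    static Path dropCopy(Path source, Path stagingDir, Path watchedDir) throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(watchedDir);

        // A plain copy (no COPY_ATTRIBUTES) typically gets a current
        // modification time, so the file looks new to a directory monitor.
        Path staged = Files.createTempFile(stagingDir, "staged-", ".tmp");
        Files.copy(source, staged, StandardCopyOption.REPLACE_EXISTING);

        // Unique target name so repeated drops do not collide.
        String name = System.currentTimeMillis() + "-" + source.getFileName();
        Path target = watchedDir.resolve(name);

        // The file appears in the watched directory in a single step.
        return Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // Demo with temporary directories; replace with the real paths.
        Path root = Files.createTempDirectory("streaming-example");
        Path source = Files.write(root.resolve("samplegpsdata_train.txt"),
                "[1.0,2.0]\n".getBytes());
        Path dropped = dropCopy(source, root.resolve("staging"), root.resolve("training"));
        System.out.println("Dropped: " + dropped);
    }
}
```

The idea would be to start the streaming job first and only then call `dropCopy` for the training and test files, so that both files arrive as new files in their directories.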

I am very new to Spark, and any help is highly appreciated.


I have now also tried the Scala implementation available here:

I even provided the training and test files in the format specified in that file, as follows:

 * The rows of the training text files must be vector data in the form
 * `[x1,x2,x3,...,xn]`
 * Where n is the number of dimensions.
 * The rows of the test text files must be labeled data in the form
 * `(y,[x1,x2,x3,...,xn])`
 * Where y is some identifier. n must be the same for train and test.
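
To rule out a formatting problem in the data files, the two row formats above can be checked outside Spark. The following is a minimal sketch in plain Java. The class and method names are hypothetical, and it assumes that `y` is numeric, which the format description itself does not state.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Spark) for checking input rows. */
final class KMeansLineFormat {

    /** Parses a training row of the form [x1,x2,...,xn] into a double array. */
    static double[] parseVector(String line) {
        String s = line.trim();
        if (s.length() < 2 || s.charAt(0) != '[' || s.charAt(s.length() - 1) != ']') {
            throw new IllegalArgumentException("Expected [x1,...,xn] but got: " + line);
        }
        String body = s.substring(1, s.length() - 1).trim();
        if (body.isEmpty()) {
            return new double[0];
        }
        String[] parts = body.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i].trim());
        }
        return values;
    }

    /**
     * Parses a test row of the form (y,[x1,...,xn]).
     * Returns y at index 0 followed by the n features.
     * Assumption: y is a number.
     */
    static double[] parseLabeled(String line) {
        String s = line.trim();
        if (s.length() < 2 || s.charAt(0) != '(' || s.charAt(s.length() - 1) != ')') {
            throw new IllegalArgumentException("Expected (y,[x1,...,xn]) but got: " + line);
        }
        String body = s.substring(1, s.length() - 1);
        int comma = body.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Missing ',' after y in: " + line);
        }
        double label = Double.parseDouble(body.substring(0, comma).trim());
        double[] features = parseVector(body.substring(comma + 1));
        double[] out = new double[features.length + 1];
        out[0] = label;
        System.arraycopy(features, 0, out, 1, features.length);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseVector("[1.0, 2.5, -3.0]")));
        System.out.println(Arrays.toString(parseLabeled("(7,[1.0,2.5,-3.0])")));
    }
}
```

Running every line of both files through these methods and comparing the feature counts would confirm that each row is well formed and that n matches between train and test.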

But I still get no output in my Eclipse window ... just the Time!

Can anyone seriously help me with this?

Thank you so much
Biplob Biswas