StreamingKmeans Spark doesn't work at all


Biplob Biswas

Hi,

I implemented the StreamingKMeans example provided on the Spark website, but in Java.
The full implementation is here:

http://pastebin.com/CJQfWNvk

But I am not getting anything in the output except occasional timestamps like the one below:

-------------------------------------------
Time: 1466176935000 ms
-------------------------------------------

Also, I have two directories:
"D:\spark\streaming example\Data Sets\training"
"D:\spark\streaming example\Data Sets\test"

Each directory contains one file, "samplegpsdata_train.txt" and "samplegpsdata_test.txt" respectively. The training data has 500 data points and the test data has 60 data points.
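
One thing worth checking with this set-up: the Spark Streaming programming guide says that a file stream only picks up files that are created in the monitored directory by atomically moving or renaming them into it, so files that are already sitting there when the job starts may be ignored. Below is a minimal sketch of such a move in plain Java. The class name DropFile, the method moveInto and the temporary directories are hypothetical and only for illustration; this is not taken from the pastebin code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: moves a fully written file into a watched directory.
final class DropFile {

    // Moves 'source' into 'watchedDir' in a single atomic step, so a reader
    // of the directory never sees a half-written file.
    static Path moveInto(Path source, Path watchedDir) throws IOException {
        Files.createDirectories(watchedDir);
        Path target = watchedDir.resolve(source.getFileName());
        return Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: staging and watched directories are on the same
        // filesystem, which an atomic move generally requires.
        Path staging = Files.createTempDirectory("staging");
        Path watched = Files.createTempDirectory("training");

        // Write the file completely in the staging directory first.
        Path file = Files.write(
                staging.resolve("samplegpsdata_train.txt"),
                "[1.0,2.0]\n".getBytes(StandardCharsets.UTF_8));

        // Then move it into the watched directory while the job is running.
        Path moved = moveInto(file, watched);
        System.out.println("Moved to " + moved);
    }
}
```

In practice the watched directory would be the training or test directory listed above, and the move would be done after the streaming job has started.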

I am very new to Spark, and any help is highly appreciated.

//---------------------------------------------------------------------------------------//

I have now also tried the Scala implementation available here:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingKMeansExample.scala


and I even provided the training and test files in the format specified in that file, as follows:

 * The rows of the training text files must be vector data in the form
 * `[x1,x2,x3,...,xn]`
 * Where n is the number of dimensions.
 *
 * The rows of the test text files must be labeled data in the form
 * `(y,[x1,x2,x3,...,xn])`
 * Where y is some identifier. n must be the same for train and test.
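
To make the two row formats concrete, here is a minimal sketch in plain Java that prints one training row and one test row. The class name RowFormats and its methods are hypothetical, and using a double for the identifier y is an assumption (the Scala example reads the test rows as labeled points).

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper that formats rows as described in the comment above.
final class RowFormats {

    // Training row: [x1,x2,...,xn]
    static String trainingRow(double[] x) {
        return Arrays.stream(x)
                .mapToObj(Double::toString)
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Test row: (y,[x1,x2,...,xn]); y is assumed to be numeric here.
    static String testRow(double y, double[] x) {
        return "(" + y + "," + trainingRow(x) + ")";
    }

    public static void main(String[] args) {
        double[] point = {52.5, 13.4};
        System.out.println(trainingRow(point));   // [52.5,13.4]
        System.out.println(testRow(1.0, point));  // (1.0,[52.5,13.4])
    }
}
```

Each line of the training file would look like the first printed row, and each line of the test file like the second, with the same number of dimensions in both.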


But I still get no output in my Eclipse console ... just the Time!

Can anyone seriously help me with this?

Thank you so much
Biplob Biswas