MLLib sample data format

Justin Yip-2
Hello,

I am looking into a couple of MLLib data files in https://github.com/apache/spark/tree/master/data/mllib, but I cannot find any explanation of these files. Does anyone know if they are documented?

Thanks.

Justin

Re: MLLib sample data format

coderxiang


On Sun, Jun 22, 2014 at 2:35 PM, Justin Yip <[hidden email]> wrote:
Hello,

I am looking into a couple of MLLib data files in https://github.com/apache/spark/tree/master/data/mllib, but I cannot find any explanation of these files. Does anyone know if they are documented?

Thanks.

Justin


Re: MLLib sample data format

Justin Yip-2
Hi Shuo,

Yes. I was reading the guide as well as the sample code.

For example, in http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-support-vector-machine-svm, now where in the github repository I can find the file: sc.textFile("mllib/data/ridge-data/lpsa.data").

Thanks.

Justin



On Sun, Jun 22, 2014 at 2:40 PM, Shuo Xiang <[hidden email]> wrote:




Re: MLLib sample data format

Justin Yip-2
Hi Shuo,

Yes. I was reading the guide as well as the sample code.

For example, the guide at http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-support-vector-machine-svm uses sc.textFile("mllib/data/ridge-data/lpsa.data"), but nowhere in the GitHub repository can I find that file.

Thanks.

Justin


On Sun, Jun 22, 2014 at 3:24 PM, Justin Yip <[hidden email]> wrote:
Hi Shuo,

Yes. I was reading the guide as well as the sample code.

For example, in http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-support-vector-machine-svm, now where in the github repository I can find the file: sc.textFile("mllib/data/ridge-data/lpsa.data").

Thanks.

Justin




Re: MLLib sample data format

Evan R. Sparks
In reply to this post by Justin Yip-2
These files follow the libsvm format: each line is a record, the first column is the label, and the remaining fields are offset:value pairs, where offset is the position in the feature vector and value is the value of that input feature.

This is a fairly efficient representation for sparse data, but it can double (or more) the storage requirements for dense data.

- Evan
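
For anyone who wants to see the libsvm layout described above in concrete terms, here is a minimal Python sketch that parses one record without Spark. The function names and the sample line are made up for illustration. It assumes whitespace-separated fields and one-based indices, which is the usual LIBSVM convention. In Spark itself, MLUtils.loadLibSVMFile is the loader for this format.

```python
from typing import Dict, List, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one 'label index:value index:value ...' record.

    Hypothetical helper for illustration; assumes a non-empty line.
    """
    tokens = line.split()
    label = float(tokens[0])
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def to_dense(features: Dict[int, float], size: int) -> List[float]:
    """Expand the sparse pairs into a dense list of length `size`.

    Assumes the indices in the file are one-based.
    """
    dense = [0.0] * size
    for index, value in features.items():
        dense[index - 1] = value
    return dense


# Made-up sample record: label 1, with two non-zero features.
label, features = parse_libsvm_line("1 3:0.5 7:2.0")
dense = to_dense(features, 8)
```

Only the non-zero entries are written out, which is why the format is compact for sparse data. A fully dense row has to carry an index next to every value, which is where the extra storage comes from.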


Re: MLLib sample data format

Evan R. Sparks
In reply to this post by Justin Yip-2
Oh, and the MovieLens one is userid::movieid::rating.

- Evan
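
A similar sketch for the userid::movieid::rating layout, again in plain Python. The Rating type and parse_rating_line are hypothetical names, and the sample line is made up. The sketch assumes integer ids and a numeric rating, and it ignores any extra trailing fields (some MovieLens releases append a timestamp).

```python
from typing import NamedTuple


class Rating(NamedTuple):
    """One rating record (hypothetical type, for illustration only)."""
    user_id: int
    movie_id: int
    rating: float


def parse_rating_line(line: str) -> Rating:
    """Parse 'userid::movieid::rating', ignoring any fields after the third."""
    user_id, movie_id, rating = line.strip().split("::")[:3]
    return Rating(int(user_id), int(movie_id), float(rating))


# Made-up sample record.
example = parse_rating_line("0::2::3")
```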


Re: MLLib sample data format

Justin Yip-2
I see. That's good. Thanks.

Justin


On Sun, Jun 22, 2014 at 4:59 PM, Evan Sparks <[hidden email]> wrote:
Oh, and the movie lens one is userid::movieid::rating

- Evan
