Re: Using MLLib in Scala

Xiangrui Meng
Hi Suela,

(Please subscribe to our user mailing list and send your questions there
in the future.) In your case, each file contains a single column of
numbers, so you can use `sc.textFile` to read them, zip them together,
and then create labeled points:

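import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
// One partition per file, since zip requires matching partitions and elements per partition
val xx = sc.textFile("/path/to/ex2x.dat", 1).map(x => Vectors.dense(x.toDouble))
val yy = sc.textFile("/path/to/ex2y.dat", 1).map(_.toDouble)
val examples = yy.zip(xx).map { case (y, x) => LabeledPoint(y, x) }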
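To see what the zip step does without a cluster, here is a plain-Scala sketch on local collections. It assumes each .dat file holds one number per line, possibly padded with spaces, as in the ex2 data described in the quoted message. `Example`, `parseColumn` and `zipColumns` are hypothetical names standing in for MLlib's `LabeledPoint` and the `sc.textFile(...).map(...)` calls; they are not Spark API. Pairing line i with line i only makes sense when both files have the same number of lines, so the sketch checks that explicitly (`RDD.zip` is stricter still: it also needs the same number of partitions and elements per partition).

```scala
// Hypothetical stand-in for MLlib's LabeledPoint: a label plus a dense feature array.
final case class Example(label: Double, features: Array[Double])

// Parse a one-number-per-line column; blank lines (e.g. a trailing newline) are skipped.
def parseColumn(text: String): Vector[Double] =
  text.split("\\r?\\n").map(_.trim).filter(_.nonEmpty).map(_.toDouble).toVector

// Pair line i of the y column (the label) with line i of the x column (a one-element
// feature vector), mirroring yy.zip(xx).map { case (y, x) => LabeledPoint(y, x) }.
def zipColumns(yText: String, xText: String): Vector[Example] = {
  val ys = parseColumn(yText)
  val xs = parseColumn(xText)
  require(ys.length == xs.length, s"column lengths differ: ${ys.length} vs ${xs.length}")
  ys.zip(xs).map { case (y, x) => Example(y, Array(x)) }
}

// Usage with made-up values (not the real ex2 data):
val examples = zipColumns("0.78\n0.92\n", "  2.07\n  2.37\n")
examples.foreach { e =>
  val fs = e.features.mkString("[", ", ", "]")
  println(s"${e.label} -> $fs")
}
```

Here the label is the height (y) and the single feature is the age (x), which is the orientation linear regression in MLlib expects.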

Best,
Xiangrui

On Thu, May 29, 2014 at 2:35 AM, Suela Haxhi <[hidden email]> wrote:

>
> Hello Xiangrui,
> my name is Suela Haxhi. May I ask you for a little help? I am having some
> difficulty loading files into MLlib, namely for:
> Binary Classification;
> Linear Regression;
> ...
>
> E.g., the file "mllib/data/sample_svm_data.txt" contains the
> following data:
> 1 0 2.52078447201548 0 0 0 2.004684436494304 2.000347299268466 0 2.228387042742021 2.228387042742023 0 0 0 0 0 0
> 0 2.857738033247042 0 0 2.619965104088255 0 2.004684436494304 2.000347299268466 0 2.228387042742021 2.228387042742023 0 0 0 0 0 0
>
> etc.
>
> I don't understand what the inputs and outputs are.
> The problem comes when I want to load another type of dataset. E.g., I want
> to do binary classification on the presence of a disease.
>
> For example, the esteemed professor Andrew Ng, in his course on machine
> learning, explains:
>
> Download ex2Data.zip, and extract the files from the zip file. The files
> contain some example measurements of various heights for boys between the
> ages of two and eight. The y-values are the heights measured in meters, and
> the x-values are the ages of the boys corresponding to the heights. Each
> height and age tuple constitutes one training example $(x^{(i)}, y^{(i)})$
> in our dataset. There are $m = 50$ training examples, and you will
> use them to develop a linear regression model.
> In this problem, you'll implement linear regression using gradient descent.
> In Matlab/Octave, you can load the training set using the commands
> x = load('ex2x.dat');
> y = load('ex2y.dat');
>
>
>
> But in MLlib, I can't figure out what these data mean
> (mllib/data/sample_svm_data.txt).
> And I don't know how to load another type of dataset using the following
> code:
>
> Binary Classification
> import org.apache.spark.SparkContext
> import org.apache.spark.mllib.classification.SVMWithSGD
> import org.apache.spark.mllib.regression.LabeledPoint
>
> // Load and parse the data file
>
> // Run training algorithm to build the model
>
> // Evaluate model on training examples and compute the training error
>
>
>
> Can you help me please? Thank you in advance.
>
> Best Regards
> Suela Haxhi
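On the input/output question in the quoted message: in mllib/data/sample_svm_data.txt each line is one training example. The first number is the label (1 or 0, e.g. disease present or absent) and the remaining space-separated numbers are that example's feature values, so each of the two rows shown carries 16 features. The output of a trained classifier is a predicted label per example, and the training error in the last step of the skeleton is the fraction of examples whose prediction differs from the label. The plain-Scala sketch below illustrates both steps on local data; `parseLine` and `trainingError` are hypothetical helpers, not MLlib API, and the predictions are made up rather than produced by `SVMWithSGD`.

```scala
// Parse one line of the sample_svm_data.txt format: "label f1 f2 ... fn",
// separated by whitespace. Returns (label, features).
def parseLine(line: String): (Double, Array[Double]) = {
  val parts = line.trim.split("\\s+").map(_.toDouble)
  require(parts.length >= 2, s"expected a label and at least one feature: '$line'")
  (parts.head, parts.tail)
}

// Training error: fraction of (label, prediction) pairs that disagree.
def trainingError(labelAndPreds: Seq[(Double, Double)]): Double = {
  require(labelAndPreds.nonEmpty, "no examples")
  labelAndPreds.count { case (label, pred) => label != pred }.toDouble / labelAndPreds.size
}

// The two rows quoted from sample_svm_data.txt, each on a single line.
val lines = Seq(
  "1 0 2.52078447201548 0 0 0 2.004684436494304 2.000347299268466 0 2.228387042742021 2.228387042742023 0 0 0 0 0 0",
  "0 2.857738033247042 0 0 2.619965104088255 0 2.004684436494304 2.000347299268466 0 2.228387042742021 2.228387042742023 0 0 0 0 0 0"
)
val parsed = lines.map(parseLine)
parsed.foreach { case (label, features) =>
  println(s"label = $label, number of features = ${features.length}")
}

// Hypothetical predictions (not from a trained model), only to show the arithmetic.
val predictions = Seq(1.0, 1.0)
val err = trainingError(parsed.map(_._1).zip(predictions))
println(s"Training Error = $err")
```

For a disease dataset, the same layout applies: put the 0/1 diagnosis first and the measurements after it on each line. In spark-shell the parsing would typically run inside `sc.textFile(...).map { ... }`, building `LabeledPoint`s instead of tuples before training.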