Hi Suela,

(Please subscribe to our user mailing list and send your questions there
in the future.) In your case, each file contains a single column of numbers,
so you can use `sc.textFile` to read each file, zip the two RDDs together,
and then create labeled points:

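import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// One feature value per line in ex2x.dat, one label per line in ex2y.dat.
val xx = sc.textFile("/path/to/ex2x.dat").map(x => Vectors.dense(x.toDouble))
val yy = sc.textFile("/path/to/ex2y.dat").map(_.toDouble)
// Pair each label with the feature vector at the same position.
val examples = yy.zip(xx).map { case (y, x) => LabeledPoint(y, x) }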
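
One caveat with the snippet above: `RDD.zip` assumes both RDDs have the same number of partitions and the same number of elements in each partition, which two separately loaded text files are not guaranteed to satisfy. If the zip fails, a possible alternative is to key each value by its line index and join on that key. The following is only a sketch under assumptions: the object name `LoadEx2Data`, the helper `indexedColumn`, and the `/path/to/...` paths are placeholders of mine, and it assumes a Spark version that provides `zipWithIndex` (1.0 or later).

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // pair-RDD implicits; needed for join on Spark 1.x
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical helper object, not part of MLlib.
object LoadEx2Data {

  // Read a one-column text file and key every value by its position
  // among the non-blank lines of that file.
  def indexedColumn(sc: SparkContext, path: String): RDD[(Long, Double)] =
    sc.textFile(path)
      .map(_.trim)
      .filter(_.nonEmpty)               // skip blank lines, if any
      .map(_.toDouble)
      .zipWithIndex()                   // (value, index)
      .map { case (value, idx) => (idx, value) }

  // Join labels and features on the line index instead of relying on zip.
  def load(sc: SparkContext, xPath: String, yPath: String): RDD[LabeledPoint] = {
    val xs = indexedColumn(sc, xPath)
    val ys = indexedColumn(sc, yPath)
    ys.join(xs).map { case (_, (y, x)) => LabeledPoint(y, Vectors.dense(x)) }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "LoadEx2Data")
    // Placeholder paths; replace them with the real file locations.
    val examples = load(sc, "/path/to/ex2x.dat", "/path/to/ex2y.dat")
    println(s"Loaded ${examples.count()} examples")
    sc.stop()
  }
}
```

The join costs a shuffle, so for files this small the plain `zip` is simpler when it works; the indexed version only trades some speed for not depending on how the two files happen to be partitioned.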

Best,

Xiangrui

On Thu, May 29, 2014 at 2:35 AM, Suela Haxhi <[hidden email]> wrote:

>

> Hello Xiangrui,
> my name is Suela Haxhi. May I ask you for a little help? I am having some
> difficulty loading data files into MLlib, namely for:
> Binary Classification;
> Linear Regression;
> ...

>

> E.g., the file "mllib/data/sample_svm_data.txt" contains the
> following data:

> 1 0 2.52078447201548 0 0 0 2.004684436494304 2.000347299268466 0

> 2.228387042742021 2.228387042742023 0 0 0 0 0 0

> 0 2.857738033247042 0 0 2.619965104088255 0 2.004684436494304

> 2.000347299268466 0 2.228387042742021 2.228387042742023 0 0 0 0 0 0

>

> etc.

>

> I don't understand which values are the inputs and which are the outputs.
> The problem arises when I want to load a different kind of dataset. E.g., I
> want to run a binary classification on the presence of a disease.

>

> For example, the esteemed Professor Andrew Ng explains in his machine
> learning course:

>

> Download ex2Data.zip, and extract the files from the zip file. The files
> contain some example measurements of various heights for boys between the
> ages of two and eight. The y-values are the heights measured in meters, and
> the x-values are the ages of the boys corresponding to the heights. Each
> height and age tuple constitutes one training example $(x^{(i)}, y^{(i)})$
> in our dataset. There are $m = 50$ training examples, and you will
> use them to develop a linear regression model.

> In this problem, you'll implement linear regression using gradient descent.
> In Matlab/Octave, you can load the training set using the commands
> x = load('ex2x.dat');
> y = load('ex2y.dat');

>

>

>

> But in MLlib, I can't figure out what the data in
> mllib/data/sample_svm_data.txt mean.
> And I don't know how to load a different kind of dataset using the following
> code:

>

> Binary Classification

> import org.apache.spark.SparkContext

> import org.apache.spark.mllib.classification.SVMWithSGD

> import org.apache.spark.mllib.regression.LabeledPoint

>

> // Load and parse the data file
>
> // Run training algorithm to build the model
>
> // Evaluate model on training examples and compute the training error

>

>

>

> Can you help me please? Thank you in advance.

>

> Best Regards

> Suela Haxhi