How to process multiple classification with SVM in MLlib

How to process multiple classification with SVM in MLlib

Cui xp
Hi all,
  As we know, the SVM in MLlib is used for binary classification. I wonder how to train an SVM model for multiclass classification in MLlib. In addition, how can I apply a machine learning algorithm in Spark if the algorithm isn't included in MLlib? Thank you.

Re: How to process multiple classification with SVM in MLlib

Xiangrui Meng
At this time, you need to do one-vs-all manually for multiclass
training. For your second question, if the algorithm is implemented in
Java/Scala/Python and designed for a single machine, you can broadcast
the dataset to each worker and train models on the workers. If the
algorithm is implemented in a different language, you may need pipe to
train the models outside the JVM (similar to Hadoop Streaming). If the
algorithm is designed for a different parallel platform, then it may be
hard to use it in Spark. -Xiangrui

On Sat, Jun 7, 2014 at 7:15 AM, littlebird <[hidden email]> wrote:

> Hi all,
>   As we know, the SVM in MLlib is used for binary classification. I wonder
> how to train an SVM model for multiclass classification in MLlib. In addition,
> how can I apply a machine learning algorithm in Spark if the algorithm isn't
> included in MLlib? Thank you.

Re: How to process multiple classification with SVM in MLlib

Cui xp
Thank you for your reply. I don't quite understand how to do one-vs-all manually for multiclass training. As for the second question, my algorithm is implemented in Java and designed for a single machine. How can I broadcast the dataset to each worker and train models on the workers? Thank you very much.

Re: How to process multiple classification with SVM in MLlib

Xiangrui Meng
For broadcast data, please read
http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables .
For one-vs-all, please read
https://en.wikipedia.org/wiki/Multiclass_classification .

-Xiangrui
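
To make the one-vs-all idea concrete, here is a minimal sketch in plain Java. It is not MLlib code: the class `OneVsRest`, its methods, and the toy data are all hypothetical, and the small hinge-loss trainer only stands in for whatever binary SVM trainer you actually use. The shape is the same either way: for each class, relabel the data as "this class" versus "everything else", train one binary model, and at prediction time pick the class whose model gives the highest score.

```java
import java.util.Arrays;

// Hypothetical illustration of manual one-vs-all; not part of MLlib.
class OneVsRest {

    // Toy 2-D data: three well-separated groups, labelled 0, 1 and 2.
    static final double[][] FEATURES = {
        {4, -2}, {5, -3}, {4, -3},      // class 0
        {-4, -2}, {-5, -3}, {-4, -3},   // class 1
        {0, 5}, {1, 4}, {-1, 5}         // class 2
    };
    static final int[] LABELS = {0, 0, 0, 1, 1, 1, 2, 2, 2};

    // Linear score w.x + b; the bias b is stored as the last weight.
    static double score(double[] w, double[] features) {
        double s = w[w.length - 1];
        for (int j = 0; j < features.length; j++) {
            s += w[j] * features[j];
        }
        return s;
    }

    // Stand-in binary trainer: SGD on the L2-regularized hinge loss.
    // signs[i] must be +1 or -1.
    static double[] trainBinary(double[][] x, int[] signs,
                                int epochs, double step, double reg) {
        int d = x[0].length;
        double[] w = new double[d + 1];
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < x.length; i++) {
                double margin = signs[i] * score(w, x[i]);
                boolean violated = margin < 1.0;
                for (int j = 0; j < d; j++) {
                    double grad = reg * w[j];
                    if (violated) {
                        grad -= signs[i] * x[i][j];
                    }
                    w[j] -= step * grad;
                }
                if (violated) {
                    w[d] += step * signs[i];   // bias is not regularized
                }
            }
        }
        return w;
    }

    // One-vs-all: one binary model per class, "class c" against the rest.
    static double[][] train(double[][] x, int[] labels, int numClasses,
                            int epochs, double step, double reg) {
        double[][] models = new double[numClasses][];
        for (int c = 0; c < numClasses; c++) {
            int[] signs = new int[labels.length];
            for (int i = 0; i < labels.length; i++) {
                signs[i] = (labels[i] == c) ? 1 : -1;
            }
            models[c] = trainBinary(x, signs, epochs, step, reg);
        }
        return models;
    }

    // Predict the class whose binary model scores the point highest.
    static int predict(double[][] models, double[] features) {
        int best = 0;
        for (int c = 1; c < models.length; c++) {
            if (score(models[c], features) > score(models[best], features)) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] models = train(FEATURES, LABELS, 3, 500, 0.01, 0.01);
        double[][] queries = {{4.5, -2.5}, {-4.5, -2.5}, {0.5, 4.5}};
        for (double[] q : queries) {
            System.out.println(Arrays.toString(q) + " -> class " + predict(models, q));
        }
    }
}
```

With MLlib, the relabeling step would be a map over the labeled points and `trainBinary` would be replaced by the binary SVM training call; the per-class loop and the highest-score prediction stay the same.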

On Mon, Jun 9, 2014 at 7:24 AM, littlebird <[hidden email]> wrote:

> Thank you for your reply. I don't quite understand how to do one-vs-all
> manually for multiclass training. As for the second question, my algorithm
> is implemented in Java and designed for a single machine. How can I
> broadcast the dataset to each worker and train models on the workers?
> Thank you very much.

Re: How to process multiple classification with SVM in MLlib

Cui xp
Thanks. Now I know how to broadcast the dataset, but I still wonder how, after broadcasting it, I can apply my algorithm to train the model on the workers. To describe my question in detail: the following code trains an LDA (Latent Dirichlet Allocation) model with JGibbLDA on a single machine. It iterates to sample the topics and train the model. After broadcasting the dataset, how can I keep this code running in Spark? Thank you.
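                // Requires the JGibbLDA jar on the classpath.
                import jgibblda.Estimator;
                import jgibblda.LDACmdOption;

                LDACmdOption ldaOption = new LDACmdOption(); // holds the LDA parameters
                ldaOption.est = true;                // estimate a new model from scratch
                ldaOption.estc = false;              // do not continue from a saved model
                ldaOption.modelName = "model-final"; // name of the output file
                ldaOption.dir = "/usr/Java";         // working directory
                ldaOption.dfile = "newDoc.dat";      // input data file
                ldaOption.alpha = 0.5;
                ldaOption.beta = 0.1;
                ldaOption.K = 10;                    // number of topics
                ldaOption.niters = 1000;             // number of Gibbs sampling iterations
                topicNum = ldaOption.K;              // topicNum is declared elsewhere (not shown)
                Estimator estimator = new Estimator();
                estimator.init(ldaOption);           // load the data and set up the model
                estimator.estimate();                // run the sampler and save the model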

Re: How to process multiple classification with SVM in MLlib

Cui xp
In reply to this post by Xiangrui Meng
Someone suggested that I use Mahout, but I'm not familiar with it, and using it would add complexity to my program. I'd like to run the algorithm in Spark. I'm a beginner; can you give me some suggestions?