# mLIb solving linear regression with sparse inputs


## mLIb solving linear regression with sparse inputs

I want to solve the linear regression problem using Spark with huge matrices: Ax = b, using least squares:

$$x = (A^T A)^{-1} A^T b$$

The matrix A is large and sparse (as is the vector b). I have considered several approaches to the Ax = b problem, including:

1) directly evaluating the formula above: transpose A, multiply the transpose by A, take the inverse of the product, then multiply by A-transpose and by b, which gives the solution vector x
2) an iterative solver (no need to take the inverse)

My question is: what is the best way to solve this problem using the MLlib libraries, in Java, using RDDs and Spark? Is there any example code? Has anyone done this?

The code below reads data represented as a coordinate matrix and performs the transposition and multiplication, but I still need to take the inverse if I use this strategy:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
import org.apache.spark.mllib.linalg.distributed.MatrixEntry;

// Read the coordinate matrix from text or a database
// (sc is an existing JavaSparkContext, file is the input path)
JavaRDD<String> fileA = sc.textFile(file);

// Map each "row,column,value" line (sparse matrix) to a MatrixEntry
JavaRDD<MatrixEntry> matrixA = fileA.map(new Function<String, MatrixEntry>() {
    public MatrixEntry call(String x) {
        String[] indeceValue = x.split(",");
        long i = Long.parseLong(indeceValue[0]);
        long j = Long.parseLong(indeceValue[1]);
        double value = Double.parseDouble(indeceValue[2]);
        return new MatrixEntry(i, j, value);
    }
});

// Coordinate matrix from the sparse data
CoordinateMatrix cooMatrixA = new CoordinateMatrix(matrixA.rdd());

// Create a block matrix
BlockMatrix matA = cooMatrixA.toBlockMatrix();

// A-transpose * A (a square matrix)
BlockMatrix ata = matA.transpose().multiply(matA);

// Print the original matrix (toLocalMatrix collects it to the driver as a dense matrix)
System.out.println(matA.toLocalMatrix().toString());

// Print the transpose
System.out.println(matA.transpose().toLocalMatrix().toString());

// Print the square matrix (after multiplication)
System.out.println(ata.toLocalMatrix().toString());

// Back to coordinate entries
JavaRDD<MatrixEntry> entries = ata.toCoordinateMatrix().entries().toJavaRDD();
```
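To make option 2 concrete, here is a minimal single-machine sketch of one iterative method, CGLS (conjugate gradient applied to the least-squares problem). It is not MLlib code and not necessarily what anyone in this thread used; the class name `SparseCgls`, the `Entry` type, and all method names are hypothetical. The point it illustrates is that only products with A and A-transpose are needed, so neither A-transpose times A nor its inverse is ever formed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: CGLS on coordinate-format entries (hypothetical names, no Spark). */
class SparseCgls {

    /** One nonzero of A, mirroring a (row, column, value) triple. */
    static final class Entry {
        final int row;
        final int col;
        final double value;
        Entry(int row, int col, double value) {
            this.row = row;
            this.col = col;
            this.value = value;
        }
    }

    /** y = A * x, touching only the stored nonzeros. */
    static double[] multiply(List<Entry> a, double[] x, int numRows) {
        double[] y = new double[numRows];
        for (Entry e : a) y[e.row] += e.value * x[e.col];
        return y;
    }

    /** y = A^T * r, again using only the stored nonzeros. */
    static double[] multiplyTransposed(List<Entry> a, double[] r, int numCols) {
        double[] y = new double[numCols];
        for (Entry e : a) y[e.col] += e.value * r[e.row];
        return y;
    }

    static double dot(double[] u, double[] v) {
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) sum += u[i] * v[i];
        return sum;
    }

    /** Minimizes ||A x - b||_2 starting from x = 0; stops when ||A^T r|| <= tolerance. */
    static double[] solve(List<Entry> a, double[] b, int numCols,
                          int maxIterations, double tolerance) {
        double[] x = new double[numCols];
        double[] r = b.clone();                          // residual b - A x for x = 0
        double[] s = multiplyTransposed(a, r, numCols);  // A^T r
        double[] p = s.clone();                          // search direction
        double gamma = dot(s, s);
        for (int k = 0; k < maxIterations && Math.sqrt(gamma) > tolerance; k++) {
            double[] q = multiply(a, p, b.length);
            double alpha = gamma / dot(q, q);
            for (int j = 0; j < numCols; j++) x[j] += alpha * p[j];
            for (int i = 0; i < r.length; i++) r[i] -= alpha * q[i];
            s = multiplyTransposed(a, r, numCols);
            double gammaNext = dot(s, s);
            double beta = gammaNext / gamma;
            for (int j = 0; j < numCols; j++) p[j] = s[j] + beta * p[j];
            gamma = gammaNext;
        }
        return x;
    }

    public static void main(String[] args) {
        // Small made-up 3x2 system, not data from the thread.
        List<Entry> a = new ArrayList<>();
        a.add(new Entry(0, 0, 1.0));
        a.add(new Entry(1, 1, 2.0));
        a.add(new Entry(2, 0, 1.0));
        a.add(new Entry(2, 1, 1.0));
        double[] b = {1.0, 4.0, 3.0};
        System.out.println(Arrays.toString(solve(a, b, 2, 50, 1e-12)));
    }
}
```

In a distributed setting the two matrix-vector products are the only steps that touch A, so they are the parts that would need to run over the RDD; the vectors of length equal to the number of columns can stay on the driver as long as that dimension is modest.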

## Re: mLIb solving linear regression with sparse inputs

In reply to im281's post of 3 Nov 2016, 16:08.

Any reason why you can’t use the built-in linear regression, e.g. http://spark.apache.org/docs/latest/ml-classification-regression.html#regression or http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-least-squares-lasso-and-ridge-regression?

Robin East
Spark GraphX in Action, Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action

If you reply to this email, your message will be added to the discussion below: http://apache-spark-user-list.1001560.n3.nabble.com/mLIb-solving-linear-regression-with-sparse-inputs-tp28006.html

## Re: mLIb solving linear regression with sparse inputs

I would like to use it. But how do I do the following:

1) Read sparse data (from text or database)
2) Pass the sparse data to the LinearRegression class?

For example:

Sparse matrix A (row, column, value):

```
0,0,.42
0,1,.28
0,2,.89
1,0,.83
1,1,.34
1,2,.42
2,0,.23
3,0,.42
3,1,.98
3,2,.88
4,0,.23
4,1,.36
4,2,.97
```

Sparse vector b (row, column, value):

```
0,2,.89
1,2,.42
3,2,.88
4,2,.97
```

Solve Ax = b?
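As a way to picture step 1 without any Spark machinery, the sketch below groups "row,column,value" lines by row and keeps each row's column indices sorted. The names `SparseRows` and `parse` are hypothetical and nothing here is an MLlib call; the per-row sorted indices and values are simply the two arrays that a sparse vector representation would typically be built from.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: group coordinate triples into per-row sparse entries (hypothetical names). */
class SparseRows {

    /** Maps row index to (column index -> value); TreeMap keeps both levels sorted. */
    static Map<Integer, TreeMap<Integer, Double>> parse(String[] lines) {
        Map<Integer, TreeMap<Integer, Double>> rows = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split(",");
            int row = Integer.parseInt(parts[0]);
            int col = Integer.parseInt(parts[1]);
            double value = Double.parseDouble(parts[2]);
            rows.computeIfAbsent(row, r -> new TreeMap<>()).put(col, value);
        }
        return rows;
    }

    public static void main(String[] args) {
        // The matrix A from the post above.
        String[] a = {
            "0,0,.42", "0,1,.28", "0,2,.89",
            "1,0,.83", "1,1,.34", "1,2,.42",
            "2,0,.23",
            "3,0,.42", "3,1,.98", "3,2,.88",
            "4,0,.23", "4,1,.36", "4,2,.97"
        };
        // Row 2 keeps a single stored entry; its other columns are implicit zeros.
        parse(a).forEach((row, cols) -> System.out.println(row + " -> " + cols));
    }
}
```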

## Re: mLIb solving linear regression with sparse inputs

In reply to im281's post of 3 Nov 2016, 18:12.

Here’s a way of creating sparse vectors in MLlib:

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.rdd.RDD

// Parse each "row,column,value" line into a (row, column, value) tuple
val rdd = sc.textFile("A.txt").map(line => line.split(",")).
  map(ary => (ary(0).toInt, ary(1).toInt, ary(2).toDouble))

// Key every entry by its row index
val pairRdd: RDD[(Int, (Int, Int, Double))] = rdd.map(el => (el._1, el))

// combineByKey functions: collect the column indices and values of each row
val create = (first: (Int, Int, Double)) => (Array(first._2), Array(first._3))
val combine = (head: (Array[Int], Array[Double]), tail: (Int, Int, Double)) =>
  (head._1 :+ tail._2, head._2 :+ tail._3)
val merge = (a: (Array[Int], Array[Double]), b: (Array[Int], Array[Double])) =>
  (a._1 ++ b._1, a._2 ++ b._2)

// One sparse vector of size 3 per row. The Seq overload sorts by index, which matters
// because combineByKey does not guarantee the order in which entries arrive.
val A = pairRdd.combineByKey(create, combine, merge).
  map(el => Vectors.sparse(3, el._2._1.zip(el._2._2).toSeq))
```

If you have a separate file of b’s then you would need to manipulate this slightly to join the b’s to the A RDD and then create LabeledPoints. I guess there is a way of doing this using the newer ML interfaces but it’s not particularly obvious to me how.

One point: in the example you give, the b’s are exactly the same as col 2 in the A matrix. I presume this is just a quick hacked-together example because that would give a trivial result.

Robin East
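To spell out why that example is trivial, here is a short worked note. It assumes zero-based column indices, so that "col 2" is the third column of A, and it treats the entries missing from row 2 as zeros:

$$
b = A e_3, \qquad e_3 = (0, 0, 1)^T
\quad\Longrightarrow\quad
\lVert A e_3 - b \rVert_2^2 = 0 .
$$

So $x = e_3$ already fits b exactly, with zero residual, and if the columns of A are linearly independent it is the only least-squares solution. A regression on this data would therefore just return a weight of 1 on the third column and 0 on the others.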

## Re: mLIb solving linear regression with sparse inputs

In reply to Robineast's post of Sun, Nov 6, 2016 at 3:35 AM -0800.

Thank you! Would you happen to have this code in Java? This is extremely helpful!

Iman

## Re: mLIb solving linear regression with sparse inputs

In reply to Robineast's post, following up on my message of Sun, Nov 6, 2016 at 6:43 AM.

Hi Robin,

It looks like the linear regression model takes in a dataset, not a matrix? It would be helpful for this example if you could set up the whole problem end to end, using one of the columns of the matrix as b. So A is a sparse matrix and b is a sparse vector.

Best regards,
Iman
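For readers who want to see the end-to-end shape of that request on one machine, the following is a rough plain-Java sketch, not Spark or MLlib code. It takes one column of the coordinate data as b, treats the remaining columns as features, accumulates the normal equations row by row over the stored nonzeros, and solves the small dense system by Gaussian elimination. `NormalEquationsDemo`, `fit`, `solveDense`, and `featureIndex` are hypothetical names, and the sketch assumes the feature columns are linearly independent.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: least squares with one column as the label (hypothetical names, no Spark). */
class NormalEquationsDemo {

    /** Position of a feature column once the label column has been removed. */
    static int featureIndex(int col, int labelCol) {
        return col < labelCol ? col : col - 1;
    }

    /** Solves m * x = rhs by Gaussian elimination with partial pivoting (m assumed nonsingular). */
    static double[] solveDense(double[][] m, double[] rhs) {
        int n = rhs.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) a[i] = m[i].clone();
        double[] y = rhs.clone();
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmpRow = a[col]; a[col] = a[pivot]; a[pivot] = tmpRow;
            double tmp = y[col]; y[col] = y[pivot]; y[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c < n; c++) a[r][c] -= factor * a[col][c];
                y[r] -= factor * y[col];
            }
        }
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double sum = y[i];
            for (int c = i + 1; c < n; c++) sum -= a[i][c] * x[c];
            x[i] = sum / a[i][i];
        }
        return x;
    }

    /** Fits weights for all columns except labelCol, which plays the role of b. */
    static double[] fit(String[] lines, int numCols, int labelCol) {
        // Group the "row,column,value" triples by row.
        Map<Integer, Map<Integer, Double>> rows = new TreeMap<>();
        for (String line : lines) {
            String[] p = line.trim().split(",");
            int row = Integer.parseInt(p[0]);
            int col = Integer.parseInt(p[1]);
            double value = Double.parseDouble(p[2]);
            rows.computeIfAbsent(row, r -> new TreeMap<>()).put(col, value);
        }
        // Accumulate A^T A and A^T b one row at a time, using only stored nonzeros.
        int n = numCols - 1;
        double[][] ata = new double[n][n];
        double[] atb = new double[n];
        for (Map<Integer, Double> row : rows.values()) {
            double label = row.getOrDefault(labelCol, 0.0);
            for (Map.Entry<Integer, Double> u : row.entrySet()) {
                if (u.getKey() == labelCol) continue;
                int i = featureIndex(u.getKey(), labelCol);
                atb[i] += u.getValue() * label;
                for (Map.Entry<Integer, Double> v : row.entrySet()) {
                    if (v.getKey() == labelCol) continue;
                    ata[i][featureIndex(v.getKey(), labelCol)] += u.getValue() * v.getValue();
                }
            }
        }
        return solveDense(ata, atb);
    }

    public static void main(String[] args) {
        // Matrix A from earlier in the thread; column 2 is used as b, columns 0 and 1 as features.
        String[] a = {
            "0,0,.42", "0,1,.28", "0,2,.89", "1,0,.83", "1,1,.34", "1,2,.42",
            "2,0,.23", "3,0,.42", "3,1,.98", "3,2,.88", "4,0,.23", "4,1,.36", "4,2,.97"
        };
        System.out.println(Arrays.toString(fit(a, 3, 2)));
    }
}
```

The per-row accumulation is the part that would parallelize naturally, since each row contributes independently to the two sums; this only stays practical while the number of feature columns is small enough for the square matrix to fit in memory. In MLlib terms, each (label, features) pair produced by the grouping step corresponds to what the earlier reply calls a LabeledPoint.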