[SPARK-1390] Refactoring of matrices backed by RDDs
This refactors the interfaces for matrices backed by RDDs. It would be better to have a clear separation between local matrices and those backed by RDDs. Right now, we have:

1. `org.apache.spark.mllib.linalg.SparseMatrix`, which is a wrapper over an RDD of matrix entries, i.e., coordinate list format.
2. `org.apache.spark.mllib.linalg.TallSkinnyDenseMatrix`, which is a wrapper over `RDD[Array[Double]]`, i.e., row-oriented format.

We will see a naming collision when we introduce a local `SparseMatrix`, and the name `TallSkinnyDenseMatrix` is no longer accurate if we switch from `RDD[Array[Double]]` to `RDD[Vector]`. It would be better to have "RDD" in the class name to suggest that operations may trigger jobs.

The proposed names are (all under `org.apache.spark.mllib.linalg.rdd`):

1. `RDDMatrix`: trait for matrices backed by one or more RDDs
2. `CoordinateRDDMatrix`: wrapper of `RDD[(Long, Long, Double)]`
3. `RowRDDMatrix`: wrapper of `RDD[Vector]` whose rows do not have special ordering
4. `IndexedRowRDDMatrix`: wrapper of `RDD[(Long, Vector)]` whose rows are associated with indices

The current code also introduces local matrices.
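The proposed type hierarchy can be sketched roughly as follows. This is an illustration only, not the code in this change: a plain `Seq` stands in for `RDD` and `scala.collection.immutable.Vector[Double]` stands in for the MLlib `Vector`, so the sketch runs without Spark. The class names follow the proposal above; the method names (`numRows`, `numCols`, `toIndexedRowRDDMatrix`, `toRowRDDMatrix`) and the rule of inferring dimensions from the largest index are assumptions made for the example.

```scala
// Hypothetical sketch: Seq stands in for RDD so this runs without Spark.
// With real RDDs, numRows/numCols would trigger a job.
trait RDDMatrix {
  def numRows: Long
  def numCols: Long
}

// Rows without any special ordering or index.
final case class RowRDDMatrix(rows: Seq[Vector[Double]]) extends RDDMatrix {
  def numRows: Long = rows.size.toLong
  def numCols: Long = rows.headOption.map(_.size.toLong).getOrElse(0L)
}

// Rows tagged with a Long row index; indices may be sparse.
final case class IndexedRowRDDMatrix(rows: Seq[(Long, Vector[Double])]) extends RDDMatrix {
  // Assumption: the row count is inferred from the largest index.
  def numRows: Long = if (rows.isEmpty) 0L else rows.map(_._1).max + 1
  def numCols: Long = rows.headOption.map(_._2.size.toLong).getOrElse(0L)

  // Dropping the indices loses the position of each row.
  def toRowRDDMatrix: RowRDDMatrix = RowRDDMatrix(rows.map(_._2))
}

// Coordinate list format: (row, column, value) entries.
final case class CoordinateRDDMatrix(entries: Seq[(Long, Long, Double)]) extends RDDMatrix {
  def numRows: Long = if (entries.isEmpty) 0L else entries.map(_._1).max + 1
  def numCols: Long = if (entries.isEmpty) 0L else entries.map(_._2).max + 1

  // Group entries by row index and densify each non-empty row.
  def toIndexedRowRDDMatrix: IndexedRowRDDMatrix = {
    val n = numCols.toInt
    val rows = entries.groupBy(_._1).toSeq.sortBy(_._1).map { case (i, es) =>
      val values = Array.fill(n)(0.0)
      es.foreach { case (_, j, v) => values(j.toInt) = v }
      (i, values.toVector)
    }
    IndexedRowRDDMatrix(rows)
  }
}
```

In this sketch, converting an indexed matrix to a `RowRDDMatrix` discards empty rows along with the indices, which is one reason to keep the two wrappers as distinct types.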
Author: Xiangrui Meng <meng@databricks.com>

Closes #296 from mengxr/mat and squashes the following commits:

24d8294 [Xiangrui Meng] fix for groupBy returning Iterable
bfc2b26 [Xiangrui Meng] merge master
8e4f1f5 [Xiangrui Meng] Merge branch 'master' into mat
0135193 [Xiangrui Meng] address Reza's comments
03cd7e1 [Xiangrui Meng] add pca/gram to IndexedRowMatrix; add toBreeze to DistributedMatrix for test; simplify tests
b177ff1 [Xiangrui Meng] address Matei's comments
be119fe [Xiangrui Meng] rename m/n to numRows/numCols for local matrix; add tests for matrices
b881506 [Xiangrui Meng] rename SparkPCA/SVD to TallSkinnyPCA/SVD
e7d0d4a [Xiangrui Meng] move IndexedRDDMatrixRow to IndexedRowRDDMatrix
0d1491c [Xiangrui Meng] fix test errors
a85262a [Xiangrui Meng] rename RDDMatrixRow to IndexedRDDMatrixRow
b8b6ac3 [Xiangrui Meng] Remove old code
4cf679c [Xiangrui Meng] port pca to RowRDDMatrix, and add multiply and covariance
7836e2f [Xiangrui Meng] initial refactoring of matrices backed by RDDs
Changed files:
- examples/src/main/scala/org/apache/spark/examples/mllib/TallSkinnyPCA.scala (64 additions, 0 deletions)
- examples/src/main/scala/org/apache/spark/examples/mllib/TallSkinnySVD.scala (64 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala (101 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/linalg/MatrixSVD.scala (0 additions, 29 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala (0 additions, 120 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala (0 additions, 395 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/linalg/SingularValueDecomposition.scala (2 additions, 7 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/linalg/TallSkinnyMatrixSVD.scala (0 additions, 31 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/CoordinateMatrix.scala (112 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/DistributedMatrix.scala (15 additions, 8 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala (148 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala (344 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/util/LAUtils.scala (0 additions, 67 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/linalg/BreezeMatrixConversionSuite.scala (21 additions, 8 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala (18 additions, 9 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/linalg/PCASuite.scala (0 additions, 124 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/linalg/SVDSuite.scala (0 additions, 194 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/CoordinateMatrixSuite.scala (98 additions, 0 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrixSuite.scala (120 additions, 0 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/RowMatrixSuite.scala (173 additions, 0 deletions)