[SPARK-7368] [MLLIB] Add QR decomposition for RowMatrix
jira: https://issues.apache.org/jira/browse/SPARK-7368

Add QR decomposition for RowMatrix.

I'm not sure what the community's roadmap for distributed matrices is, or whether this is a desirable feature, so I am sending a prototype for discussion. I'll continue polishing the code and will provide unit tests and performance statistics if the approach is acceptable.

The implementation follows the paper by Austin R. Benson, David F. Gleich, and James Demmel, "Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures", 2013 IEEE International Conference on Big Data (https://www.cs.purdue.edu/homes/dgleich/publications/Benson%202013%20-%20direct-tsqr.pdf), which describes a stable algorithm with good scalability.

So far I have tried it on a 400000 x 500 RowMatrix (16 partitions) on a 4-worker cluster, where it brings the computation time down from 8.8 minutes (using breeze.linalg.qr.reduced) to 2.6 minutes. I think there is still room for performance improvement. Trials and suggestions are welcome.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #5909 from hhbyyh/qrDecomposition and squashes the following commits:

cec797b [Yuhao Yang] remove unnecessary qr
0fb1012 [Yuhao Yang] hierarchy R computing
3fbdb61 [Yuhao Yang] update qr to indirect and add ut
0d913d3 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into qrDecomposition
39213c3 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into qrDecomposition
c0fc0c7 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into qrDecomposition
39b0b22 [Yuhao Yang] initial draft for discussion
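
For readers unfamiliar with tall-and-skinny QR, the following is a minimal, dependency-free sketch of the indirect variant that the commit log mentions ("update qr to indirect"). It is not the code merged in this commit: the names `TsqrSketch`, `localR`, `tsqrR`, and `indirectQ` are hypothetical, plain arrays stand in for RowMatrix partitions, and modified Gram-Schmidt is swapped in for a library QR routine. It assumes every row block has full column rank and at least as many rows as columns.

```scala
object TsqrSketch {
  type Mat = Array[Array[Double]] // row-major dense matrix (illustrative stand-in for a partition)

  /** R factor of a thin QR via modified Gram-Schmidt.
    * Assumes full column rank and rows >= cols. */
  def localR(a: Mat): Mat = {
    val m = a.length
    val n = a(0).length
    val v = a.map(_.clone())            // working copy; columns are orthogonalised in place
    val r = Array.ofDim[Double](n, n)   // upper-triangular result, zero below the diagonal
    for (j <- 0 until n) {
      r(j)(j) = math.sqrt((0 until m).map(i => v(i)(j) * v(i)(j)).sum)
      for (i <- 0 until m) v(i)(j) /= r(j)(j)
      for (k <- (j + 1) until n) {
        r(j)(k) = (0 until m).map(i => v(i)(j) * v(i)(k)).sum
        for (i <- 0 until m) v(i)(k) -= r(j)(k) * v(i)(j)
      }
    }
    r
  }

  /** Final R: factor each row block, stack the small R factors, factor the stack again. */
  def tsqrR(blocks: Seq[Mat]): Mat =
    localR(blocks.flatMap(b => localR(b).toSeq).toArray)

  /** Indirect Q: for each row a of A, solve q * R = a by substitution (R is upper triangular). */
  def indirectQ(a: Mat, r: Mat): Mat = a.map { row =>
    val q = new Array[Double](row.length)
    for (j <- row.indices) {
      val s = (0 until j).map(k => q(k) * r(k)(j)).sum
      q(j) = (row(j) - s) / r(j)(j)
    }
    q
  }
}
```

Only the small n x n R factors need to be gathered in one place, while the per-block factorisations and the row-wise solve for Q can run independently on each partition. Recovering Q as A times the inverse of R is simple but loses orthogonality on ill-conditioned inputs, which is the trade-off the cited paper's direct method addresses.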
Changed files:
- mllib/src/main/scala/org/apache/spark/mllib/linalg/SingularValueDecomposition.scala (8 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala (45 additions, 1 deletion)
- mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/RowMatrixSuite.scala (17 additions, 0 deletions)