Commit cf2e4165 authored by Xiangrui Meng

[SPARK-5958][MLLIB][DOC] update block matrix user guide

* Removed SVD code from examples.
* Corrected Java API doc link.
* Updated variable names: `AtransposeA` -> `ata`.
* Minor changes.

brkyvz

Author: Xiangrui Meng <meng@databricks.com>

Closes #4737 from mengxr/update-block-matrix-user-guide and squashes the following commits:

70f53ac [Xiangrui Meng] update block matrix user guide
parent 1ed57086
@@ -298,23 +298,22 @@ In general the use of non-deterministic RDDs can lead to errors.
 ### BlockMatrix
-A `BlockMatrix` is a distributed matrix backed by an RDD of `MatrixBlock`s, where `MatrixBlock` is
+A `BlockMatrix` is a distributed matrix backed by an RDD of `MatrixBlock`s, where a `MatrixBlock` is
 a tuple of `((Int, Int), Matrix)`, where the `(Int, Int)` is the index of the block, and `Matrix` is
 the sub-matrix at the given index with size `rowsPerBlock` x `colsPerBlock`.
-`BlockMatrix` supports methods such as `.add` and `.multiply` with another `BlockMatrix`.
-`BlockMatrix` also has a helper function `.validate` which can be used to debug whether the
+`BlockMatrix` supports methods such as `add` and `multiply` with another `BlockMatrix`.
+`BlockMatrix` also has a helper function `validate` which can be used to check whether the
 `BlockMatrix` is set up properly.
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
 A [`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` using `.toBlockMatrix()`.
-`.toBlockMatrix()` will create blocks of size 1024 x 1024. Users may change the sizes of their blocks
-by supplying the values through `.toBlockMatrix(rowsPerBlock, colsPerBlock)`.
+most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `toBlockMatrix`.
+`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
+Users may change the block size by supplying the values through `toBlockMatrix(rowsPerBlock, colsPerBlock)`.
 {% highlight scala %}
-import org.apache.spark.mllib.linalg.SingularValueDecomposition
 import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
 val entries: RDD[MatrixEntry] = ... // an RDD of (i, j, v) matrix entries
@@ -323,29 +322,24 @@ val coordMat: CoordinateMatrix = new CoordinateMatrix(entries)
 // Transform the CoordinateMatrix to a BlockMatrix
 val matA: BlockMatrix = coordMat.toBlockMatrix().cache()
-// validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
+// Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
 // Nothing happens if it is valid.
-matA.validate
+matA.validate()
 // Calculate A^T A.
-val AtransposeA = matA.transpose.multiply(matA)
+val ata = matA.transpose.multiply(matA)
-// get SVD of 2 * A
-val A2 = matA.add(matA)
-val svd = A2.toIndexedRowMatrix().computeSVD(20, false, 1e-9)
 {% endhighlight %}
 </div>
 <div data-lang="java" markdown="1">
-A [`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` using `.toBlockMatrix()`.
-`.toBlockMatrix()` will create blocks of size 1024 x 1024. Users may change the sizes of their blocks
-by supplying the values through `.toBlockMatrix(rowsPerBlock, colsPerBlock)`.
+A [`BlockMatrix`](api/java/org/apache/spark/mllib/linalg/distributed/BlockMatrix.html) can be
+most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `toBlockMatrix`.
+`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
+Users may change the block size by supplying the values through `toBlockMatrix(rowsPerBlock, colsPerBlock)`.
 {% highlight java %}
 import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.mllib.linalg.SingularValueDecomposition;
 import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
 import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
 import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
@@ -356,17 +350,12 @@ CoordinateMatrix coordMat = new CoordinateMatrix(entries.rdd());
 // Transform the CoordinateMatrix to a BlockMatrix
 BlockMatrix matA = coordMat.toBlockMatrix().cache();
-// validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
+// Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
 // Nothing happens if it is valid.
 matA.validate();
 // Calculate A^T A.
-BlockMatrix AtransposeA = matA.transpose().multiply(matA);
+BlockMatrix ata = matA.transpose().multiply(matA);
-// get SVD of 2 * A
-BlockMatrix A2 = matA.add(matA);
-SingularValueDecomposition<IndexedRowMatrix, Matrix> svd =
-A2.toIndexedRowMatrix().computeSVD(20, false, 1e-9);
 {% endhighlight %}
 </div>
 </div>
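
The guide text in this change describes each `MatrixBlock` as an `((Int, Int), Matrix)` pair: a block index plus a sub-matrix of size `rowsPerBlock` x `colsPerBlock`. As a rough illustration of that layout, the plain-Java sketch below shows how a global entry `(i, j)` could be mapped to a block index and an offset inside the block. It does not use Spark. The class `BlockIndexSketch` and its methods are hypothetical names, and the integer-division convention is an assumption about the layout, not a quote of MLlib's implementation.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only; not part of Spark MLlib.
public class BlockIndexSketch {

    /** Index {blockRow, blockCol} of the block that would hold global entry (i, j). */
    public static int[] blockIndexOf(long i, long j, int rowsPerBlock, int colsPerBlock) {
        return new int[] {(int) (i / rowsPerBlock), (int) (j / colsPerBlock)};
    }

    /** Position {row, col} of global entry (i, j) inside its block. */
    public static int[] offsetInBlock(long i, long j, int rowsPerBlock, int colsPerBlock) {
        return new int[] {(int) (i % rowsPerBlock), (int) (j % colsPerBlock)};
    }

    /** Number of blocks needed to cover n rows (or columns), rounding up. */
    public static int numBlocks(long n, int perBlock) {
        return (int) ((n + perBlock - 1) / perBlock);
    }

    public static void main(String[] args) {
        // 1024 x 1024 is the default block size mentioned in the guide.
        int rowsPerBlock = 1024;
        int colsPerBlock = 1024;

        // Entry (2500, 10) falls in block (2, 0) at offset (452, 10).
        System.out.println(Arrays.toString(blockIndexOf(2500, 10, rowsPerBlock, colsPerBlock)));  // [2, 0]
        System.out.println(Arrays.toString(offsetInBlock(2500, 10, rowsPerBlock, colsPerBlock))); // [452, 10]

        // A matrix with 2501 rows would need 3 row blocks under this convention.
        System.out.println(numBlocks(2501, rowsPerBlock)); // 3
    }
}
```

Under these assumptions, passing smaller values for `rowsPerBlock` and `colsPerBlock` yields more, smaller blocks, which is the trade-off the `toBlockMatrix(rowsPerBlock, colsPerBlock)` overload exposes.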