Commit c3a986b1 authored by Ignacio Bermudez, committed by Sean Owen

[SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash when converting from Breeze sparse matrix

## What changes were proposed in this pull request?

When two Breeze SparseMatrices are operated on, the resulting matrix may contain extra provisional zero entries in its rowIndices and data arrays. These arrays then become inconsistent with colPtrs, but Breeze tolerates the inconsistency by keeping a counter of the valid (active) entries.

In Spark, when such matrices are converted to SparseMatrix, Spark relies solely on rowIndices, data, and colPtrs, which may be incorrect because of Breeze's internal bookkeeping. Therefore, we need to slice both rowIndices and data down to the count of active entries.

This conversion is called at least by BlockMatrix when performing distributed block operations, so valid operations could throw exceptions.

See http://stackoverflow.com/questions/33528555/error-thrown-when-using-blockmatrix-add
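For illustration only (not part of the patch), here is a minimal standalone Breeze sketch, assuming the `CSCMatrix` API (`Builder`, `activeSize`, `copy`, `compact()`) that Spark's Breeze dependency exposes; whether the inconsistency actually materialises depends on the Breeze version. It shows how an arithmetic result can carry more slots in rowIndices/data than active entries, and how compacting restores consistency with colPtrs:

```scala
import breeze.linalg.CSCMatrix

// A minimal, standalone sketch of the issue (not Spark code).
object TrailingZerosSketch {
  def main(args: Array[String]): Unit = {
    // (2, 0, 0)
    // (2, 0, 0)
    val b1 = new CSCMatrix.Builder[Double](rows = 2, cols = 3)
    b1.add(0, 0, 2.0)
    b1.add(1, 0, 2.0)
    val m1 = b1.result()

    // (2, 1e-15, 1e-15)
    // (2, 1e-15, 1e-15)
    val b2 = new CSCMatrix.Builder[Double](rows = 2, cols = 3)
    for (i <- 0 until 2; j <- 0 until 3) b2.add(i, j, if (j == 0) 2.0 else 1e-15)
    val m2 = b2.result()

    // Subtraction can cancel entries to exact zero; depending on the Breeze
    // version, the result's rowIndices/data arrays may then be longer than
    // the number of active entries it tracks internally.
    val diff = m1 - m2
    println(s"rowIndices.length = ${diff.rowIndices.length}, activeSize = ${diff.activeSize}")

    // Consumers that read colPtrs/rowIndices/data directly must compact first,
    // which is what the patched Matrices.fromBreeze does.
    val consistent =
      if (diff.rowIndices.length > diff.activeSize) {
        val c = diff.copy   // avoid mutating the input
        c.compact()         // trims rowIndices/data so they agree with colPtrs
        c
      } else {
        diff
      }
    println(s"after compact: rowIndices.length = ${consistent.rowIndices.length}")
  }
}
```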

## How was this patch tested?

Added a test to MatricesSuite that verifies the conversions are valid and the code doesn't crash. The same code previously crashed on Spark.

Bugfix for https://issues.apache.org/jira/browse/SPARK-20687



Author: Ignacio Bermudez <ignaciobermudez@gmail.com>
Author: Ignacio Bermudez Corrales <icorrales@splunk.com>

Closes #17940 from ghoto/bug-fix/SPARK-20687.

(cherry picked from commit 06dda1d5)
Signed-off-by: Sean Owen <sowen@cloudera.com>
parent e9804b3d
@@ -992,7 +992,16 @@ object Matrices {
       new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
     case sm: BSM[Double] =>
       // There is no isTranspose flag for sparse matrices in Breeze
-      new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
+      val nsm = if (sm.rowIndices.length > sm.activeSize) {
+        // This sparse matrix has trailing zeros.
+        // Remove them by compacting the matrix.
+        val csm = sm.copy
+        csm.compact()
+        csm
+      } else {
+        sm
+      }
+      new SparseMatrix(nsm.rows, nsm.cols, nsm.colPtrs, nsm.rowIndices, nsm.data)
     case _ =>
       throw new UnsupportedOperationException(
         s"Do not support conversion from type ${breeze.getClass.getName}.")
...
@@ -513,6 +513,26 @@ class MatricesSuite extends SparkFunSuite {
     Matrices.fromBreeze(sum)
   }

+  test("Test FromBreeze when Breeze.CSCMatrix.rowIndices has trailing zeros. - SPARK-20687") {
+    // (2, 0, 0)
+    // (2, 0, 0)
+    val mat1Brz = Matrices.sparse(2, 3, Array(0, 2, 2, 2), Array(0, 1), Array(2, 2)).asBreeze
+    // (2, 1E-15, 1E-15)
+    // (2, 1E-15, 1E-15)
+    val mat2Brz = Matrices.sparse(2, 3,
+      Array(0, 2, 4, 6),
+      Array(0, 0, 0, 1, 1, 1),
+      Array(2, 1E-15, 1E-15, 2, 1E-15, 1E-15)).asBreeze
+    val t1Brz = mat1Brz - mat2Brz
+    val t2Brz = mat2Brz - mat1Brz
+    // The following conversions throw exceptions on the un-patched Matrices.fromBreeze.
+    val t1 = Matrices.fromBreeze(t1Brz)
+    val t2 = Matrices.fromBreeze(t2Brz)
+    // t1 == t1Brz && t2 == t2Brz
+    assert((t1.asBreeze - t1Brz).iterator.map((x) => math.abs(x._2)).sum < 1E-15)
+    assert((t2.asBreeze - t2Brz).iterator.map((x) => math.abs(x._2)).sum < 1E-15)
+  }
+
   test("row/col iterator") {
     val dm = new DenseMatrix(3, 2, Array(0, 1, 2, 3, 4, 0))
     val sm = dm.toSparse
...