Commit 30363ede authored by Reza Zadeh, committed by Xiangrui Meng

[MLlib] [SPARK-6713] Iterators in columnSimilarities for mapPartitionsWithIndex

Use Iterators in columnSimilarities so that mapPartitionsWithIndex can spill to disk. This matters for a large, dense column: Spark can spill the pairs to disk as they are produced, instead of the code building all the pairs in memory before handing them to Spark.
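
The difference between the two shapes can be sketched outside Spark. The following is a minimal illustration, not the RowMatrix code: the names `LazyPairs`, `eagerPairs` and `lazyPairs` are hypothetical, and the sampling (`rand`, `p`) and scaling (`q`) steps from the real method are left out as a simplifying assumption.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper object, for illustration only.
object LazyPairs {

  // Eager shape: every (i, j) pair of the row is appended to one buffer,
  // and nothing is returned until the whole buffer has been built.
  def eagerPairs(values: Array[Double]): List[((Int, Int), Double)] = {
    val buf = new ListBuffer[((Int, Int), Double)]()
    var i = 0
    while (i < values.length) {
      var j = i + 1
      while (j < values.length) {
        buf += (((i, j), values(i) * values(j)))
        j += 1
      }
      i += 1
    }
    buf.toList
  }

  // Lazy shape: Iterator.tabulate builds one small buffer per index i,
  // and only when the consumer asks for it; flatten chains the buffers
  // into a single iterator of pairs.
  def lazyPairs(values: Array[Double]): Iterator[((Int, Int), Double)] =
    Iterator.tabulate(values.length) { i =>
      val buf = new ListBuffer[((Int, Int), Double)]()
      var j = i + 1
      while (j < values.length) {
        buf += (((i, j), values(i) * values(j)))
        j += 1
      }
      buf
    }.flatten
}
```

Both functions yield the same pairs in the same order. The lazy one holds at most the pairs for a single index `i` at a time, which is what gives the consumer of the iterator a chance to spill between buffers.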

Another PR coming to update documentation.

Author: Reza Zadeh <reza@databricks.com>

Closes #5364 from rezazadeh/optmemsim and squashes the following commits:

47c90ba [Reza Zadeh] Iterators in columnSimilarities for flatMap
parent 9fe41252
@@ -531,7 +531,6 @@ class RowMatrix(
         val rand = new XORShiftRandom(indx)
         val scaled = new Array[Double](p.size)
         iter.flatMap { row =>
-          val buf = new ListBuffer[((Int, Int), Double)]()
           row match {
             case SparseVector(size, indices, values) =>
               val nnz = indices.size
@@ -540,8 +539,9 @@ class RowMatrix(
                 scaled(k) = values(k) / q(indices(k))
                 k += 1
               }
-              k = 0
-              while (k < nnz) {
+
+              Iterator.tabulate (nnz) { k =>
+                val buf = new ListBuffer[((Int, Int), Double)]()
                 val i = indices(k)
                 val iVal = scaled(k)
                 if (iVal != 0 && rand.nextDouble() < p(i)) {
@@ -555,8 +555,8 @@ class RowMatrix(
                     l += 1
                   }
                 }
-                k += 1
-              }
+                buf
+              }.flatten
             case DenseVector(values) =>
               val n = values.size
               var i = 0
@@ -564,8 +564,8 @@ class RowMatrix(
                 scaled(i) = values(i) / q(i)
                 i += 1
               }
-              i = 0
-              while (i < n) {
+              Iterator.tabulate (n) { i =>
+                val buf = new ListBuffer[((Int, Int), Double)]()
                 val iVal = scaled(i)
                 if (iVal != 0 && rand.nextDouble() < p(i)) {
                   var j = i + 1
@@ -577,10 +577,9 @@ class RowMatrix(
                     j += 1
                   }
                 }
-                i += 1
-              }
+                buf
+              }.flatten
           }
-          buf
         }
       }.reduceByKey(_ + _).map { case ((i, j), sim) =>
         MatrixEntry(i.toLong, j.toLong, sim)
......