MLLIB-25: Implicit ALS runs out of memory for moderately large numbers of features

There's a step in implicit ALS where the matrix `Yt * Y` is computed. It's computed as the sum of matrices; an f x f matrix is created for each of n user/item rows in a partition. In `ALS.scala:214`: ``` factors.flatMapValues{ case factorArray => factorArray.map{ vector => val x = new DoubleMatrix(vector) x.mmul(x.transpose()) } }.reduceByKeyLocally((a, b) => a.addi(b)) .values .reduce((a, b) => a.addi(b)) ``` Completely correct, but there's a subtle but quite large memory problem here. map() is going to create all of these matrices in memory at once, when they don't need to ever all exist at the same time. For example, if a partition has n = 100000 rows, and f = 200, then this intermediate product requires 32GB of heap. The computation will never work unless you can cough up workers with (more than) that much heap. Fortunately there's a trivial change that fixes it; just add `.view` in there. Author: Sean Owen <sowen@cloudera.com> Closes #629 from srowen/ALSMatrixAllocationOptimization and squashes the following commits: 062cda9 [Sean Owen] Update style per review comments e9a5d63 [Sean Owen] Avoid unnecessary out of memory situation by not simultaneously allocating lots of matrices

MLLIB-25: Implicit ALS runs out of memory for moderately large numbers of features
c8a4c9b1 · Sean Owen · Reynold Xin · 45b15e27 · c8a4c9b1
Commit c8a4c9b1 authored 11 years ago by Sean Owen Committed by Reynold Xin 11 years ago
--- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
@@ -211,8 +211,8 @@ class ALS private (var numBlocks: Int, var rank: Int, var iterations: Int, var l
  def computeYtY(factors: RDD[(Int, Array[Array[Double]])]) = {
    if (implicitPrefs) {
      Option(
-        factors.flatMapValues{ case factorArray =>
-          factorArray.map{ vector =>
+        factors.flatMapValues { case factorArray =>
+          factorArray.view.map { vector =>
            val x = new DoubleMatrix(vector)
            x.mmul(x.transpose())
          }