From 2b36344f588d4e7357ce9921dc656e2389ba1dea Mon Sep 17 00:00:00 2001
From: Sean Owen <sowen@cloudera.com>
Date: Thu, 3 Jul 2014 11:54:51 -0700
Subject: [PATCH] SPARK-1675. Make clear whether computePrincipalComponents
 requires centered data

Closes out this small JIRA with a doc comment change: the Scaladoc for
computePrincipalComponents now states that the row data need not be centered.

Author: Sean Owen <sowen@cloudera.com>

Closes #1171 from srowen/SPARK-1675 and squashes the following commits:

45ee9b7 [Sean Owen] Add simple note that data need not be centered for computePrincipalComponents
---
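Reviewer note (not part of the commit): the sketch below illustrates why
centering is unnecessary, assuming the principal components are derived from
the sample covariance matrix, which subtracts each column mean itself. It is
plain Scala with no Spark dependency; `CenteringSketch` and its `covariance`
helper are hypothetical names for illustration and are not RowMatrix's actual
implementation.

```scala
object CenteringSketch {
  // Sample covariance of row-major data. Column means are subtracted here,
  // so shifting any column by a constant leaves the result unchanged.
  def covariance(rows: Array[Array[Double]]): Array[Array[Double]] = {
    val m = rows.length
    val n = rows(0).length
    val means = Array.tabulate(n)(j => rows.map(_(j)).sum / m)
    Array.tabulate(n, n) { (i, j) =>
      rows.map(r => (r(i) - means(i)) * (r(j) - means(j))).sum / (m - 1)
    }
  }

  // Subtract each column's mean so that every column has mean 0.
  def center(rows: Array[Array[Double]]): Array[Array[Double]] = {
    val n = rows(0).length
    val means = Array.tabulate(n)(j => rows.map(_(j)).sum / rows.length)
    rows.map(r => r.zip(means).map { case (x, mu) => x - mu })
  }

  // True when both matrices agree entry by entry within a small tolerance.
  def approxEqual(a: Array[Array[Double]], b: Array[Array[Double]]): Boolean =
    a.flatten.zip(b.flatten).forall { case (x, y) => math.abs(x - y) < 1e-12 }

  def main(args: Array[String]): Unit = {
    // Made-up, uncentered sample data: 3 rows, 2 columns.
    val raw = Array(Array(1.0, 10.0), Array(2.0, 14.0), Array(4.0, 12.0))
    val same = approxEqual(covariance(raw), covariance(center(raw)))
    println(s"covariance unchanged by centering: $same")
  }
}
```

Because the covariance matrix is the same for raw and centered rows, its
eigenvectors (the principal components) are the same as well.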
 .../org/apache/spark/mllib/linalg/distributed/RowMatrix.scala   | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
index 1a0073c9d4..695e03b736 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
@@ -347,6 +347,8 @@ class RowMatrix(
    * The principal components are stored a local matrix of size n-by-k.
    * Each column corresponds for one principal component,
    * and the columns are in descending order of component variance.
+   * The row data do not need to be "centered" first; it is not necessary for
+   * the mean of each column to be 0.
    *
    * @param k number of top principal components.
    * @return a matrix of size n-by-k, whose columns are principal components
-- 
GitLab