From 2b36344f588d4e7357ce9921dc656e2389ba1dea Mon Sep 17 00:00:00 2001
From: Sean Owen <sowen@cloudera.com>
Date: Thu, 3 Jul 2014 11:54:51 -0700
Subject: [PATCH] SPARK-1675. Make clear whether computePrincipalComponents requires centered data

Just closing out this small JIRA, resolving with a comment change.

Author: Sean Owen <sowen@cloudera.com>

Closes #1171 from srowen/SPARK-1675 and squashes the following commits:

45ee9b7 [Sean Owen] Add simple note that data need not be centered for computePrincipalComponents
---
 .../org/apache/spark/mllib/linalg/distributed/RowMatrix.scala | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
index 1a0073c9d4..695e03b736 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
@@ -347,6 +347,8 @@ class RowMatrix(
    * The principal components are stored a local matrix of size n-by-k.
    * Each column corresponds for one principal component,
    * and the columns are in descending order of component variance.
+   * The row data do not need to be "centered" first; it is not necessary for
+   * the mean of each column to be 0.
    *
    * @param k number of top principal components.
    * @return a matrix of size n-by-k, whose columns are principal components
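Why centering is unnecessary can be checked with a small sketch, assuming (as the Scaladoc's "descending order of component variance" suggests) that the components are the top eigenvectors of the sample covariance matrix. Adding a constant `o_j` to column `j` shifts that column's mean by the same `o_j`, so every deviation `x_ij - mean_j` is unchanged, and so is the covariance matrix and therefore its eigenvectors. The code below is plain Scala with no Spark dependency; `CenteringSketch`, `covariance`, `shiftColumns`, and `maxAbsDiff` are hypothetical names used only for illustration and are not part of MLlib.

```scala
// Hypothetical illustration, not MLlib code: shows that the sample covariance
// matrix (from which principal components are assumed to be derived) does not
// change when a constant is added to each column, i.e. centering is not needed.
object CenteringSketch {
  type Matrix = Array[Array[Double]]

  // Sample covariance of the columns: subtract each column mean from the
  // entries, form the cross products, and divide by (m - 1).
  def covariance(rows: Matrix): Matrix = {
    val m = rows.length
    val n = rows.head.length
    val means = Array.tabulate(n)(j => rows.map(_(j)).sum / m)
    Array.tabulate(n, n) { (i, j) =>
      rows.map(r => (r(i) - means(i)) * (r(j) - means(j))).sum / (m - 1)
    }
  }

  // Add offsets(j) to every entry of column j, moving the column means.
  def shiftColumns(rows: Matrix, offsets: Array[Double]): Matrix =
    rows.map(r => r.zip(offsets).map { case (x, o) => x + o })

  // Largest elementwise absolute difference between two same-shaped matrices.
  def maxAbsDiff(a: Matrix, b: Matrix): Double =
    a.flatten.zip(b.flatten).map { case (x, y) => math.abs(x - y) }.max

  def main(args: Array[String]): Unit = {
    val data: Matrix = Array(
      Array(1.0, 2.0),
      Array(2.0, 3.5),
      Array(3.0, 5.0),
      Array(4.0, 6.0)
    )
    // Columns of `shifted` have means far from 0.
    val shifted = shiftColumns(data, Array(100.0, -50.0))
    val diff = maxAbsDiff(covariance(data), covariance(shifted))
    println(s"max |cov(data) - cov(shifted)| = $diff")
  }
}
```

Since the two covariance matrices agree (up to floating-point rounding), any eigendecomposition of them yields the same principal components, which is the property the added Scaladoc note states.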