-
- Downloads
SPARK-4156 [MLLIB] EM algorithm for GMMs
Implementation of Expectation-Maximization for Gaussian Mixture Models. This is my maiden contribution to Apache Spark, so I apologize now if I have done anything incorrectly; having said that, this work is my own, and I offer it to the project under the project's open source license. Author: Travis Galoppo <tjg2107@columbia.edu> Author: Travis Galoppo <travis@localhost.localdomain> Author: tgaloppo <tjg2107@columbia.edu> Author: FlytxtRnD <meethu.mathew@flytxt.com> Closes #3022 from tgaloppo/master and squashes the following commits: aaa8f25 [Travis Galoppo] MLUtils: changed privacy of EPSILON from [util] to [mllib] 709e4bf [Travis Galoppo] fixed usage line to include optional maxIterations parameter acf1fba [Travis Galoppo] Fixed parameter comment in GaussianMixtureModel Made maximum iterations an optional parameter to DenseGmmEM 9b2fc2a [Travis Galoppo] Style improvements Changed ExpectationSum to a private class b97fe00 [Travis Galoppo] Minor fixes and tweaks. 1de73f3 [Travis Galoppo] Removed redundant array from array creation 578c2d1 [Travis Galoppo] Removed unused import 227ad66 [Travis Galoppo] Moved prediction methods into model class. 308c8ad [Travis Galoppo] Numerous changes to improve code cff73e0 [Travis Galoppo] Replaced accumulators with RDD.aggregate 20ebca1 [Travis Galoppo] Removed unusued code 42b2142 [Travis Galoppo] Added functionality to allow setting of GMM starting point. Added two cluster test to testing suite. 8b633f3 [Travis Galoppo] Style issue 9be2534 [Travis Galoppo] Style issue d695034 [Travis Galoppo] Fixed style issues c3b8ce0 [Travis Galoppo] Merge branch 'master' of https://github.com/tgaloppo/spark Adds predict() method 2df336b [Travis Galoppo] Fixed style issue b99ecc4 [tgaloppo] Merge pull request #1 from FlytxtRnD/predictBranch f407b4c [FlytxtRnD] Added predict() to return the cluster labels and membership values 97044cf [Travis Galoppo] Fixed style issues dc9c742 [Travis Galoppo] Moved MultivariateGaussian utility class e7d413b [Travis Galoppo] Moved multivariate Gaussian utility class to mllib/stat/impl Improved comments 9770261 [Travis Galoppo] Corrected a variety of style and naming issues. 8aaa17d [Travis Galoppo] Added additional train() method to companion object for cluster count and tolerance parameters. 676e523 [Travis Galoppo] Fixed to no longer ignore delta value provided on command line e6ea805 [Travis Galoppo] Merged with master branch; update test suite with latest context changes. Improved cluster initialization strategy. 86fb382 [Travis Galoppo] Merge remote-tracking branch 'upstream/master' 719d8cc [Travis Galoppo] Added scala test suite with basic test c1a8e16 [Travis Galoppo] Made GaussianMixtureModel class serializable Modified sum function for better performance 5c96c57 [Travis Galoppo] Merge remote-tracking branch 'upstream/master' c15405c [Travis Galoppo] SPARK-4156
Showing
- examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala 67 additions, 0 deletions...in/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala
- mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureEM.scala 241 additions, 0 deletions...org/apache/spark/mllib/clustering/GaussianMixtureEM.scala
- mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala 91 additions, 0 deletions.../apache/spark/mllib/clustering/GaussianMixtureModel.scala
- mllib/src/main/scala/org/apache/spark/mllib/stat/impl/MultivariateGaussian.scala 39 additions, 0 deletions...g/apache/spark/mllib/stat/impl/MultivariateGaussian.scala
- mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala 1 addition, 1 deletion.../src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
- mllib/src/test/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximizationSuite.scala 78 additions, 0 deletions...rk/mllib/clustering/GMMExpectationMaximizationSuite.scala
Loading
Please register or sign in to comment