[SPARK-14732][ML] spark.ml GaussianMixture should use MultivariateGaussian in mllib-local
## What changes were proposed in this pull request?

Before, spark.ml GaussianMixtureModel used the spark.mllib MultivariateGaussian in its public API. This was added after 1.6, so we can modify this API without breaking APIs.

This PR copies MultivariateGaussian to mllib-local in spark.ml, with a few changes:
* Renamed fields to match numpy, scipy: mu => mean, sigma => cov

This PR then uses the spark.ml MultivariateGaussian in the spark.ml GaussianMixtureModel, which involves:
* Modifying the constructor
* Adding a computeProbabilities method

Also:
* Added EPSILON to mllib-local for use in MultivariateGaussian

## How was this patch tested?

Existing unit tests

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #12593 from jkbradley/sparkml-gmm-fix.
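
For readers unfamiliar with the pieces named in the commit message, the following is a minimal, stdlib-only Python sketch of how they could fit together. It is not the Spark code. The class below is a hypothetical stand-in that only borrows the `mean`/`cov` field names. It uses a Cholesky factorization for brevity, which assumes a strictly positive definite covariance and may differ from what the Spark class does. `machine_epsilon` and `compute_probabilities` are illustrative names, and adding `EPSILON` to each weighted density is an assumption about how such a constant might be used to avoid division by zero.

```python
import math


def machine_epsilon():
    """Halve eps until 1.0 + eps/2 is indistinguishable from 1.0."""
    eps = 1.0
    while 1.0 + eps / 2.0 != 1.0:
        eps /= 2.0
    return eps


EPSILON = machine_epsilon()


class MultivariateGaussian:
    """Illustrative stand-in (not the Spark class) with numpy/scipy-style field names."""

    def __init__(self, mean, cov):
        n = len(mean)
        if len(cov) != n or any(len(row) != n for row in cov):
            raise ValueError("cov must be a square matrix matching len(mean)")
        self.mean = list(mean)
        self.cov = [list(row) for row in cov]
        self._chol = self._cholesky(self.cov)
        # log|cov| = 2 * sum(log(L_ii)) for the Cholesky factor L
        log_det = 2.0 * sum(math.log(self._chol[i][i]) for i in range(n))
        self._log_norm = -0.5 * (n * math.log(2.0 * math.pi) + log_det)

    @staticmethod
    def _cholesky(a):
        """Return lower-triangular L with L * L^T = a (a must be positive definite)."""
        n = len(a)
        low = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1):
                s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
                if i == j:
                    if s <= EPSILON:
                        raise ValueError("cov must be positive definite")
                    low[i][j] = math.sqrt(s)
                else:
                    low[i][j] = s / low[j][j]
        return low

    def logpdf(self, x):
        n = len(self.mean)
        delta = [x[i] - self.mean[i] for i in range(n)]
        # Forward substitution solves L z = delta; then delta^T cov^-1 delta = z . z
        z = [0.0] * n
        for i in range(n):
            acc = sum(self._chol[i][k] * z[k] for k in range(i))
            z[i] = (delta[i] - acc) / self._chol[i][i]
        return self._log_norm - 0.5 * sum(v * v for v in z)

    def pdf(self, x):
        return math.exp(self.logpdf(x))


def compute_probabilities(x, gaussians, weights):
    """Per-component membership probabilities for x; EPSILON guards against a zero sum."""
    p = [EPSILON + w * g.pdf(x) for w, g in zip(weights, gaussians)]
    total = sum(p)
    return [v / total for v in p]
```

A mixture of two components could then be queried with `compute_probabilities([0.1, 0.0], [g0, g1], [0.5, 0.5])`, which returns one probability per component, summing to 1.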
Showing 7 changed files:
- mllib-local/src/main/scala/org/apache/spark/ml/impl/Utils.scala (30 additions, 0 deletions)
- mllib-local/src/main/scala/org/apache/spark/ml/stat/distribution/MultivariateGaussian.scala (131 additions, 0 deletions)
- mllib-local/src/test/scala/org/apache/spark/ml/impl/UtilsSuite.scala (30 additions, 0 deletions)
- mllib-local/src/test/scala/org/apache/spark/ml/stat/distribution/MultivariateGaussianSuite.scala (83 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala (73 additions, 35 deletions)
- mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala (2 additions, 2 deletions)
- python/pyspark/ml/clustering.py (4 additions, 7 deletions)