  1. May 09, 2016
    • dding3's avatar
      [SPARK-15172][ML] Explicitly tell user initial coefficients are ignored when... · a78fbfa6
      dding3 authored
      [SPARK-15172][ML] Explicitly tell user initial coefficients are ignored when a size mismatch happens in LogisticRegression
      
      ## What changes were proposed in this pull request?
      Explicitly tell the user that the initial coefficients are ignored if their size doesn't match the expected size in LogisticRegression.
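
      A minimal Scala sketch of the kind of check described; the function and parameter names here are hypothetical, not the actual LogisticRegression internals:

      ```
      // Illustrative only: warn and fall back when the provided coefficients
      // do not match the expected size.
      def chooseInitialCoefficients(
          provided: Option[Array[Double]],
          numFeatures: Int,
          logWarning: String => Unit): Array[Double] = provided match {
        case Some(coef) if coef.length == numFeatures =>
          coef
        case Some(coef) =>
          logWarning(s"Initial coefficients will be ignored: size ${coef.length} " +
            s"did not match the expected size $numFeatures.")
          Array.ofDim[Double](numFeatures) // fall back to zero coefficients
        case None =>
          Array.ofDim[Double](numFeatures)
      }
      ```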
      
      ## How was this patch tested?
      local build
      
      Author: dding3 <dingding@dingding-ubuntu.sh.intel.com>
      
      Closes #12948 from dding3/master.
      a78fbfa6
    • Yuhao Yang's avatar
      [SPARK-14814][MLLIB] API: Java compatibility, docs · 68abc1b4
      Yuhao Yang authored
      ## What changes were proposed in this pull request?
      jira: https://issues.apache.org/jira/browse/SPARK-14814
      Fix a Java compatibility function in mllib DecisionTreeModel. As synced in the JIRA, the other compatibility issues don't need fixes.
      
      ## How was this patch tested?
      
      Existing unit tests.
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #12971 from hhbyyh/javacompatibility.
      68abc1b4
    • Liang-Chi Hsieh's avatar
      [SPARK-15211][SQL] Select features column from LibSVMRelation causes failure · 635ef407
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      We need to use `requiredSchema` in `LibSVMRelation` to project only the required columns when loading data from this data source. Otherwise, when users try to select just the `features` column, the query fails.
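
      A usage sketch of the previously failing projection, assuming a local SparkSession and the sample LibSVM file bundled with the Spark repository:

      ```
      import org.apache.spark.sql.SparkSession

      // Load the LibSVM data source and select only the `features` column,
      // which used to cause a failure before this fix.
      val spark = SparkSession.builder().appName("libsvm-select").master("local[*]").getOrCreate()
      val df = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
      df.select("features").show(5)
      ```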
      
      ## How was this patch tested?
      `LibSVMRelationSuite`.
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      
      Closes #12986 from viirya/fix-libsvmrelation.
      635ef407
  2. May 06, 2016
    • Burak Köse's avatar
      [SPARK-14050][ML] Add multiple languages support and additional methods for Stop Words Remover · e20cd9f4
      Burak Köse authored
      ## What changes were proposed in this pull request?
      
      This PR continues the work from #11871 with the following changes:
      * load English stop words by default (a usage sketch follows this list)
      * convert stop words to a list in Python
      * update some tests and doc
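
      A small Scala usage sketch of the default and per-language stop words (the Python side is analogous):

      ```
      import org.apache.spark.ml.feature.StopWordsRemover

      // English stop words are used by default; other languages can be loaded explicitly.
      val remover = new StopWordsRemover()
        .setInputCol("raw")
        .setOutputCol("filtered")
        .setStopWords(StopWordsRemover.loadDefaultStopWords("turkish"))
      ```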
      
      ## How was this patch tested?
      
      Unit tests.
      
      Closes #11871
      
      cc: burakkose srowen
      
      Author: Burak Köse <burakks41@gmail.com>
      Author: Xiangrui Meng <meng@databricks.com>
      Author: Burak KOSE <burakks41@gmail.com>
      
      Closes #12843 from mengxr/SPARK-14050.
      e20cd9f4
  3. May 05, 2016
    • Andrew Or's avatar
      [HOTFIX] Fix MLUtils compile · 7f5922aa
      Andrew Or authored
      7f5922aa
    • Jacek Laskowski's avatar
      [SPARK-15152][DOC][MINOR] Scaladoc and Code style Improvements · bbb77734
      Jacek Laskowski authored
      ## What changes were proposed in this pull request?
      
      Minor doc and code style fixes
      
      ## How was this patch tested?
      
      local build
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #12928 from jaceklaskowski/SPARK-15152.
      bbb77734
    • Holden Karau's avatar
      [SPARK-15106][PYSPARK][ML] Add PySpark package doc for ML component & remove "BETA" · 4c0d827c
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      Copy the package documentation from Scala/Java to Python for the ML package and remove the BETA tags. Not entirely sure we want to drop the BETA tag, but since we are making spark.ml the default this seems like the time to remove it (happy to put it back in if we want to keep it BETA).
      
      ## How was this patch tested?
      
      Python documentation built locally as HTML and text and verified output.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12883 from holdenk/SPARK-15106-add-pyspark-package-doc-for-ml.
      4c0d827c
  4. May 04, 2016
    • Dominik Jastrzębski's avatar
      [SPARK-14844][ML] Add setFeaturesCol and setPredictionCol to KMeansM… · abecbcd5
      Dominik Jastrzębski authored
      ## What changes were proposed in this pull request?
      
      Introduces setFeaturesCol and setPredictionCol methods on KMeansModel in the ML library.
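
      A usage sketch, assuming `kmeansModel` is an already fitted or loaded KMeansModel and that the new setters follow the usual ML setter pattern:

      ```
      // Point the model at different input/output column names before transform().
      val adjusted = kmeansModel
        .setFeaturesCol("scaledFeatures")
        .setPredictionCol("cluster")
      ```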
      
      ## How was this patch tested?
      
      By running KMeansSuite.
      
      Author: Dominik Jastrzębski <dominik.jastrzebski@codilime.com>
      
      Closes #12609 from dominik-jastrzebski/master.
      abecbcd5
    • Cheng Lian's avatar
      [SPARK-14237][SQL] De-duplicate partition value appending logic in various... · bc3760d4
      Cheng Lian authored
      [SPARK-14237][SQL] De-duplicate partition value appending logic in various buildReader() implementations
      
      ## What changes were proposed in this pull request?
      
      Currently, various `FileFormat` data sources share approximately the same code for partition value appending. This PR tries to eliminate this duplication.
      
      A new method `buildReaderWithPartitionValues()` is added to `FileFormat` with a default implementation that appends partition values to `InternalRow`s produced by the reader function returned by `buildReader()`.
      
      Special data sources like Parquet, which implements partition value appending inside `buildReader()` because of the vectorized reader, and the Text data source, which doesn't support partitioning, override `buildReaderWithPartitionValues()` and simply delegate to `buildReader()`.
      
      This PR brings two benefits:
      
      1. Most obviously, it de-duplicates the partition value appending logic.
      
      2. The reader function returned by `buildReader()` now only needs to produce `InternalRow`s rather than `UnsafeRow`s when the data source doesn't override `buildReaderWithPartitionValues()`, because the safe-to-unsafe conversion is performed while appending partition values.

         This makes 3rd-party data sources (e.g. spark-avro) easier to implement since they no longer need to access private APIs involving `UnsafeRow`.
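
      A rough, simplified sketch of the delegation pattern described above; the types and signatures are illustrative, not the real `FileFormat` API:

      ```
      // Simplified: a "row" is just a Seq[Any]; the real code uses InternalRow/UnsafeRow.
      trait SimpleFileFormat {
        // Produces rows containing only the required data columns.
        def buildReader(requiredColumns: Seq[String]): Iterator[Seq[Any]]

        // Default implementation: reuse buildReader() and append partition values.
        def buildReaderWithPartitionValues(
            requiredColumns: Seq[String],
            partitionValues: Seq[Any]): Iterator[Seq[Any]] =
          buildReader(requiredColumns).map(row => row ++ partitionValues)
      }
      ```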
      
      ## How was this patch tested?
      
      Existing tests should do the work.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #12866 from liancheng/spark-14237-simplify-partition-values-appending.
      bc3760d4
  5. May 03, 2016
  6. May 01, 2016
    • Xusen Yin's avatar
      [SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update · a6428292
      Xusen Yin authored
      ## What changes were proposed in this pull request?
      
      This PR is an update for [https://github.com/apache/spark/pull/12738] which:
      * Adds a generic unit test for JavaParams wrappers in pyspark.ml for checking default Param values vs. the defaults on the Scala side
      * Various fixes for bugs found
        * This includes changing classes taking weightCol to treat unset and empty String Param values the same way.
      
      Defaults changed:
      * Scala
       * LogisticRegression: weightCol defaults to not set (instead of empty string)
       * StringIndexer: labels default to not set (instead of empty array)
       * GeneralizedLinearRegression:
         * maxIter always defaults to 25 (simpler than defaulting to 25 for a particular solver)
         * weightCol defaults to not set (instead of empty string)
       * LinearRegression: weightCol defaults to not set (instead of empty string)
      * Python
       * MultilayerPerceptron: layers default to not set (instead of [1,1])
       * ChiSqSelector: numTopFeatures defaults to 50 (instead of not set)
      
      ## How was this patch tested?
      
      Generic unit test. Manually tested the new unit test by changing defaults and verifying that this broke the test.
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      Author: yinxusen <yinxusen@gmail.com>
      
      Closes #12816 from jkbradley/yinxusen-SPARK-14931.
      a6428292
  7. Apr 30, 2016
    • Yanbo Liang's avatar
      [SPARK-15030][ML][SPARKR] Support formula in spark.kmeans in SparkR · 19a6d192
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      * `RFormula` supports an empty response variable such as `~ x + y` (see the sketch after this list).
      * Support formula in ```spark.kmeans``` in SparkR.
      * Fix some outdated docs for SparkR.
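
      A small Scala sketch of the `RFormula` part (the SparkR `spark.kmeans` formula support builds on this):

      ```
      import org.apache.spark.ml.feature.RFormula

      // A formula with an empty response: only the features column is produced.
      val formula = new RFormula()
        .setFormula("~ x + y")
        .setFeaturesCol("features")
      ```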
      
      ## How was this patch tested?
      Unit tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #12813 from yanboliang/spark-15030.
      19a6d192
    • Herman van Hovell's avatar
      [SPARK-14952][CORE][ML] Remove methods that were deprecated in 1.6.0 · e5fb78ba
      Herman van Hovell authored
      #### What changes were proposed in this pull request?
      
      This PR removes three methods that were deprecated in 1.6.0:
      - `PortableDataStream.close()`
      - `LinearRegression.weights`
      - `LogisticRegression.weights`
      
      The rationale for doing this is that the impact is small and that Spark 2.0 is a major release.
      
      #### How was this patch tested?
      Compilation succeeded.
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #12732 from hvanhovell/SPARK-14952.
      e5fb78ba
    • Xiangrui Meng's avatar
      [SPARK-14653][ML] Remove json4s from mllib-local · 0847fe4e
      Xiangrui Meng authored
      ## What changes were proposed in this pull request?
      
      This PR moves Vector.toJson/fromJson to ml.linalg.VectorEncoder under mllib/ to keep mllib-local's dependencies minimal. The JSON encoding is used by Params, so we still need this feature in SPARK-14615, where we will switch to ml.linalg in the spark.ml APIs.
      
      ## How was this patch tested?
      
      Copied existing unit tests over.
      
      cc: dbtsai
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #12802 from mengxr/SPARK-14653.
      0847fe4e
    • Junyang's avatar
      [SPARK-13289][MLLIB] Fix infinite distances between word vectors in Word2VecModel · 1192fe4c
      Junyang authored
      ## What changes were proposed in this pull request?
      
      This PR fixes the bug that generates infinite distances between word vectors. For example,
      
      Before this PR,
      ```
      val synonyms = model.findSynonyms("who", 40)
      ```
      gives the following results:
      ```
      to Infinity
      and Infinity
      that Infinity
      with Infinity
      ```
      With this PR, the distance between words is a value between 0 and 1, as follows:
      ```
      scala> model.findSynonyms("who", 10)
      res0: Array[(String, Double)] = Array((Harvard-educated,0.5253688097000122), (ex-SAS,0.5213794708251953), (McMutrie,0.5187736749649048), (fellow,0.5166833400726318), (businessman,0.5145374536514282), (American-born,0.5127736330032349), (British-born,0.5062344074249268), (gray-bearded,0.5047978162765503), (American-educated,0.5035858750343323), (mentored,0.49849334359169006))
      
      scala> model.findSynonyms("king", 10)
      res1: Array[(String, Double)] = Array((queen,0.6787897944450378), (prince,0.6786158084869385), (monarch,0.659771203994751), (emperor,0.6490438580513), (goddess,0.643266499042511), (dynasty,0.635733425617218), (sultan,0.6166239380836487), (pharaoh,0.6150713562965393), (birthplace,0.6143025159835815), (empress,0.6109727025032043))
      
      scala> model.findSynonyms("queen", 10)
      res2: Array[(String, Double)] = Array((princess,0.7670737504959106), (godmother,0.6982434988021851), (raven-haired,0.6877717971801758), (swan,0.684934139251709), (hunky,0.6816608309745789), (Titania,0.6808111071586609), (heroine,0.6794036030769348), (king,0.6787897944450378), (diva,0.67848801612854), (lip-synching,0.6731793284416199))
      ```
      
      ### There are two changes in this PR:
      - Normalize the word vectors to avoid overflow when calculating the inner product between word vectors (see the sketch below). This also simplifies the distance calculation, since the word vectors only need to be normalized once.
      - Scale the learning rate by the number of iterations, to be consistent with the Google Word2Vec implementation.
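
      A minimal sketch of the normalization idea: once the vectors are unit length, the similarity is a single bounded dot product, so the inner product can no longer overflow to Infinity:

      ```
      // Normalize once, then similarity is just a dot product of unit vectors.
      def normalize(v: Array[Double]): Array[Double] = {
        val norm = math.sqrt(v.map(x => x * x).sum)
        if (norm == 0.0) v else v.map(_ / norm)
      }

      def cosineSimilarity(a: Array[Double], b: Array[Double]): Double =
        normalize(a).zip(normalize(b)).map { case (x, y) => x * y }.sum
      ```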
      
      ## How was this patch tested?
      
      Use word2vec to train on a text corpus, and run model.findSynonyms() to check the distances between word vectors.
      
      Author: Junyang <fly.shenjy@gmail.com>
      Author: flyskyfly <fly.shenjy@gmail.com>
      
      Closes #11812 from flyjy/TVec.
      1192fe4c
    • Xiangrui Meng's avatar
      [SPARK-14412][.2][ML] rename *RDDStorageLevel to *StorageLevel in ml.ALS · 7fbe1bb2
      Xiangrui Meng authored
      ## What changes were proposed in this pull request?
      
      As discussed in #12660, this PR renames
      * intermediateRDDStorageLevel -> intermediateStorageLevel
      * finalRDDStorageLevel -> finalStorageLevel
      
      The argument name in `ALS.train` will be addressed in SPARK-15027.
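
      A usage sketch with the renamed setters, assuming they mirror the new param names:

      ```
      import org.apache.spark.ml.recommendation.ALS

      // Expert params: storage levels are passed by name.
      val als = new ALS()
        .setIntermediateStorageLevel("MEMORY_AND_DISK")
        .setFinalStorageLevel("MEMORY_AND_DISK")
      ```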
      
      ## How was this patch tested?
      
      Existing unit tests.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #12803 from mengxr/SPARK-14412.
      7fbe1bb2
    • Sean Owen's avatar
      [SPARK-14533][MLLIB] RowMatrix.computeCovariance inaccurate when values are... · 5886b621
      Sean Owen authored
      [SPARK-14533][MLLIB] RowMatrix.computeCovariance inaccurate when values are very large (partial fix)
      
      ## What changes were proposed in this pull request?
      
      Fix for part of SPARK-14533: trivial simplification and more accurate computation of column means. See also https://github.com/apache/spark/pull/12299 which contained a complete fix that was very slow. This PR does _not_ resolve SPARK-14533 entirely.
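
      A tiny generic illustration (not the RowMatrix code) of why accurate column means matter: computing covariance from centered values in two passes is much more stable for large-magnitude columns than the one-pass E[XY] - E[X]E[Y] form:

      ```
      // Two-pass sample covariance of two columns: means first, then centered products.
      def covariance(xs: Array[Double], ys: Array[Double]): Double = {
        val n = xs.length
        val mx = xs.sum / n
        val my = ys.sum / n
        xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum / (n - 1)
      }
      ```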
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #12779 from srowen/SPARK-14533.2.
      5886b621
    • Xiangrui Meng's avatar
      [SPARK-14850][.2][ML] use UnsafeArrayData.fromPrimitiveArray in ml.VectorUDT/MatrixUDT · 3d09ceee
      Xiangrui Meng authored
      ## What changes were proposed in this pull request?
      
      This PR uses `UnsafeArrayData.fromPrimitiveArray` to implement `ml.VectorUDT/MatrixUDT` to avoid boxing/unboxing.
      
      ## How was this patch tested?
      
      Existing unit tests.
      
      cc: cloud-fan
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #12805 from mengxr/SPARK-14850.
      3d09ceee
    • Wenchen Fan's avatar
      [SPARK-14850][ML] convert primitive array from/to unsafe array directly in VectorUDT/MatrixUDT · 43b149fb
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      This PR adds `fromPrimitiveArray` and `toPrimitiveArray` in `UnsafeArrayData`, so that we can do the conversion much faster in VectorUDT/MatrixUDT.
      
      ## How was this patch tested?
      
      existing tests and new test suite `UnsafeArraySuite`
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #12640 from cloud-fan/ml.
      43b149fb
    • Nick Pentreath's avatar
      [SPARK-14412][ML][PYSPARK] Add StorageLevel params to ALS · 90fa2c6e
      Nick Pentreath authored
      ## What changes were proposed in this pull request?

      `mllib` `ALS` supports `setIntermediateRDDStorageLevel` and `setFinalRDDStorageLevel`. This PR adds these as Params in `ml` `ALS`. They are put in group **expertParam** since few users will need them.
      
      ## How was this patch tested?
      
      New test cases in `ALSSuite` and `tests.py`.
      
      cc yanboliang jkbradley sethah rishabhbhardwaj
      
      Author: Nick Pentreath <nickp@za.ibm.com>
      
      Closes #12660 from MLnick/SPARK-14412-als-storage-params.
      90fa2c6e
  8. Apr 29, 2016
  9. Apr 28, 2016
    • Joseph K. Bradley's avatar
      [SPARK-14862][ML] Updated Classifiers to not require labelCol metadata · 4f4721a2
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Updated Classifier, DecisionTreeClassifier, RandomForestClassifier, GBTClassifier to not require input column metadata.
      * They first check for metadata.
      * If numClasses is not specified in metadata, they identify the largest label value (up to a limit).
      
      This functionality is implemented in a new Classifier.getNumClasses method (sketched below).
      
      Also:
      * Updated Classifier.extractLabeledPoints to (a) check label values and (b) include a second version which takes a numClasses value for validity checking.
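
      A simplified Scala sketch of the fallback described above; the names and the limit are illustrative, not the actual Classifier code:

      ```
      // Use numClasses from metadata when present; otherwise infer it from the labels.
      def getNumClasses(
          numClassesFromMetadata: Option[Int],
          labels: Seq[Double],
          maxNumClasses: Int = 100): Int = numClassesFromMetadata.getOrElse {
        val maxLabel = labels.max
        require(maxLabel < maxNumClasses,
          s"Found label $maxLabel; please specify numClasses in the label column metadata.")
        maxLabel.toInt + 1
      }
      ```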
      
      ## How was this patch tested?
      
      * Unit tests in ClassifierSuite for helper methods
      * Unit tests for DecisionTreeClassifier, RandomForestClassifier, GBTClassifier with toy datasets lacking label metadata
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #12663 from jkbradley/trees-no-metadata.
      4f4721a2
    • Pravin Gadakh's avatar
      [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local · dae538a4
      Pravin Gadakh authored
      ## What changes were proposed in this pull request?
      
      This PR adds the `@Since` tag to the matrix and vector classes in spark-mllib-local.
      
      ## How was this patch tested?
      
      Scala-style checks passed.
      
      Author: Pravin Gadakh <prgadakh@in.ibm.com>
      
      Closes #12416 from pravingadakh/SPARK-14613.
      dae538a4
    • Yuhao Yang's avatar
      [SPARK-14916][MLLIB] A more friendly toString for FreqItemset in mllib.fpm · d5ab42ce
      Yuhao Yang authored
      ## What changes were proposed in this pull request?
      
      jira: https://issues.apache.org/jira/browse/SPARK-14916
      FreqItemset, as the result of FPGrowth, should have a friendlier toString() to help users and developers.
      Sample output:
      {a, b}: 5
      {x, y, z}: 4
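
      A minimal sketch of a toString in that shape (not the exact FreqItemset change):

      ```
      // Formats an itemset as "{a, b}: 5".
      case class SimpleFreqItemset[T](items: Array[T], freq: Long) {
        override def toString: String = items.mkString("{", ", ", "}") + ": " + freq
      }
      ```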
      
      ## How was this patch tested?
      
      existing unit tests.
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #12698 from hhbyyh/freqtos.
      d5ab42ce
    • Joseph K. Bradley's avatar
      [SPARK-14852][ML] refactored GLM summary into training, non-training summaries · 5ee72454
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      This splits GeneralizedLinearRegressionSummary into 2 summary types:
      * GeneralizedLinearRegressionSummary, which does not store info from fitting (diagInvAtWA)
      * GeneralizedLinearRegressionTrainingSummary, which is a subclass of GeneralizedLinearRegressionSummary and stores info from fitting
      
      This also adds an evaluate() method which can produce a GeneralizedLinearRegressionSummary on a new dataset (usage sketched below).
      
      The summary no longer provides the model itself as a public val.
      
      Also:
      * Fixes a bug where GeneralizedLinearRegressionTrainingSummary was created with the model rather than the summaryModel.
      * Adds a hasSummary method.
      * Renames findSummaryModelAndPredictionCol -> getSummaryModel and simplifies that method.
      * In the summary, extracts values from the model immediately in case the user later changes them (e.g., predictionCol).
      * Pardon the style fixes; that is IntelliJ being obnoxious.
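
      A usage sketch of the split, assuming `model` is a fitted GeneralizedLinearRegressionModel and `testData` is a DataFrame with the training schema:

      ```
      // The training summary stays on the model; evaluate() builds a plain summary on new data.
      if (model.hasSummary) {
        println(model.summary.numIterations) // GeneralizedLinearRegressionTrainingSummary
      }
      val testSummary = model.evaluate(testData) // GeneralizedLinearRegressionSummary
      println(testSummary.deviance)
      ```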
      
      ## How was this patch tested?
      
      Existing unit tests + updated test for evaluate and hasSummary
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #12624 from jkbradley/model-summary-api.
      5ee72454
    • Liang-Chi Hsieh's avatar
      [SPARK-14487][SQL] User Defined Type registration without SQLUserDefinedType annotation · 7c6937a8
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      Currently we use the `SQLUserDefinedType` annotation to register UDTs for user classes. However, doing this adds a Spark dependency to user classes.

      For some user classes, it is unnecessary to add such a dependency, which increases deployment difficulty.

      We should provide an alternative approach to register UDTs for user classes without the `SQLUserDefinedType` annotation.
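
      An illustrative registry sketch of the idea (hypothetical names, not the API this PR adds): map a user class name to the UDT class that serializes it, without annotating the user class:

      ```
      import scala.collection.mutable

      // Hypothetical sketch: a registry keyed by class name, consulted instead of the annotation.
      object SimpleUdtRegistry {
        private val mapping = mutable.Map.empty[String, String]

        def register(userClassName: String, udtClassName: String): Unit =
          mapping(userClassName) = udtClassName

        def getUdtFor(userClassName: String): Option[String] = mapping.get(userClassName)
      }
      ```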
      
      ## How was this patch tested?
      
      `UserDefinedTypeSuite`
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      
      Closes #12259 from viirya/improve-sql-usertype.
      7c6937a8
  10. Apr 27, 2016
    • Joseph K. Bradley's avatar
      [SPARK-14671][ML] Pipeline setStages should handle subclasses of PipelineStage · f5ebb18c
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Pipeline.setStages failed for some code examples that worked in 1.5 but fail in 1.6. This tends to occur when using a mix of transformers from ml.feature. It is because Java arrays are non-covariant, and the addition of MLWritable to some transformers means that example arrays such as stages0/1 are not of type Array[PipelineStage]. This PR modifies the following to accept subclasses of PipelineStage (see the sketch after this list):
      * Pipeline.setStages()
      * Params.w()
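
      A sketch of the failing pattern: the inferred element type of `stages` is a strict subtype of PipelineStage, which the previous invariant `Array[PipelineStage]` parameter rejected:

      ```
      import org.apache.spark.ml.Pipeline
      import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

      // The inferred array type is not Array[PipelineStage], so this failed to
      // compile before setStages accepted subclasses of PipelineStage.
      val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
      val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
      val stages = Array(tokenizer, hashingTF)
      val pipeline = new Pipeline().setStages(stages)
      ```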
      
      ## How was this patch tested?
      
      Unit test which fails to compile before this fix.
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #12430 from jkbradley/pipeline-setstages.
      f5ebb18c
    • Yanbo Liang's avatar
      [SPARK-14899][ML][PYSPARK] Remove spark.ml HashingTF hashingAlg option · 4672e983
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Since [SPARK-10574](https://issues.apache.org/jira/browse/SPARK-10574) breaks the behavior of `HashingTF`, we should enforce good practice by removing the "native" hashAlgorithm option in spark.ml and pyspark.ml. We can leave spark.mllib and pyspark.mllib alone.
      
      ## How was this patch tested?
      Unit tests.
      
      cc jkbradley
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #12702 from yanboliang/spark-14899.
      4672e983
    • Mike Dusenberry's avatar
      [SPARK-9656][MLLIB][PYTHON] Add missing methods to PySpark's Distributed Linear Algebra Classes · 607f5034
      Mike Dusenberry authored
      This PR adds the remaining group of methods to PySpark's distributed linear algebra classes as follows:
      
      * `RowMatrix` <sup>**[1]**</sup>
        1. `computeGramianMatrix`
        2. `computeCovariance`
        3. `computeColumnSummaryStatistics`
        4. `columnSimilarities`
        5. `tallSkinnyQR` <sup>**[2]**</sup>
      * `IndexedRowMatrix` <sup>**[3]**</sup>
        1. `computeGramianMatrix`
      * `CoordinateMatrix`
        1. `transpose`
      * `BlockMatrix`
        1. `validate`
        2. `cache`
        3. `persist`
        4. `transpose`
      
      **[1]**: Note: `multiply`, `computeSVD`, and `computePrincipalComponents` are already part of PR #7963 for SPARK-6227.
      **[2]**: Implementing `tallSkinnyQR` uncovered a bug with our PySpark `RowMatrix` constructor.  As discussed on the dev list [here](http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-td10038.html), there appears to be an issue with type erasure with RDDs coming from Java, and by extension from PySpark.  Although we are attempting to construct a `RowMatrix` from an `RDD[Vector]` in [PythonMLlibAPI](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1115), the `Vector` type is erased, resulting in an `RDD[Object]`.  Thus, when calling Scala's `tallSkinnyQR` from PySpark, we get a Java `ClassCastException` in which an `Object` cannot be cast to a Spark `Vector`.  As noted in the aforementioned dev list thread, this issue was also encountered with `DecisionTrees`, and the fix involved an explicit `retag` of the RDD with a `Vector` type.  Thus, this PR currently contains that fix applied to the `createRowMatrix` helper function in `PythonMLlibAPI`.  `IndexedRowMatrix` and `CoordinateMatrix` do not appear to have this issue likely due to their related helper functions in `PythonMLlibAPI` creating the RDDs explicitly from DataFrames with pattern matching, thus preserving the types.  However, this fix may be out of scope for this single PR, and it may be better suited in a separate JIRA/PR.  Therefore, I have marked this PR as WIP and am open to discussion.
      **[3]**: Note: `multiply` and `computeSVD` are already part of PR #7963 for SPARK-6227.
      
      Author: Mike Dusenberry <mwdusenb@us.ibm.com>
      
      Closes #9441 from dusenberrymw/SPARK-9656_Add_Missing_Methods_to_PySpark_Distributed_Linear_Algebra.
      607f5034
  11. Apr 26, 2016
    • Joseph K. Bradley's avatar
      [SPARK-14732][ML] spark.ml GaussianMixture should use MultivariateGaussian in mllib-local · bd2c9a6d
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Before, spark.ml GaussianMixtureModel used the spark.mllib MultivariateGaussian in its public API. That API was added after 1.6, so we can modify it without breaking existing APIs.
      
      This PR copies MultivariateGaussian to mllib-local in spark.ml, with a few changes:
      * Renamed fields to match numpy, scipy: mu => mean, sigma => cov
      
      This PR then uses the spark.ml MultivariateGaussian in the spark.ml GaussianMixtureModel, which involves:
      * Modifying the constructor
      * Adding a computeProbabilities method
      
      Also:
      * Added EPSILON to mllib-local for use in MultivariateGaussian
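
      A usage sketch of the copied class, assuming the mllib-local constructor takes the renamed (mean, cov) fields as ml.linalg types:

      ```
      import org.apache.spark.ml.linalg.{Matrices, Vectors}
      import org.apache.spark.ml.stat.distribution.MultivariateGaussian

      // A 2-D standard Gaussian; pdf() evaluates the density at a point.
      val gaussian = new MultivariateGaussian(
        Vectors.dense(0.0, 0.0),
        Matrices.dense(2, 2, Array(1.0, 0.0, 0.0, 1.0)))
      println(gaussian.pdf(Vectors.dense(0.5, -0.5)))
      ```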
      
      ## How was this patch tested?
      
      Existing unit tests
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #12593 from jkbradley/sparkml-gmm-fix.
      bd2c9a6d