[SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSummarizer.variance generate negative result
## What changes were proposed in this pull request?

Because of numerical error, `MultivariateOnlineSummarizer.variance` can return a negative variance.

**This is a serious bug: many algorithms in MLlib compute the standard deviation as `sqrt(variance)`, so a negative variance yields NaN and crashes the whole algorithm.**

The bug can be reproduced with the following code:

```
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer

// Four summarizers, each holding the same value (3.0) with a different weight.
val summarizer1 = (new MultivariateOnlineSummarizer)
  .add(Vectors.dense(3.0), 0.7)
val summarizer2 = (new MultivariateOnlineSummarizer)
  .add(Vectors.dense(3.0), 0.4)
val summarizer3 = (new MultivariateOnlineSummarizer)
  .add(Vectors.dense(3.0), 0.5)
val summarizer4 = (new MultivariateOnlineSummarizer)
  .add(Vectors.dense(3.0), 0.4)

// Merge them and print the variance of the single feature.
val summarizer = summarizer1
  .merge(summarizer2)
  .merge(summarizer3)
  .merge(summarizer4)

println(summarizer.variance(0))
```

This PR fixes the bug in `mllib.stat.MultivariateOnlineSummarizer.variance` and `ml.stat.SummarizerBuffer.variance`, and in several places in `WeightedLeastSquares`.

## How was this patch tested?

Test cases added.

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #19029 from WeichenXu123/fix_summarizer_var_bug.
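To illustrate why a slightly negative variance is harmful, the sketch below shows `sqrt` returning NaN for a tiny negative input and one common guard: clamping the variance at zero before taking the square root. This is only a minimal illustration with no Spark dependency. The names `VarianceClampSketch`, `clampVariance`, and `stddev` are hypothetical, the input value is made up for the example, and the diff in this PR may guard against the problem differently.

```scala
object VarianceClampSketch {
  // Hypothetical helper: treat a negative variance (assumed to come from
  // floating-point rounding) as zero. Non-negative values pass through unchanged.
  def clampVariance(rawVariance: Double): Double = math.max(rawVariance, 0.0)

  // Standard deviation computed from the clamped variance, so it is never NaN
  // for a finite input.
  def stddev(rawVariance: Double): Double = math.sqrt(clampVariance(rawVariance))

  def main(args: Array[String]): Unit = {
    // Illustrative value only; not the output of the reproduction in this PR.
    val raw = -1.0e-16

    println(math.sqrt(raw).isNaN) // true: sqrt of a negative number is NaN
    println(stddev(raw))          // 0.0: the clamp removes the negative sign
  }
}
```

Clamping is cheap and leaves valid variances unchanged, but it also hides any negative value that comes from a genuine logic error rather than rounding, so it is best paired with tests such as the ones added in this PR.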
Showing 5 changed files:
- mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala (9 additions, 3 deletions)
- mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala (3 additions, 2 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala (3 additions, 2 deletions)
- mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala (18 additions, 0 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizerSuite.scala (18 additions, 0 deletions)