[SPARK-17768][CORE] Small (Sum,Count,Mean)Evaluator problems and suboptimalities
## What changes were proposed in this pull request?

Fix:

- GroupedMeanEvaluator and GroupedSumEvaluator are unused, as is the StudentTCacher support class
- CountEvaluator can return a lower bound < 0, when counts can't be negative
- MeanEvaluator will actually fail on exactly 1 datum (yields t-test with 0 DOF)
- CountEvaluator uses a normal distribution, which may be an inappropriate approximation (leading to above)
- Test for SumEvaluator asserts incorrect expected sums, e.g. after observing 10% of data has sum of 2, expectation should be 20, not 38
- CountEvaluator, MeanEvaluator have no unit tests to catch these
- Duplication of distribution code across CountEvaluator, GroupedCountEvaluator
- The stats in each could use a bit of documentation as I had to guess at them
- (Code could use a few cleanups and optimizations too)

## How was this patch tested?

Existing and new tests

Author: Sean Owen <sowen@cloudera.com>

Closes #15341 from srowen/SPARK-17768.
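A minimal sketch of three of the points above: extrapolating a partial sum, keeping a count interval non-negative, and guarding against a t-test with 0 degrees of freedom. The object `PartialEstimates` and its methods are hypothetical names chosen for illustration. This is not the code in the patched evaluators, and it assumes the observed partitions are representative of the rest.

```scala
// Hypothetical helpers for illustration only; not Spark's evaluator code.
object PartialEstimates {

  /** Scale a partial sum up to the full dataset, assuming the merged
    * outputs are representative of the outputs not yet seen. */
  def extrapolatedSum(partialSum: Double, outputsMerged: Int, totalOutputs: Int): Double = {
    require(outputsMerged > 0 && outputsMerged <= totalOutputs,
      "need at least one merged output, and no more than the total")
    partialSum * totalOutputs / outputsMerged
  }

  /** Clamp an interval for a count so that neither bound is negative. */
  def clampCountBounds(low: Double, high: Double): (Double, Double) =
    (math.max(0.0, low), math.max(0.0, high))

  /** A Student's t interval for a mean has n - 1 degrees of freedom,
    * so it is only defined for n of at least 2. */
  def canUseTInterval(n: Long): Boolean = n > 1
}
```

With 1 of 10 outputs merged and a partial sum of 2, `PartialEstimates.extrapolatedSum(2.0, 1, 10)` gives 20, which matches the corrected expectation in the SumEvaluator test.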
Changed files:

- core/src/main/scala/org/apache/spark/partial/CountEvaluator.scala (36 additions, 17 deletions)
- core/src/main/scala/org/apache/spark/partial/GroupedCountEvaluator.scala (5 additions, 25 deletions)
- core/src/main/scala/org/apache/spark/partial/GroupedSumEvaluator.scala (0 additions, 88 deletions)
- core/src/main/scala/org/apache/spark/partial/MeanEvaluator.scala (14 additions, 9 deletions)
- core/src/main/scala/org/apache/spark/partial/SumEvaluator.scala (22 additions, 11 deletions)
- core/src/test/scala/org/apache/spark/partial/CountEvaluatorSuite.scala (43 additions, 0 deletions)
- core/src/test/scala/org/apache/spark/partial/MeanEvaluatorSuite.scala (57 additions, 0 deletions)
- core/src/test/scala/org/apache/spark/partial/SumEvaluatorSuite.scala (26 additions, 56 deletions)