[SPARK-16426][MLLIB] Fix bug that caused NaNs in IsotonicRegression
## What changes were proposed in this pull request?

Fixed a bug that caused `NaN`s in `IsotonicRegression`. The problem occurs when training rows with the same feature value but different labels end up on different partitions. This patch changes a `sortBy` call to a `partitionBy(RangePartitioner)` followed by a `mapPartitions(sortBy)` in order to ensure that all rows with the same feature value end up on the same partition.

## How was this patch tested?

Added a unit test.

Author: z001qdp <Nicholas.Eggert@target.com>

Closes #14140 from neggert/SPARK-16426-isotonic-nan.
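
As a rough illustration of the partitioning change described in this pull request (not the actual diff), the sketch below keys rows by feature value, range-partitions on that key, and then sorts inside each partition. It assumes Spark is on the classpath and that rows are `(label, feature, weight)` triples; the names `RangePartitionSketch` and `partitionAndSortByFeature` are hypothetical and exist only for this example.

```scala
import org.apache.spark.{RangePartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; not part of Spark or of this patch.
object RangePartitionSketch {
  // Assumed row layout: (label, feature, weight).
  type Row = (Double, Double, Double)

  // Range-partition by feature so that equal feature values share a
  // partition, then sort each partition locally by (feature, label).
  def partitionAndSortByFeature(input: RDD[Row]): RDD[Row] = {
    val keyed = input.keyBy(_._2) // key = feature value
    keyed
      .partitionBy(new RangePartitioner(keyed.getNumPartitions, keyed))
      .values
      .mapPartitions(rows => rows.toArray.sortBy(r => (r._2, r._1)).iterator)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("range-partition-sketch")
    val sc = new SparkContext(conf)
    try {
      // Rows that share a feature value but carry different labels.
      val rows: Seq[Row] = Seq(
        (2.0, 1.0, 1.0), (1.0, 1.0, 1.0),
        (0.0, 2.0, 1.0), (1.0, 2.0, 1.0),
        (0.5, 3.0, 1.0), (0.0, 3.0, 1.0))
      val partitions = partitionAndSortByFeature(sc.parallelize(rows, 2)).glom().collect()

      // Each distinct feature value should appear in at most one partition.
      val featuresPerPartition = partitions.flatMap(_.map(_._2).distinct)
      assert(featuresPerPartition.length == featuresPerPartition.distinct.length)

      partitions.zipWithIndex.foreach { case (part, i) =>
        println(s"partition $i: ${part.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Because a range partitioner assigns each key to exactly one partition, any per-partition step that runs afterwards sees every row for a given feature value together, which is the property the description says the previous `sortBy` call did not guarantee.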
Changed files:
- mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala (6 additions, 3 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/regression/IsotonicRegressionSuite.scala (11 additions, 0 deletions)