[SPARK-17455][MLLIB] Improve PAVA implementation in IsotonicRegression
## What changes were proposed in this pull request?

New implementation of the Pool Adjacent Violators Algorithm (PAVA) in `mllib.IsotonicRegression`, which is used under the hood by `ml.regression.IsotonicRegression`. The previous implementation could have factorial complexity in the worst case. This implementation, which closely follows those in scikit-learn and the R `iso` package, runs in quadratic time in the worst case.

## How was this patch tested?

Existing unit tests in both `mllib` and `ml` passed before and after this patch.

Scaling properties were tested by running the `poolAdjacentViolators` method in [scala-benchmarking-template](https://github.com/sirthias/scala-benchmarking-template) with the input generated by:

```scala
val length = 800 // benchmarked at 100, 200, 400 and 800
val x = (1 to length).toArray.map(_.toDouble)
// Labels decrease overall but zig-zag: every other value is lowered by 1.5
val y = x.reverse.zipWithIndex.map { case (yi, i) => if (i % 2 == 1) yi - 1.5 else yi }
val w = Array.fill(length)(1d) // unit weights
// (label, feature, weight) tuples
val input: Array[(Double, Double, Double)] = (y zip x zip w).map { case ((yi, xi), wi) => (yi, xi, wi) }
```

Before this patch:

| Input Length | Time (µs) |
| --: | --: |
| 100 | 1.35 |
| 200 | 3.14 |
| 400 | 116.10 |
| 800 | 2134225.90 |

After this patch:

| Input Length | Time (µs) |
| --: | --: |
| 100 | 1.25 |
| 200 | 2.53 |
| 400 | 5.86 |
| 800 | 10.55 |

Benchmarking was also performed with randomly generated y values, with similar results.

Author: z001qdp <Nicholas.Eggert@target.com>

Closes #15018 from neggert/SPARK-17455-isoreg-algo.
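For readers unfamiliar with PAVA, the pooling idea can be illustrated with a minimal sketch. This is not the code added by this commit. It uses the common stack-of-blocks formulation: push each point as its own block, then merge the last two blocks into their weighted mean while they violate the non-decreasing order.

The sketch makes the following assumptions:

- `PavaSketch`, `poolAdjacentViolatorsSketch` and `benchmarkInput` are hypothetical names.
- Input tuples are `(label, feature, weight)`, already sorted by feature.
- All weights are strictly positive.

Each merge removes one block, so this variant does amortized linear work. The commit message describes the patched implementation as quadratic in the worst case.

```scala
import scala.collection.mutable.ArrayBuffer

object PavaSketch {

  /** Hypothetical helper: weighted non-decreasing isotonic fit by pooling
    * adjacent violators. Assumes input sorted by feature and weights > 0.
    * Returns one fitted value per input point, in input order.
    */
  def poolAdjacentViolatorsSketch(input: Array[(Double, Double, Double)]): Array[Double] = {
    // Parallel stacks describing each pooled block.
    val means = ArrayBuffer.empty[Double]   // weighted mean of the block's labels
    val weights = ArrayBuffer.empty[Double] // total weight of the block
    val sizes = ArrayBuffer.empty[Int]      // number of input points in the block

    for ((y, _, w) <- input) {
      // Start a new block containing only this point.
      means += y
      weights += w
      sizes += 1
      // Pool the last two blocks while they violate monotonicity.
      while (means.length > 1 && means(means.length - 2) > means(means.length - 1)) {
        val last = means.length - 1
        val prev = last - 1
        val totalW = weights(prev) + weights(last)
        means(prev) = (means(prev) * weights(prev) + means(last) * weights(last)) / totalW
        weights(prev) = totalW
        sizes(prev) += sizes(last)
        means.remove(last)
        weights.remove(last)
        sizes.remove(last)
      }
    }

    // Expand every block back into one fitted value per input point.
    means.indices.flatMap(i => Seq.fill(sizes(i))(means(i))).toArray
  }

  /** The benchmark input from this commit message, parameterised by length. */
  def benchmarkInput(length: Int): Array[(Double, Double, Double)] = {
    val x = (1 to length).toArray.map(_.toDouble)
    val y = x.reverse.zipWithIndex.map { case (yi, i) => if (i % 2 == 1) yi - 1.5 else yi }
    val w = Array.fill(length)(1d)
    (y zip x zip w).map { case ((yi, xi), wi) => (yi, xi, wi) }
  }
}
```

Consider the labels `1, 3, 2, 4` with unit weights. The violating pair `3, 2` is pooled to `2.5`, giving `1, 2.5, 2.5, 4`. On `benchmarkInput(8)`, every prefix mean stays at or above the overall mean, so all points end up in a single block with value `3.75`.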
- mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala: 89 additions, 62 deletions
- mllib/src/test/scala/org/apache/spark/mllib/regression/IsotonicRegressionSuite.scala: 8 additions, 9 deletions