[SPARK-4583] [mllib] LogLoss for GradientBoostedTrees fix + doc updates
Currently, the LogLoss used by GradientBoostedTrees has 2 issues:
* the gradient (and therefore loss) does not match that used by Friedman (1999)
* the error computation uses 0/1 accuracy, not log loss

This PR updates LogLoss. It also adds some doc for boosting and forests.

I tested it on sample data and made sure the log loss is monotonically decreasing with each boosting iteration.

CC: mengxr manishamde codedeft

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #3439 from jkbradley/gbt-loss-fix and squashes the following commits:

cfec17e [Joseph K. Bradley] removed forgotten temp comments
a27eb6d [Joseph K. Bradley] corrections to last log loss commit
ed5da2c [Joseph K. Bradley] updated LogLoss (boosting) for numerical stability
5e52bff [Joseph K. Bradley]
* Removed the 1/2 from SquaredError. This also required updating the test suite since it effectively doubles the gradient and loss.
* Added doc for developers within RandomForest.
* Small cleanup in test suite (generating data only once)
e57897a [Joseph K. Bradley] Fixed LogLoss for GradientBoostedTrees, and updated doc for losses, forests, and boosting
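
For readers who want to see what a Friedman-style log loss and a numerically stable evaluation of it might look like, here is a minimal sketch. It is not the code from this PR: the object name LogLossSketch and its methods are hypothetical, and it assumes labels in {-1, +1} with the loss written as 2 * log(1 + exp(-2 * y * F)), where F is the current ensemble prediction.

```scala
// Illustrative sketch only. Names here are hypothetical, not Spark's API.
// Assumes labels y in {-1, +1} and a real-valued prediction F.
object LogLossSketch {

  // Numerically stable log(1 + exp(x)): avoids overflow in exp for large x.
  def log1pExp(x: Double): Double =
    if (x > 0) x + math.log1p(math.exp(-x))
    else math.log1p(math.exp(x))

  // Loss: 2 * log(1 + exp(-2 * y * F)).
  def loss(prediction: Double, label: Double): Double =
    2.0 * log1pExp(-2.0 * label * prediction)

  // Derivative of the loss with respect to F: -4 * y / (1 + exp(2 * y * F)).
  def gradient(prediction: Double, label: Double): Double =
    -4.0 * label / (1.0 + math.exp(2.0 * label * prediction))
}
```

Computing the error with the same loss function that supplies the gradient, rather than with 0/1 accuracy, is what makes it meaningful to check that the reported value decreases from one boosting iteration to the next.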
Showing changed files:
- mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala (10 additions, 8 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala (43 additions, 1 deletion)
- mllib/src/main/scala/org/apache/spark/mllib/tree/loss/AbsoluteError.scala (12 additions, 14 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LogLoss.scala (22 additions, 12 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/tree/loss/SquaredError.scala (10 additions, 12 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostedTreesSuite.scala (49 additions, 25 deletions)