[SPARK-4580] [SPARK-4610] [mllib] [docs] Documentation for tree ensembles + DecisionTree API fix
Major changes:
* Added programming guide sections for tree ensembles
* Added examples for tree ensembles
* Updated DecisionTree programming guide with more info on parameters
* **API change**: Standardized the tree parameter for the number of classes (for classification)

Minor changes:
* Updated decision tree documentation
* Updated existing tree and tree ensemble examples
  * Use train/test split, and compute test error instead of training error.
* Fixed decision_tree_runner.py to actually use the number of classes it computes from data. (small bug fix)

Note: I know this is a lot of lines, but most is covered by:
* Programming guide sections for gradient boosting and random forests. (The changes are probably best viewed by generating the docs locally.)
* New examples (which were copied from the programming guide)
* The "numClasses" renaming

I have run all examples and relevant unit tests.

CC: mengxr manishamde codedeft

Author: Joseph K. Bradley <joseph@databricks.com>
Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>

Closes #3461 from jkbradley/ensemble-docs and squashes the following commits:

70a75f3 [Joseph K. Bradley] updated forest vs boosting comparison
d1de753 [Joseph K. Bradley] Added note about toString and toDebugString for DecisionTree to migration guide
8e87f8f [Joseph K. Bradley] Combined GBT and RandomForest guides into one ensembles guide
6fab846 [Joseph K. Bradley] small fixes based on review
b9f8576 [Joseph K. Bradley] updated decision tree doc
375204c [Joseph K. Bradley] fixed python style
2b60b6e [Joseph K. Bradley] merged Java RandomForest examples into 1 file. added header. Fixed small bug in same example in the programming guide.
706d332 [Joseph K. Bradley] updated python DT runner to print full model if it is small
c76c823 [Joseph K. Bradley] added migration guide for mllib
abe5ed7 [Joseph K. Bradley] added examples for random forest in Java and Python to examples folder
07fc11d [Joseph K. Bradley] Renamed numClassesForClassification to numClasses everywhere in trees and ensembles. This is a breaking API change, but it was necessary to correct an API inconsistency in Spark 1.1 (where Python DecisionTree used numClasses but Scala used numClassesForClassification).
cdfdfbc [Joseph K. Bradley] added examples for GBT
6372a2b [Joseph K. Bradley] updated decision tree examples to use random split. tested all of them.
ad3e695 [Joseph K. Bradley] added gbt and random forest to programming guide. still need to update their examples
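
The example changes described in this commit message (a random train/test split, reporting test error rather than training error, and deriving the number of classes from the data) can be illustrated without Spark. The following is a plain-Python sketch, not the code from this commit: the helper names `split_data`, `count_classes`, and `error_rate` are hypothetical, the class count is assumed to be the number of distinct labels, and a majority-class predictor stands in for a trained tree.

```python
import random
from collections import Counter


def split_data(rows, train_fraction=0.75, seed=42):
    """Randomly split (label, features) rows into train and test lists."""
    rng = random.Random(seed)
    shuffled = rows[:]            # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def count_classes(rows):
    """Assumption: the class count is the number of distinct labels seen."""
    return len({label for label, _ in rows})


def error_rate(predict, rows):
    """Fraction of rows whose predicted label differs from the true label."""
    if not rows:
        return 0.0
    wrong = sum(1 for label, features in rows if predict(features) != label)
    return wrong / len(rows)


# Toy data set: 20 rows with labels 0, 1, 2 (illustrative only).
rows = [(i % 3, [float(i), float(i % 3)]) for i in range(20)]

train, test = split_data(rows)
num_classes = count_classes(rows)

# Stand-in "model": always predict the most common training label.
majority = Counter(label for label, _ in train).most_common(1)[0][0]


def predict(features):
    return majority


train_err = error_rate(predict, train)
test_err = error_rate(predict, test)   # the figure the updated examples report
print("numClasses =", num_classes)
print("train error =", train_err, "test error =", test_err)
```

Evaluating on the held-out `test` rows rather than on `train` is the point of the change: training error tends to understate how a model performs on unseen data.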
Changed files:
- docs/mllib-decision-tree.md: 144 additions, 97 deletions
- docs/mllib-ensembles.md: 653 additions, 0 deletions
- docs/mllib-guide.md: 28 additions, 1 deletion
- examples/src/main/java/org/apache/spark/examples/mllib/JavaGradientBoostedTreesRunner.java: 1 addition, 1 deletion
- examples/src/main/java/org/apache/spark/examples/mllib/JavaRandomForestExample.java: 139 additions, 0 deletions
- examples/src/main/python/mllib/decision_tree_runner.py: 10 additions, 7 deletions
- examples/src/main/python/mllib/random_forest_example.py: 89 additions, 0 deletions
- examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala: 1 addition, 1 deletion
- examples/src/main/scala/org/apache/spark/examples/mllib/GradientBoostedTreesRunner.scala: 1 addition, 1 deletion
- mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala: 2 additions, 2 deletions
- mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala: 11 additions, 11 deletions
- mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala: 10 additions, 10 deletions
- mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/BoostingStrategy.scala: 3 additions, 3 deletions
- mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala: 13 additions, 13 deletions
- mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala: 1 addition, 1 deletion
- mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala: 23 additions, 23 deletions
- mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostedTreesSuite.scala: 1 addition, 1 deletion
- mllib/src/test/scala/org/apache/spark/mllib/tree/RandomForestSuite.scala: 7 additions, 7 deletions
- python/pyspark/mllib/tree.py: 3 additions, 3 deletions