[SPARK-2207][SPARK-3272][MLLib] Add minimum information gain and minimum instances per node as training parameters for decision tree

These two parameters act as early-stopping rules for pre-pruning. When a split causes the left or right child to have fewer than `minInstancesPerNode` instances, or yields less information gain than `minInfoGain`, the current node will not be split by that split.

When no possible split satisfies these requirements, there are no useful information gain stats, but we still need to calculate the predict value for the current node. So I separated the calculation of predict from the calculation of information gain, which can also save computation when the number of possible splits is large.

Please see [SPARK-3272](https://issues.apache.org/jira/browse/SPARK-3272) for more details.

CC: mengxr manishamde jkbradley, please help me review this, thanks.

Author: qiping.lqp <qiping.lqp@alibaba-inc.com>
Author: chouqin <liqiping1991@gmail.com>

Closes #2332 from chouqin/dt-preprune and squashes the following commits:

- f1d11d1 [chouqin] fix typo
- c7ebaf1 [chouqin] fix typo
- 39f9b60 [chouqin] change edge `minInstancesPerNode` to 2 and add one more test
- 0278a11 [chouqin] remove `noSplit` and set `Predict` private to tree
- d593ec7 [chouqin] fix docs and change minInstancesPerNode to 1
- efcc736 [qiping.lqp] fix bug
- 10b8012 [qiping.lqp] fix style
- 6728fad [qiping.lqp] minor fix: remove empty lines
- bb465ca [qiping.lqp] Merge branch 'master' of https://github.com/apache/spark into dt-preprune
- cadd569 [qiping.lqp] add api docs
- 46b891f [qiping.lqp] fix bug
- e72c7e4 [qiping.lqp] add comments
- 845c6fa [qiping.lqp] fix style
- f195e83 [qiping.lqp] fix style
- 987cbf4 [qiping.lqp] fix bug
- ff34845 [qiping.lqp] separate calculation of predict of node from calculation of info gain
- ac42378 [qiping.lqp] add min info gain and min instances per node parameters in decision tree
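The pre-pruning rules described above can be sketched in plain Scala. This is a minimal, self-contained illustration under stated assumptions, not Spark's implementation: the object `PrePruningSketch`, its methods, the per-class count arrays, and the choice of Gini impurity are all hypothetical and chosen only for the example. A candidate split is rejected when either child has fewer than `minInstancesPerNode` instances or its gain is below `minInfoGain`, and the node's prediction is computed from its own class counts, independently of any split.

```scala
// Hypothetical sketch of the pre-pruning rules; names are illustrative and
// do not reproduce Spark's internal DecisionTree code.
object PrePruningSketch {

  // Per-class instance counts for one node (assumed representation).
  type ClassCounts = Array[Double]

  // Gini impurity: 1 - sum(p_k^2); 0.0 for an empty node.
  def gini(counts: ClassCounts): Double = {
    val total = counts.sum
    if (total == 0) 0.0
    else 1.0 - counts.map { c => val p = c / total; p * p }.sum
  }

  // Majority-class prediction from the node's own counts (counts must be
  // non-empty), so it is available even when no split is accepted.
  def predict(counts: ClassCounts): Int =
    counts.indices.maxBy(i => counts(i))

  // Gain of splitting the parent into `left` and `right`, or None when the
  // split violates minInstancesPerNode or minInfoGain.
  def validGain(
      left: ClassCounts,
      right: ClassCounts,
      minInstancesPerNode: Int,
      minInfoGain: Double): Option[Double] = {
    val leftCount = left.sum
    val rightCount = right.sum
    if (leftCount < minInstancesPerNode || rightCount < minInstancesPerNode) None
    else {
      val parent = left.zip(right).map { case (l, r) => l + r }
      val total = leftCount + rightCount
      val gain = gini(parent) -
        (leftCount / total) * gini(left) -
        (rightCount / total) * gini(right)
      if (gain < minInfoGain) None else Some(gain)
    }
  }

  // Best valid candidate as (index, gain); None means the node stays a leaf.
  def bestSplit(
      candidates: Seq[(ClassCounts, ClassCounts)],
      minInstancesPerNode: Int,
      minInfoGain: Double): Option[(Int, Double)] = {
    val valid = candidates.zipWithIndex.flatMap { case ((l, r), i) =>
      validGain(l, r, minInstancesPerNode, minInfoGain).map(g => (i, g))
    }
    if (valid.isEmpty) None else Some(valid.maxBy(_._2))
  }
}
```

For example, with left counts `Array(3.0, 0.0)` and right counts `Array(0.0, 3.0)`, `validGain(left, right, 1, 0.0)` returns `Some(0.5)`, while raising `minInstancesPerNode` to 4 or `minInfoGain` to 0.6 returns `None`. When `bestSplit` returns `None`, the node would become a leaf whose prediction still comes from `predict`, which is the reason for keeping the two calculations separate.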
Changed files:
- mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala: 53 additions, 19 deletions
- mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala: 9 additions, 0 deletions
- mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala: 5 additions, 2 deletions
- mllib/src/main/scala/org/apache/spark/mllib/tree/model/InformationGainStats.scala: 13 additions, 7 deletions
- mllib/src/main/scala/org/apache/spark/mllib/tree/model/Predict.scala: 36 additions, 0 deletions
- mllib/src/main/scala/org/apache/spark/mllib/tree/model/Split.scala: 2 additions, 0 deletions
- mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala: 95 additions, 8 deletions