    fdb302f4
    qiping.lqp authored
    [SPARK-3516] [mllib] DecisionTree: Add minInstancesPerNode, minInfoGain params to example and Python API
    
    Added minInstancesPerNode, minInfoGain params to:
    * DecisionTreeRunner.scala example
    * Python API (tree.py)
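
As a rough illustration of what these two pre-pruning parameters control, the sketch below checks whether a candidate split would be kept. It is not the MLlib implementation: `entropy` and `split_allowed` are hypothetical helper names, entropy is assumed as the impurity measure, and the rule assumed here is that each child must hold at least `min_instances_per_node` instances and the split must achieve an information gain of at least `min_info_gain`.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_allowed(left, right, min_instances_per_node=1, min_info_gain=0.0):
    """Hypothetical pre-pruning check for one candidate split.

    `left` and `right` are the label lists of the two would-be children.
    Assumes min_instances_per_node >= 1, so neither child is empty
    once the first check has passed.
    """
    # Rule 1 (assumed): every child must receive enough instances.
    if len(left) < min_instances_per_node or len(right) < min_instances_per_node:
        return False

    # Rule 2 (assumed): the split must reduce impurity by at least min_info_gain.
    n = len(left) + len(right)
    gain = (
        entropy(left + right)
        - (len(left) / n) * entropy(left)
        - (len(right) / n) * entropy(right)
    )
    return gain >= min_info_gain


if __name__ == "__main__":
    # A clean split with two instances per child passes both checks.
    print(split_allowed([0, 0], [1, 1], min_instances_per_node=2, min_info_gain=0.5))
    # A child with a single instance is rejected when the minimum is 2.
    print(split_allowed([0], [1, 1, 1], min_instances_per_node=2))
    # A split that leaves both children as mixed as the parent gains nothing.
    print(split_allowed([0, 1], [0, 1], min_info_gain=0.1))
```

Raising either threshold makes the tree stop splitting earlier, which trades some training accuracy for a smaller tree.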
    
    Also:
    * Fixed typo in tree suite test "do not choose split that does not satisfy min instance per node requirements"
    * small style fixes
    
    CC: mengxr
    
    Author: qiping.lqp <qiping.lqp@alibaba-inc.com>
    Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
    Author: chouqin <liqiping1991@gmail.com>
    
    Closes #2349 from jkbradley/chouqin-dt-preprune and squashes the following commits:
    
    61b2e72 [Joseph K. Bradley] Added max of 10GB for maxMemoryInMB in Strategy.
    a95e7c8 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into chouqin-dt-preprune
    95c479d [Joseph K. Bradley] * Fixed typo in tree suite test "do not choose split that does not satisfy min instance per node requirements" * small style fixes
    e2628b6 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into chouqin-dt-preprune
    19b01af [Joseph K. Bradley] Merge remote-tracking branch 'chouqin/dt-preprune' into chouqin-dt-preprune
    f1d11d1 [chouqin] fix typo
    c7ebaf1 [chouqin] fix typo
    39f9b60 [chouqin] change edge `minInstancesPerNode` to 2 and add one more test
    c6e2dfc [Joseph K. Bradley] Added minInstancesPerNode and minInfoGain parameters to DecisionTreeRunner.scala and to Python API in tree.py
    0278a11 [chouqin] remove `noSplit` and set `Predict` private to tree
    d593ec7 [chouqin] fix docs and change minInstancesPerNode to 1
    efcc736 [qiping.lqp] fix bug
    10b8012 [qiping.lqp] fix style
    6728fad [qiping.lqp] minor fix: remove empty lines
    bb465ca [qiping.lqp] Merge branch 'master' of https://github.com/apache/spark into dt-preprune
    cadd569 [qiping.lqp] add api docs
    46b891f [qiping.lqp] fix bug
    e72c7e4 [qiping.lqp] add comments
    845c6fa [qiping.lqp] fix style
    f195e83 [qiping.lqp] fix style
    987cbf4 [qiping.lqp] fix bug
    ff34845 [qiping.lqp] separate calculation of predict of node from calculation of info gain
    ac42378 [qiping.lqp] add min info gain and min instances per node parameters in decision tree