    [mllib] DecisionTree: treeAggregate + Python example bug fix · 115eeb30
    Joseph K. Bradley authored
    Small DecisionTree updates:
    * Changed main DecisionTree aggregate to treeAggregate.
    * Fixed bug in python example decision_tree_runner.py with missing argument (since categoricalFeaturesInfo is no longer an optional argument for trainClassifier).
    * Fixed same bug in python doc tests, and added tree.py to doc tests.
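
To illustrate the first bullet: a tree-style aggregate combines per-partition partial results level by level, instead of merging them all in a single final step. The following is a minimal pure-Python sketch of that general idea, not Spark's implementation. The function names, the `fan_in` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
from functools import reduce


def flat_aggregate(partitions, zero, seq_op, comb_op):
    """Fold each partition, then merge every partial result in one step."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)


def tree_aggregate(partitions, zero, seq_op, comb_op, fan_in=2):
    """Fold each partition, then merge partial results level by level.

    fan_in (hypothetical parameter) is how many partials are merged
    into one at each level of the tree.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    partials = [reduce(seq_op, part, zero) for part in partitions]
    while len(partials) > 1:
        # Merge neighbouring groups of partials; each pass shrinks the list.
        partials = [
            reduce(comb_op, partials[i:i + fan_in])
            for i in range(0, len(partials), fan_in)
        ]
    return partials[0] if partials else zero


# Hypothetical data: labels spread over four partitions.
partitions = [[1.0, 0.0], [1.0], [0.0, 1.0, 1.0], []]
zero = (0, 0.0)  # (count, sum of labels)


def seq_op(acc, label):
    return (acc[0] + 1, acc[1] + label)


def comb_op(a, b):
    return (a[0] + b[0], a[1] + b[1])


flat_result = flat_aggregate(partitions, zero, seq_op, comb_op)
tree_result = tree_aggregate(partitions, zero, seq_op, comb_op)
```

Both functions return the same statistics whenever `comb_op` is associative. The tree variant only changes where the merging work happens, so that no single step has to combine every partial result at once.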
    
    CC: mengxr
    
    Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
    
    Closes #2015 from jkbradley/dt-opt2 and squashes the following commits:
    
    b5114fa [Joseph K. Bradley] Fixed python tree.py doc test (extra newline)
    8e4665d [Joseph K. Bradley] Added tree.py to python doc tests.  Fixed bug from missing categoricalFeaturesInfo argument.
    b7b2922 [Joseph K. Bradley] Fixed bug in python example decision_tree_runner.py with missing argument.  Changed main DecisionTree aggregate to treeAggregate.
    85bbc1f [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-opt2
    66d076f [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-opt2
    a0ed0da [Joseph K. Bradley] Renamed DTMetadata to DecisionTreeMetadata.  Small doc updates.
    3726d20 [Joseph K. Bradley] Small code improvements based on code review.
    ac0b9f8 [Joseph K. Bradley] Small updates based on code review. Main change: Now using << instead of math.pow.
    db0d773 [Joseph K. Bradley] scala style fix
    6a38f48 [Joseph K. Bradley] Added DTMetadata class for cleaner code
    931a3a7 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-opt2
    797f68a [Joseph K. Bradley] Fixed DecisionTreeSuite bug for training second level.  Needed to update treePointToNodeIndex with groupShift.
    f40381c [Joseph K. Bradley] Merge branch 'dt-opt1' into dt-opt2
    5f2dec2 [Joseph K. Bradley] Fixed scalastyle issue in TreePoint
    6b5651e [Joseph K. Bradley] Updates based on code review.  1 major change: persisting to memory + disk, not just memory.
    2d2aaaf [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-opt1
    26d10dd [Joseph K. Bradley] Removed tree/model/Filter.scala since no longer used.  Removed debugging println calls in DecisionTree.scala.
    356daba [Joseph K. Bradley] Merge branch 'dt-opt1' into dt-opt2
    430d782 [Joseph K. Bradley] Added more debug info on binning error.  Added some docs.
    d036089 [Joseph K. Bradley] Print timing info to logDebug.
    e66f1b1 [Joseph K. Bradley] TreePoint * Updated doc * Made some methods private
    8464a6e [Joseph K. Bradley] Moved TimeTracker to tree/impl/ in its own file, and cleaned it up.  Removed debugging println calls from DecisionTree.  Made TreePoint extend Serializable
    a87e08f [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-opt1
    c1565a5 [Joseph K. Bradley] Small DecisionTree updates: * Simplification: Updated calculateGainForSplit to take aggregates for a single (feature, split) pair. * Internal doc: findAggForOrderedFeatureClassification
    b914f3b [Joseph K. Bradley] DecisionTree optimization: eliminated filters + small changes
    b2ed1f3 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-opt
    0f676e2 [Joseph K. Bradley] Optimizations + Bug fix for DecisionTree
    3211f02 [Joseph K. Bradley] Optimizing DecisionTree * Added TreePoint representation to avoid calling findBin multiple times. * (not working yet, but debugging)
    f61e9d2 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-timing
    bcf874a [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-timing
    511ec85 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-timing
    a95bc22 [Joseph K. Bradley] timing for DecisionTree internals