[SPARK-2197] [mllib] Java DecisionTree bug fix and ease-of-use
Bug fix: Before, when an RDD was created in Java and passed to DecisionTree.train(), the fake class tag caused problems.
* Fix: DecisionTree: Used new RDD.retag() method to allow passing RDDs from Java.

Other improvements to Decision Trees for ease of use with Java:
* impurity classes: Added instance() methods to help with Java interface.
* Strategy: Added Java-friendly constructor
--> Note: I removed quantileCalculationStrategy from the Java-friendly constructor since (a) it is a special class and (b) there is only 1 option currently. I suspect we will redo the API before the other options are included.

CC: mengxr

Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>

Closes #1740 from jkbradley/dt-java-new and squashes the following commits:

0805dc6 [Joseph K. Bradley] Changed Strategy to use JavaConverters instead of JavaConversions
519b1b7 [Joseph K. Bradley]
* Organized imports in JavaDecisionTreeSuite.java
* Using JavaConverters instead of JavaConversions in DecisionTreeSuite.scala
f7b5ca1 [Joseph K. Bradley] Improvements to make it easier to run DecisionTree from Java.
* DecisionTree: Used new RDD.retag() method to allow passing RDDs from Java.
* impurity classes: Added instance() methods to help with Java interface.
* Strategy: Added Java-friendly constructor
** Note: I removed quantileCalculationStrategy from the Java-friendly constructor since (a) it is a special class and (b) there is only 1 option currently. I suspect we will redo the API before the other options are included.
d78ada6 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-java
320853f [Joseph K. Bradley] Added JavaDecisionTreeSuite, partly written
13a585e [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-java
f1a8283 [Joseph K. Bradley] Added old JavaDecisionTreeSuite, to be updated later
225822f [Joseph K. Bradley] Bug: In DecisionTree, the method sequentialBinSearchForOrderedCategoricalFeatureInClassification() indexed bins from 0 to (math.pow(2, featureCategories.toInt - 1) - 1). This upper bound is the bound for unordered categorical features, not ordered ones. The upper bound should be the arity (i.e., max value) of the feature.
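
The bound mix-up described in 225822f can be illustrated with a small standalone sketch. It is not the MLlib code: the class and method names below are hypothetical, and it only compares the two upper bounds quoted in the commit message, 2^(arity - 1) - 1 for unordered categorical features and the arity itself for ordered ones.

```java
// Standalone illustration only; CategoricalBinBounds and its methods are
// hypothetical names, not part of Spark MLlib.
class CategoricalBinBounds {

    // Upper bound the old code used: math.pow(2, arity - 1) - 1.
    // Per the commit message, this is the bound for UNORDERED categorical features.
    static int unorderedUpperBound(int arity) {
        return (1 << (arity - 1)) - 1;
    }

    // Upper bound the commit message says ORDERED categorical features should use:
    // the arity of the feature.
    static int orderedUpperBound(int arity) {
        return arity;
    }

    public static void main(String[] args) {
        // Print both bounds side by side for a few arities.
        for (int arity : new int[] {2, 3, 5}) {
            System.out.println("arity=" + arity
                + " unordered bound=" + unorderedUpperBound(arity)
                + " ordered bound=" + orderedUpperBound(arity));
        }
    }
}
```

The two bounds coincide only at arity 3. For smaller arities the old bound is too low, and for larger ones it is too high, so a search over ordered-feature bins using it would either stop early or index past the valid range.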
Showing
- mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala (4 additions, 4 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala (29 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Entropy.scala (7 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Gini.scala (7 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Variance.scala (7 additions, 0 deletions)
- mllib/src/test/java/org/apache/spark/mllib/tree/JavaDecisionTreeSuite.java (102 additions, 0 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala (6 additions, 0 deletions)