[SPARK-4439] [MLlib] add python api for random forest
```
class RandomForestModel
 |  A model trained by RandomForest
 |
 |  numTrees(self)
 |      Get number of trees in forest.
 |
 |  predict(self, x)
 |      Predict values for a single data point or an RDD of points using the model trained.
 |
 |  toDebugString(self)
 |      Full model
 |
 |  totalNumNodes(self)
 |      Get total number of nodes, summed over all trees in the forest.
 |
class RandomForest
 |  trainClassifier(cls, data, numClassesForClassification, categoricalFeaturesInfo, numTrees, featureSubsetStrategy='auto', impurity='gini', maxDepth=4, maxBins=32, seed=None):
 |      Method to train a decision tree model for binary or multiclass classification.
 |
 |      :param data: Training dataset: RDD of LabeledPoint.
 |                   Labels should take values {0, 1, ..., numClasses-1}.
 |      :param numClassesForClassification: number of classes for classification.
 |      :param categoricalFeaturesInfo: Map storing arity of categorical features.
 |                   E.g., an entry (n -> k) indicates that feature n is categorical
 |                   with k categories indexed from 0: {0, 1, ..., k-1}.
 |      :param numTrees: Number of trees in the random forest.
 |      :param featureSubsetStrategy: Number of features to consider for splits at each node.
 |                   Supported: "auto" (default), "all", "sqrt", "log2", "onethird".
 |                   If "auto" is set, this parameter is set based on numTrees:
 |                   if numTrees == 1, set to "all";
 |                   if numTrees > 1 (forest) set to "sqrt".
 |      :param impurity: Criterion used for information gain calculation.
 |                   Supported values: "gini" (recommended) or "entropy".
 |      :param maxDepth: Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means
 |                   1 internal node + 2 leaf nodes. (default: 4)
 |      :param maxBins: maximum number of bins used for splitting features (default: 100)
 |      :param seed: Random seed for bootstrapping and choosing feature subsets.
 |      :return: RandomForestModel that can be used for prediction
 |
 |  trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetStrategy='auto', impurity='variance', maxDepth=4, maxBins=32, seed=None):
 |      Method to train a decision tree model for regression.
 |
 |      :param data: Training dataset: RDD of LabeledPoint.
 |                   Labels are real numbers.
 |      :param categoricalFeaturesInfo: Map storing arity of categorical features.
 |                   E.g., an entry (n -> k) indicates that feature n is categorical
 |                   with k categories indexed from 0: {0, 1, ..., k-1}.
 |      :param numTrees: Number of trees in the random forest.
 |      :param featureSubsetStrategy: Number of features to consider for splits at each node.
 |                   Supported: "auto" (default), "all", "sqrt", "log2", "onethird".
 |                   If "auto" is set, this parameter is set based on numTrees:
 |                   if numTrees == 1, set to "all";
 |                   if numTrees > 1 (forest) set to "onethird".
 |      :param impurity: Criterion used for information gain calculation.
 |                   Supported values: "variance".
 |      :param maxDepth: Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means
 |                   1 internal node + 2 leaf nodes. (default: 4)
 |      :param maxBins: maximum number of bins used for splitting features (default: 100)
 |      :param seed: Random seed for bootstrapping and choosing feature subsets.
 |      :return: RandomForestModel that can be used for prediction
 |
```

Author: Davies Liu <davies@databricks.com>

Closes #3320 from davies/forest and squashes the following commits:

8003dfc [Davies Liu] reorder
53cf510 [Davies Liu] fix docs
4ca593d [Davies Liu] fix docs
e0df852 [Davies Liu] fix docs
0431746 [Davies Liu] rebased
2b6f239 [Davies Liu] Merge branch 'master' of github.com:apache/spark into forest
885abee [Davies Liu] address comments
dae7fc0 [Davies Liu] address comments
89a000f [Davies Liu] fix docs
565d476 [Davies Liu] add python api for random forest
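The `featureSubsetStrategy` and `maxDepth` rules described in the docstrings can be sketched in plain Python. This is only an illustration of the documented behavior, not the code in this commit: the helper names `resolve_feature_subset_strategy` and `max_nodes_for_depth` are hypothetical, and the node bound assumes binary trees, as the "1 internal node + 2 leaf nodes" wording suggests.

```python
SUPPORTED_STRATEGIES = ("auto", "all", "sqrt", "log2", "onethird")


def resolve_feature_subset_strategy(strategy, num_trees, is_classification):
    """Resolve "auto" the way the docstrings describe (hypothetical helper).

    Any explicitly chosen strategy is returned unchanged.
    """
    if strategy not in SUPPORTED_STRATEGIES:
        raise ValueError("unsupported featureSubsetStrategy: %r" % (strategy,))
    if strategy != "auto":
        return strategy
    if num_trees == 1:
        # A single tree considers every feature at each node.
        return "all"
    # A forest: "sqrt" for classification, "onethird" for regression.
    return "sqrt" if is_classification else "onethird"


def max_nodes_for_depth(max_depth):
    """Upper bound on nodes in one binary tree of the given depth.

    Depth 0 is a single leaf; depth 1 is 1 internal node + 2 leaves.
    """
    return 2 ** (max_depth + 1) - 1
```

For example, with the default `maxDepth=4` a single tree can hold at most 31 nodes, so `totalNumNodes()` for a forest of `numTrees` trees would be bounded by `numTrees * 31` under the same binary-tree assumption.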
Showing 4 changed files:
- mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala (36 additions, 2 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala (4 additions, 8 deletions)
- python/docs/epytext.py (1 addition, 1 deletion)
- python/pyspark/mllib/tree.py (220 additions, 22 deletions)