[SPARK-15957][ML] RFormula supports forcing to index label
## What changes were proposed in this pull request?

`RFormula` currently indexes the label only when it is of string type. If the label is numeric and we use `RFormula` with a classification model, the label column metadata contains no label attributes. These attributes are useful when making predictions for classification, so for classification we can force the label to be indexed by `StringIndexer` whether it is numeric or string. SparkR wrappers can then extract the label attributes from the label column metadata. This feature can help fix bugs similar to [SPARK-15153](https://issues.apache.org/jira/browse/SPARK-15153). For regression, we still keep the label as a numeric type.

In this PR, we add a param `indexLabel` to control whether `RFormula` is forced to index the label.

## How was this patch tested?

Unit tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #13675 from yanboliang/spark-15957.
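As a rough illustration of the forced label indexing described in this message, the sketch below fits an `RFormula` on a numeric label and reads the label attributes back from the column metadata. It rests on several assumptions: the toy data and the object and method names (`ForceIndexLabelSketch`, `indexedLabelValues`) are hypothetical, and the setter used is `setForceIndexLabel`, which is the name released Spark versions (2.1.0 and later) expose. The message above calls the param `indexLabel`, so adjust the call if your build differs.

```scala
import org.apache.spark.ml.attribute.{Attribute, NominalAttribute}
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.sql.SparkSession

object ForceIndexLabelSketch {

  // Returns the nominal label values found in the label column metadata, if any.
  def indexedLabelValues(spark: SparkSession, force: Boolean): Option[Array[String]] = {
    import spark.implicits._

    // Hypothetical toy data: numeric label `y`, one string feature, one numeric feature.
    val df = Seq((1.0, "a", 2.0), (0.0, "b", 3.0), (1.0, "a", 5.0)).toDF("y", "x1", "x2")

    val formula = new RFormula()
      .setFormula("y ~ x1 + x2")
      .setForceIndexLabel(force) // assumed setter name; the message calls the param `indexLabel`

    // RFormula writes the label to the default output column "label".
    val output = formula.fit(df).transform(df)

    // Label attributes exist only if the label went through StringIndexer.
    Attribute.fromStructField(output.schema("label")) match {
      case nominal: NominalAttribute => nominal.values
      case _ => None
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rformula-force-index-label-sketch")
      .getOrCreate()
    try {
      val values = indexedLabelValues(spark, force = true)
      println(values.map(_.mkString(", ")).getOrElse("no label attributes"))
    } finally {
      spark.stop()
    }
  }
}
```

With forcing enabled, a wrapper that needs to map predicted indices back to the original classes can read them from the metadata instead of inferring them from the data.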
Changed files:

- mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala (26 additions, 3 deletions)
- mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala (26 additions, 1 deletion)