[SPARK-14657][SPARKR][ML] RFormula w/o intercept should output reference category when encoding string terms

## What changes were proposed in this pull request?

Please see [SPARK-14657](https://issues.apache.org/jira/browse/SPARK-14657) for details of this bug. I searched online and tested some other cases. When we fit an R glm model (or any other model powered by R formula) without an intercept on a dataset that includes string/categorical features, R keeps every category of the first categorical feature, including the one that would otherwise be dropped as the reference category. Categorical features after the first one still drop their reference category. I think Spark RFormula should keep semantics consistent with R formula.

## How was this patch tested?

Added standard unit tests.

cc mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #12414 from yanboliang/spark-14657.
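The encoding rule described above can be sketched in Scala, the language of the changed files. This is a hypothetical, self-contained illustration, not the code in `RFormula.scala`. `Term`, `Numeric`, `Categorical`, and `DummyEncoding.columns` are made-up names. The sketch assumes each term's levels are already listed in index order. It also assumes that the last level is the reference category that gets dropped, mirroring Spark's `OneHotEncoder` with `dropLast` enabled.

```scala
// Hypothetical illustration of the encoding rule; not Spark's RFormula implementation.
sealed trait Term
final case class Numeric(name: String) extends Term
final case class Categorical(name: String, levels: Seq[String]) extends Term

object DummyEncoding {
  /** Output column names for the given terms, in formula order.
    *
    * Assumption: `levels` are already in index order and the last level is the
    * reference category that gets dropped (mirroring OneHotEncoder's dropLast).
    * With an intercept, every categorical term drops its reference category.
    * Without one, the first categorical term keeps all of its levels and later
    * categorical terms still drop theirs.
    */
  def columns(terms: Seq[Term], hasIntercept: Boolean): Seq[String] = {
    // Accumulator: (columns so far, whether a categorical term already kept all levels)
    val (cols, _) = terms.foldLeft((Vector.empty[String], false)) {
      case ((acc, keptAll), Numeric(name)) =>
        (acc :+ name, keptAll)
      case ((acc, keptAll), Categorical(name, levels)) =>
        val keepAll = !hasIntercept && !keptAll
        val used = if (keepAll) levels else levels.dropRight(1)
        (acc ++ used.map(level => s"${name}_$level"), keptAll || keepAll)
    }
    cols
  }
}

// Example: categorical a (3 levels), categorical b (2 levels), numeric c.
val example = Seq(
  Categorical("a", Seq("x", "y", "z")),
  Categorical("b", Seq("u", "v")),
  Numeric("c")
)
println(DummyEncoding.columns(example, hasIntercept = true))  // Vector(a_x, a_y, b_u, c)
println(DummyEncoding.columns(example, hasIntercept = false)) // Vector(a_x, a_y, a_z, b_u, c)
```

The sketch threads a "first categorical term already kept all levels" flag through `foldLeft`. This keeps the exemption limited to the first categorical term in formula order, even when numeric terms come before it.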
Showing 2 changed files with 92 additions and 1 deletion:
- mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala: 9 additions, 1 deletion
- mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala: 83 additions, 0 deletions