Commit 0c8444cf authored by Yanbo Liang

[SPARK-14657][SPARKR][ML] RFormula w/o intercept should output reference category when encoding string terms

## What changes were proposed in this pull request?

Please see [SPARK-14657](https://issues.apache.org/jira/browse/SPARK-14657) for details of this bug.
I searched online and tested some other cases. When an R glm model (or any other model powered by R formula) is fit w/o intercept on a dataset that includes string/category features, R does not drop any category of the first category feature: the category that would otherwise be used as the reference category is output as well.
I think we should keep the semantics of Spark RFormula consistent with R formula.
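
To make the intended semantics concrete, here is a minimal pure-Python sketch of the encoding rule described above. It is not the Spark or R implementation. The function name `encode_string_terms`, the `column_level` naming of output columns, and the choice of the alphabetically first level as the reference category are all assumptions made for illustration.

```python
def encode_string_terms(rows, columns, has_intercept):
    """Dummy-encode string columns following the rule described above.

    Hypothetical helper for illustration only; not Spark's RFormula code.
    Assumption: the reference category is the alphabetically first level.
    """
    # Collect the distinct levels of each string column in sorted order.
    levels = {c: sorted({row[c] for row in rows}) for c in columns}

    kept = []
    for index, column in enumerate(columns):
        # Without an intercept, the first string term keeps every category,
        # including the reference one. All other terms drop the reference.
        keep_all = (not has_intercept) and index == 0
        column_levels = levels[column] if keep_all else levels[column][1:]
        kept.extend((column, level) for level in column_levels)

    names = [f"{column}_{level}" for column, level in kept]
    matrix = [
        [1.0 if row[column] == level else 0.0 for column, level in kept]
        for row in rows
    ]
    return names, matrix


# Toy data invented for this example.
rows = [
    {"a": "x", "b": "p"},
    {"a": "y", "b": "q"},
    {"a": "z", "b": "p"},
]

with_intercept, _ = encode_string_terms(rows, ["a", "b"], has_intercept=True)
without_intercept, _ = encode_string_terms(rows, ["a", "b"], has_intercept=False)
print(with_intercept)
print(without_intercept)
```

With an intercept, both `a` and `b` lose their reference category. Without one, `a` (the first string term) keeps all of its categories, while `b` still drops its reference category.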

## How was this patch tested?

Added standard unit tests.

cc mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #12414 from yanboliang/spark-14657.
parent 376d90d5