Commit f47700c9 authored 8 years ago by Wayne Zhang Committed by Yanbo Liang 8 years ago

[SPARK-14659][ML] RFormula consistent with R when handling strings

## What changes were proposed in this pull request?
When handling string columns, RFormula and R drop different categories:
- RFormula drops the least frequent level
- R drops the first level after ascending alphabetical ordering
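The difference between the two rules can be illustrated without Spark. The following is a minimal pure-Python sketch, not RFormula's or R's actual code; the helper names `least_frequent_level` and `first_alphabetical_level` are hypothetical, and the alphabetical tie-breaking for equal frequencies is an assumption made only to keep the example deterministic.

```python
from collections import Counter


def least_frequent_level(values):
    """Level dropped under the 'least frequent' rule (old RFormula behavior)."""
    counts = Counter(values)
    # Iterate levels alphabetically so ties resolve deterministically
    # (tie-breaking is an assumption of this sketch).
    return min(sorted(counts), key=lambda level: counts[level])


def first_alphabetical_level(values):
    """Level dropped under R's rule: first level in ascending alphabetical order."""
    return min(set(values))


column = ["b", "b", "b", "a", "a", "c"]
print(least_frequent_level(column))      # c
print(first_alphabetical_level(column))  # a
```

For this column the two rules disagree: the least frequent level is `c`, while the first level alphabetically is `a`, so the resulting model coefficients would be reported against different reference levels.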

This PR uses the string ordering types supported in StringIndexer (#17879) so that RFormula can drop the same level as R when handling strings, by using `stringOrderType = "alphabetDesc"`.
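Why a descending alphabetical ordering reproduces R's behavior can be sketched as follows. This is a simplified pure-Python model, not Spark code: it assumes that labels are indexed in the chosen order and that the level with the last index is the one dropped during encoding. The functions `index_labels` and `dropped_level` are hypothetical, and `"frequencyDesc"` is used here as the name of the frequency-based ordering.

```python
from collections import Counter


def index_labels(values, order_type):
    """Map each distinct label to an index under the given ordering (sketch)."""
    counts = Counter(values)
    if order_type == "frequencyDesc":
        # Most frequent label gets index 0; ties broken alphabetically
        # (tie-breaking is an assumption of this sketch).
        labels = sorted(counts, key=lambda label: (-counts[label], label))
    elif order_type == "alphabetDesc":
        # Alphabetically last label gets index 0.
        labels = sorted(counts, reverse=True)
    else:
        raise ValueError(f"unsupported order type: {order_type}")
    return {label: i for i, label in enumerate(labels)}


def dropped_level(values, order_type):
    """Level with the highest index, assumed to be dropped when encoding."""
    index = index_labels(values, order_type)
    return max(index, key=index.get)


column = ["b", "b", "b", "a", "a", "c"]
print(dropped_level(column, "frequencyDesc"))  # c
print(dropped_level(column, "alphabetDesc"))   # a
```

Under the descending alphabetical ordering, the last index always belongs to the alphabetically first label, which is the same level R treats as the reference.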

## How was this patch tested?
New tests.

Author: Wayne Zhang <actuaryzhang@uber.com>

Closes #17967 from actuaryzhang/RFormula.

parent 2dbe0c52

Showing with 129 additions and 3 deletions
