[SPARK-14159][ML] Fixed bug in StringIndexer + related issue in RFormula
## What changes were proposed in this pull request?

StringIndexerModel.transform sets the output column metadata to use the name of inputCol. It should not: the metadata describes the output column, so it should carry the name of outputCol.

Fixing this causes a problem with the metadata produced by RFormula. Fix in RFormula: I added the StringIndexer columns to prefixesToRewrite, and I modified VectorAttributeRewriter to find and replace all "prefixes", since attributes collect multiple prefixes from StringIndexer + Interaction. Note that the name "prefixes" is no longer accurate, since strings in the middle of an attribute name may now be replaced as well.

## How was this patch tested?

Unit test that failed before this fix.

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #11965 from jkbradley/StringIndexer-fix.
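The RFormula change described in this commit replaces every occurrence of a temporary StringIndexer column name inside an attribute name, rather than only a leading prefix. The following is a minimal plain-Scala sketch of that difference, assuming no Spark dependency. The helper names `rewritePrefixOnly` and `rewriteAll` and the `stridx_*` column names are hypothetical; this is not Spark's actual VectorAttributeRewriter code.

```scala
// Plain-Scala sketch (no Spark): prefix-only rewriting vs. rewriting every occurrence.
// All names below are hypothetical and only illustrate the idea in the commit message.
object AttributeRewriteSketch {

  /** Prefix-only behavior: rewrite a key only if the attribute name starts with it. */
  def rewritePrefixOnly(name: String, prefixesToRewrite: Map[String, String]): String =
    prefixesToRewrite.collectFirst {
      case (from, to) if name.startsWith(from) => to + name.substring(from.length)
    }.getOrElse(name)

  /** Find-and-replace behavior: rewrite every occurrence of every key in the name. */
  def rewriteAll(name: String, prefixesToRewrite: Map[String, String]): String =
    prefixesToRewrite.foldLeft(name) { case (current, (from, to)) =>
      current.replace(from, to)
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical mapping from temporary indexer output columns to original columns.
    val rewrites = Map("stridx_a" -> "color", "stridx_b" -> "shape")
    // Hypothetical attribute name from an interaction of two indexed columns.
    val interactionAttr = "stridx_a_red:stridx_b_square"

    println(rewritePrefixOnly(interactionAttr, rewrites)) // color_red:stridx_b_square
    println(rewriteAll(interactionAttr, rewrites))        // color_red:shape_square
  }
}
```

Because `rewriteAll` applies `String.replace` once per map entry, the result can depend on iteration order if one temporary name is a substring of another; the sketch assumes the temporary names do not overlap.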
- mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala: 6 additions, 9 deletions
- mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala: 3 additions, 4 deletions
- mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala: 13 additions, 0 deletions