[SPARK-11069][ML] Add RegexTokenizer option to convert to lowercase
jira: https://issues.apache.org/jira/browse/SPARK-11069

quotes from jira:
Tokenizer converts strings to lowercase automatically, but RegexTokenizer does not. It would be nice to add an option to RegexTokenizer to convert to lowercase.

Proposal:
- call the Boolean Param "toLowercase"
- set default to false (so behavior does not change)

Actually sklearn converts to lowercase before tokenizing too.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #9092 from hhbyyh/tokenLower.
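
To illustrate the behavior the proposal describes, here is a minimal plain-Scala sketch with no Spark dependency. The object `RegexTokenizeSketch` and its `tokenize` helper are hypothetical names made up for this example and are not the code in Tokenizer.scala; the sketch assumes the pattern matches the gaps between tokens and that lowercasing happens before splitting, as the message says sklearn does.

```scala
object RegexTokenizeSketch {
  // Hypothetical helper for illustration only; not Spark's RegexTokenizer.
  // `pattern` is assumed to match the separators between tokens.
  def tokenize(
      text: String,
      pattern: String = "\\s+",
      toLowercase: Boolean = false): Seq[String] = {
    // Lowercase first (when requested), then split on the separator pattern.
    val input = if (toLowercase) text.toLowerCase else text
    // Drop empty strings that a leading separator would otherwise produce.
    input.split(pattern).toSeq.filter(_.nonEmpty)
  }
}
```

With the flag left at `false` the tokens keep their original case, so `tokenize("Te,st. Punct")` keeps `"Punct"` capitalized; passing `toLowercase = true` lowercases every token. In this sketch the flag is explicit at each call, so the result does not depend on whichever default the merged change settled on.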
Showing 3 changed files:
- mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala (17 additions, 2 deletions)
- mllib/src/test/java/org/apache/spark/ml/feature/JavaTokenizerSuite.java (1 addition, 0 deletions)
- mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala (17 additions, 5 deletions)