[SPARK-7794] [MLLIB] update RegexTokenizer default settings
The previous default was `{gaps: false, pattern: "\\p{L}+|[^\\p{L}\\s]+"}`, and that default pattern is hard to understand. This PR changes the default to `{gaps: true, pattern: "\\s+"}`.

jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6330 from mengxr/SPARK-7794 and squashes the following commits:

5ee7cde [Xiangrui Meng] update RegexTokenizer default settings
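To show what the two defaults do, here is a minimal plain-Scala sketch that mirrors the `gaps`/`pattern` semantics without depending on Spark. With `gaps = true` the pattern matches the separators and the text is split on it. With `gaps = false` the pattern matches the tokens themselves. The object `RegexTokenizerSketch`, the helper `tokenize`, and its `minTokenLength` filter are hypothetical illustrations, not the `RegexTokenizer` source.

```scala
object RegexTokenizerSketch {
  // Hypothetical helper emulating the gaps/pattern behavior described in the commit message.
  //   gaps = true  -> the pattern matches separators, so split the text on it
  //   gaps = false -> the pattern matches tokens, so collect every match
  def tokenize(text: String, pattern: String, gaps: Boolean, minTokenLength: Int = 1): Seq[String] = {
    val re = pattern.r
    val tokens: Seq[String] =
      if (gaps) re.split(text).toSeq else re.findAllIn(text).toSeq
    // Drop tokens shorter than minTokenLength, e.g. the empty string
    // that splitting produces when the input starts with whitespace.
    tokens.filter(_.length >= minTokenLength)
  }

  def main(args: Array[String]): Unit = {
    val text = "Te,st. punct"
    // Previous default: letter runs and punctuation runs become separate tokens.
    val oldDefault = tokenize(text, "\\p{L}+|[^\\p{L}\\s]+", gaps = false)
    // New default: split on runs of whitespace only.
    val newDefault = tokenize(text, "\\s+", gaps = true)
    println(oldDefault.mkString("[", ", ", "]")) // prints [Te, ,, st, ., punct]
    println(newDefault.mkString("[", ", ", "]")) // prints [Te,st., punct]
  }
}
```

Under the new default, punctuation stays attached to neighboring words. Code that relied on the previous behavior can presumably restore it by setting the parameters explicitly on the transformer (for example via `setGaps(false)` and `setPattern(...)` with the old pattern).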
- mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala (10 additions, 8 deletions)
- mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala (15 additions, 17 deletions)
- python/pyspark/ml/feature.py (19 additions, 21 deletions)