Xiangrui Meng authored
The previous default for RegexTokenizer is `{gaps: false, pattern: "\\p{L}+|[^\\p{L}\\s]+"}`, and that default pattern is hard to understand. This PR changes the default to `{gaps: true, pattern: "\\s+"}`, so the pattern now matches the whitespace between tokens rather than the tokens themselves.

jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6330 from mengxr/SPARK-7794 and squashes the following commits:

5ee7cde [Xiangrui Meng] update RegexTokenizer default settings
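To make the difference between the two defaults concrete, here is a minimal sketch using plain `java.util.regex` rather than Spark itself. The class `RegexTokenizerSketch` and its `tokenize` helper are hypothetical names introduced for illustration only. The sketch assumes that `gaps: true` means "split the text on the pattern" and `gaps: false` means "collect every match of the pattern", and it drops empty tokens; it is not RegexTokenizer's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokenizerSketch {

    // Hypothetical helper, not part of Spark's API.
    // gaps == true : the pattern describes the separators, so split on it.
    // gaps == false: the pattern describes the tokens, so collect each match.
    static List<String> tokenize(String text, String pattern, boolean gaps) {
        Pattern p = Pattern.compile(pattern);
        List<String> tokens = new ArrayList<>();
        if (gaps) {
            for (String piece : p.split(text)) {
                // Assumption: empty pieces (e.g. from leading whitespace) are dropped.
                if (!piece.isEmpty()) {
                    tokens.add(piece);
                }
            }
        } else {
            Matcher m = p.matcher(text);
            while (m.find()) {
                tokens.add(m.group());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hello, world! It's 2015.";

        // Previous default: runs of letters, or runs of non-letter, non-space characters.
        System.out.println(tokenize(text, "\\p{L}+|[^\\p{L}\\s]+", false));

        // New default: split on runs of whitespace.
        System.out.println(tokenize(text, "\\s+", true));
    }
}
```

Under these assumptions, the previous default separates punctuation from words (so `It's` becomes three tokens), while the new default keeps each whitespace-delimited chunk intact, punctuation included.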