    [SPARK-7794] [MLLIB] update RegexTokenizer default settings · f5db4b41
    Xiangrui Meng authored
    The previous default was `{gaps: false, pattern: "\\p{L}+|[^\\p{L}\\s]+"}`, and that pattern is hard to understand. This PR changes the default to `{gaps: true, pattern: "\\s+"}`, so the pattern now describes the whitespace gaps between tokens rather than the tokens themselves. jkbradley
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #6330 from mengxr/SPARK-7794 and squashes the following commits:
    
    5ee7cde [Xiangrui Meng] update RegexTokenizer default settings
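
    To make the difference between the two defaults concrete, here is a minimal sketch in plain Java using only `java.util.regex`. It is not Spark's `RegexTokenizer` code: the class `RegexTokenizerSketch` and the helper `tokenize` are hypothetical names introduced for illustration, and dropping empty tokens is an assumption of this sketch. With `gaps = false` the pattern matches the tokens; with `gaps = true` the pattern matches the separators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokenizerSketch {

    // Hypothetical helper, not Spark's implementation.
    // gaps = true:  the pattern describes separators, so split on it.
    // gaps = false: the pattern describes tokens, so collect every match.
    static List<String> tokenize(String text, String pattern, boolean gaps) {
        List<String> tokens = new ArrayList<>();
        Pattern p = Pattern.compile(pattern);
        if (gaps) {
            for (String piece : p.split(text)) {
                // Assumption of this sketch: empty pieces are dropped.
                if (!piece.isEmpty()) {
                    tokens.add(piece);
                }
            }
        } else {
            Matcher m = p.matcher(text);
            while (m.find()) {
                tokens.add(m.group());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hello, world!";
        // Previous default: runs of letters, or runs of non-letter, non-space characters.
        System.out.println(tokenize(text, "\\p{L}+|[^\\p{L}\\s]+", false));
        // New default: split on runs of whitespace.
        System.out.println(tokenize(text, "\\s+", true));
    }
}
```

    Under the previous default, punctuation is separated from words, so `Hello, world!` yields four tokens (`Hello`, `,`, `world`, `!`). Under the new default it yields two whitespace-delimited tokens (`Hello,` and `world!`), with punctuation left attached.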