    [SPARK-21830][SQL] Bump ANTLR version and fix a few issues. · 05af2de0
    Herman van Hovell authored
    ## What changes were proposed in this pull request?
    This PR bumps the ANTLR version to 4.7 and fixes a number of small parser-related issues uncovered by the bump.
    
    The main reason for upgrading is that the current version of ANTLR (4.5) can exhibit exponential slowdowns in some cases when it parses boolean predicates. For example, the following query takes an impractically long time to parse:
    ```sql
    SELECT *
    FROM RANGE(1000)
    WHERE
    TRUE
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    AND NOT upper(DESCRIPTION) LIKE '%FOO%'
    ```
    
    This is caused by a known bug in ANTLR (https://github.com/antlr/antlr4/issues/994), which was fixed in version 4.6.
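
To observe how parse time grows with the number of predicates, it can help to generate the query above for varying sizes. The following is a minimal sketch, not part of this patch; the helper name `build_predicate_query` is hypothetical and only reproduces the query shape shown above.

```python
def build_predicate_query(n_predicates: int) -> str:
    """Build the query shape shown above with n repeated NOT ... LIKE predicates.

    Hypothetical helper for illustration; it only assembles the SQL text.
    """
    predicate = "AND NOT upper(DESCRIPTION) LIKE '%FOO%'"
    lines = ["SELECT *", "FROM RANGE(1000)", "WHERE", "TRUE"]
    # Each extra predicate adds one more boolean term for the parser to resolve.
    lines.extend([predicate] * n_predicates)
    return "\n".join(lines)


if __name__ == "__main__":
    # 18 predicates matches the example query in this description.
    print(build_predicate_query(18))
```

Feeding the generated text to the SQL parser for increasing values of `n_predicates` and timing each call should make the slowdown visible before the upgrade, assuming the parser is invoked directly rather than through the full analysis pipeline.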
    
    ## How was this patch tested?
    Existing tests.
    
    Author: Herman van Hovell <hvanhovell@databricks.com>
    
    Closes #19042 from hvanhovell/SPARK-21830.