    [SPARK-14103][SQL] Parse unescaped quotes in CSV data source. · 725b860e
    hyukjinkwon authored
    ## What changes were proposed in this pull request?
    
    This PR fixes the parsing of unescaped quotes in input data for the CSV data source. For example, the data below:
    
    ```
    "a"b,ccc,ddd
    e,f,g
    ```
    
    produces the following results:
    
    - **Before**
    
    ```
    ["a"b,ccc,ddd[\n]e,f,g]  <- parsed as a single value
    ```
    
    - **After**
    
    ```
    ["a"b], [ccc], [ddd]
    [e], [f], [g]
    ```
    
    This PR bumps the Univocity parser version to pick up the fix, which landed upstream in `2.0.2`: https://github.com/uniVocity/univocity-parsers/issues/60.
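
To make the "After" behavior concrete, here is a minimal Python sketch of a lenient field splitter. It is not Univocity's algorithm and not Spark code; the function name `split_record` and its parameters are hypothetical. It assumes one record per line and does not handle escaped quotes or quoted fields that span lines. A leading quote is treated as a real opening quote only if its closing quote is immediately followed by a delimiter or the end of the line; otherwise the field is read literally, quotes included.

```python
def split_record(line, delimiter=",", quote='"'):
    """Split one CSV line, keeping malformed quoted fields as literal text.

    Illustrative only: no escape handling, no multi-line fields.
    """
    fields = []
    i, n = 0, len(line)
    while i <= n:
        if i < n and line[i] == quote:
            close = line.find(quote, i + 1)
            # Well-formed quoted field: the closing quote is followed by
            # a delimiter or by the end of the line.
            if close != -1 and (close + 1 == n or line[close + 1] == delimiter):
                fields.append(line[i + 1:close])
                i = close + 2
                continue
        # Unquoted field, or a quote that is not properly closed:
        # read literally up to the next delimiter.
        end = line.find(delimiter, i)
        if end == -1:
            end = n
        fields.append(line[i:end])
        i = end + 1
    return fields


data = '"a"b,ccc,ddd\ne,f,g'
rows = [split_record(line) for line in data.splitlines()]
print(rows)  # [['"a"b', 'ccc', 'ddd'], ['e', 'f', 'g']]
```

Because the quote after `a` is followed by `b` rather than a delimiter, the first field is kept as `"a"b` and the line break still ends the record, which matches the "After" output above.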
    
    ## How was this patch tested?
    
    Unit tests in `CSVSuite` and `sbt/sbt scalastyle`.
    
    Author: hyukjinkwon <gurwls223@gmail.com>
    
    Closes #12226 from HyukjinKwon/SPARK-14103-quote.