    [SPARK-20978][SQL] Bump up Univocity version to 2.5.4 · 02a4386a
    hyukjinkwon authored
    ## What changes were proposed in this pull request?
    
    A bug in the Univocity parser caused the issue reported in SPARK-20978. The bug has been fixed upstream; the snippet below reproduces it, with the behavior before and after the upgrade:
    
    ```scala
    val df = spark.read.schema("a string, b string, unparsed string")
      .option("columnNameOfCorruptRecord", "unparsed")
      .csv(Seq("a").toDS())
    df.show()
    ```
    
    **Before**
    
    ```
    java.lang.NullPointerException
    	at scala.collection.immutable.StringLike$class.stripLineEnd(StringLike.scala:89)
    	at scala.collection.immutable.StringOps.stripLineEnd(StringOps.scala:29)
    	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.org$apache$spark$sql$execution$datasources$csv$UnivocityParser$$getCurrentInput(UnivocityParser.scala:56)
    	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$org$apache$spark$sql$execution$datasources$csv$UnivocityParser$$convert$1.apply(UnivocityParser.scala:207)
    	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$org$apache$spark$sql$execution$datasources$csv$UnivocityParser$$convert$1.apply(UnivocityParser.scala:207)
    ...
    ```
    
    **After**
    
    ```
    +---+----+--------+
    |  a|   b|unparsed|
    +---+----+--------+
    |  a|null|       a|
    +---+----+--------+
    ```
    
    The bug was fixed in Univocity 2.5.0, and 2.5.4 has since been released, so upgrading to 2.5.4 should be safe.
    
    ## How was this patch tested?
    
    Unit test added in `CSVSuite.scala`.
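
    The test itself is not reproduced in this message. As a rough sketch only, a standalone check of the behavior described above might look like the following. The object name `CorruptRecordSketch` and the helper `readShortRow` are hypothetical and are not taken from `CSVSuite.scala`; the expected row is the one shown under **After**.

    ```scala
    import org.apache.spark.sql.{Row, SparkSession}

    // Hypothetical standalone check; not the test added in CSVSuite.scala.
    object CorruptRecordSketch {

      // Reads a single CSV line that has fewer fields than the schema and
      // returns the collected rows. The "unparsed" column is configured to
      // receive the raw text of the malformed record.
      def readShortRow(spark: SparkSession): Array[Row] = {
        import spark.implicits._ // needed for Seq(...).toDS() outside spark-shell
        spark.read
          .schema("a string, b string, unparsed string")
          .option("columnNameOfCorruptRecord", "unparsed")
          .csv(Seq("a").toDS())
          .collect()
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("SPARK-20978-sketch")
          .getOrCreate()
        try {
          val rows = readShortRow(spark)
          // Expected per the "After" output above: a = "a", b = null, unparsed = "a".
          assert(rows.length == 1)
          assert(rows.head == Row("a", null, "a"))
        } finally {
          spark.stop()
        }
      }
    }
    ```

    This assumes a Spark build that already includes the upgraded parser and supports both the DDL-string `schema` overload and `csv(Dataset[String])`; on a build with the older Univocity version, the `collect()` call would be expected to fail with the `NullPointerException` shown under **Before**.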
    
    Author: hyukjinkwon <gurwls223@gmail.com>
    
    Closes #19113 from HyukjinKwon/bump-up-univocity.