[SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV cast null values properly
## Problem

CSV in Spark 2.0.0:
- does not read null values back correctly for certain data types such as `BooleanType`, `TimestampType`, `DateType` -- this is a regression compared to 1.6;
- does not read empty values (specified by `options.nullValue`) as `null`s for `StringType` -- this is compatible with 1.6 but leads to problems like SPARK-16903.

## What changes were proposed in this pull request?

This patch makes changes to read all empty values back as `null`s.

## How was this patch tested?

New test cases.

Author: Liwei Lin <lwlin7@gmail.com>

Closes #14118 from lw-lin/csv-cast-null.
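To illustrate the intended behavior, here is a minimal standalone sketch in plain Python. It is not Spark's code: `cast_field`, `PARSERS`, and the type names are hypothetical, and the assumption is only that a nullable field equal to the configured null marker (the counterpart of `options.nullValue`) is returned as null before any type-specific parsing, for every type including strings.

```python
from datetime import date, datetime

# Hypothetical parsers keyed by a type name; these are illustrative
# stand-ins, not Spark's CSV type-casting implementation.
PARSERS = {
    "boolean": lambda s: {"true": True, "false": False}[s.lower()],
    "integer": int,
    "date": date.fromisoformat,
    "timestamp": datetime.fromisoformat,
    "string": str,
}


def cast_field(datum, type_name, nullable=True, null_value=""):
    """Cast one raw CSV field to a Python value.

    Assumption: the null check runs first and applies uniformly to all
    types, so a nullable field equal to `null_value` is always None.
    """
    if nullable and datum == null_value:
        return None
    return PARSERS[type_name](datum)


if __name__ == "__main__":
    for type_name in PARSERS:
        # Every type, including "string", maps the null marker to None.
        print(type_name, cast_field("", type_name))
```

Performing the null check before dispatching on the type keeps the rule in one place, so adding a parser for a new type cannot reintroduce a per-type inconsistency of the kind described in the problem statement.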
Changed files:
- python/pyspark/sql/readwriter.py (2 additions, 1 deletion)
- python/pyspark/sql/streaming.py (2 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (2 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala (50 additions, 58 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala (2 additions, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala (1 addition, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVTypeCastSuite.scala (34 additions, 20 deletions)