[SPARK-16216][SQL] Read/write timestamps and dates in ISO 8601 and dateFormat/timestampFormat option for CSV and JSON

## What changes were proposed in this pull request?

### Default - ISO 8601

Currently, the CSV datasource writes `Timestamp` and `Date` in numeric form, and the JSON datasource writes both as below:

- CSV

  ```
  // TimestampType
  1414459800000000
  // DateType
  16673
  ```

- JSON

  ```
  // TimestampType
  1970-01-01 11:46:40.0
  // DateType
  1970-01-01
  ```

So, for CSV we can't read back what we write, and for JSON the output is ambiguous because the time zone is missing.

So, this PR makes both **write** `Timestamp` and `Date` as ISO 8601 formatted strings (please refer to the [ISO 8601 specification](https://www.w3.org/TR/NOTE-datetime)).

- For `Timestamp` it becomes as below (`yyyy-MM-dd'T'HH:mm:ss.SSSZZ`):

  ```
  1970-01-01T02:00:01.000-01:00
  ```

- For `Date` it becomes as below (`yyyy-MM-dd`):

  ```
  1970-01-01
  ```

### Custom date format option - `dateFormat`

This PR also adds support for writing and reading dates and timestamps as formatted strings, as below:

- **DateType**

  - With `dateFormat` option (e.g. `yyyy/MM/dd`)

    ```
    +----------+
    |      date|
    +----------+
    |2015/08/26|
    |2014/10/27|
    |2016/01/28|
    +----------+
    ```

### Custom date format option - `timestampFormat`

- **TimestampType**

  - With `timestampFormat` option (e.g. `yyyy/MM/dd HH:mm`)

    ```
    +----------------+
    |            date|
    +----------------+
    |2015/08/26 18:00|
    |2014/10/27 18:30|
    |2016/01/28 20:00|
    +----------------+
    ```

## How was this patch tested?

Unit tests were added in `CSVSuite` and `JsonSuite`. For JSON, existing tests cover the default cases.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #14279 from HyukjinKwon/SPARK-16216-json-csv.
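To make the before/after concrete, here is a minimal, standard-library-only Python sketch. It is not code from this patch. It assumes that the numeric CSV values quoted above are Spark's internal representations (microseconds since the Unix epoch for `TimestampType`, days since the epoch for `DateType`). The helper names are hypothetical, and the `strftime` pattern is only Python's rough analogue of the Java-style `yyyy/MM/dd` pattern used by the `dateFormat` option.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_from_epoch_days(days: int) -> date:
    """Decode a day count since 1970-01-01 (assumed old CSV DateType form)."""
    return EPOCH_DATE + timedelta(days=days)


def timestamp_from_epoch_micros(micros: int) -> datetime:
    """Decode microseconds since the epoch (assumed old CSV TimestampType form)."""
    return EPOCH_UTC + timedelta(microseconds=micros)


def to_iso8601(ts: datetime) -> str:
    """Render a time-zone-aware datetime with millisecond precision and a UTC offset."""
    return ts.isoformat(timespec="milliseconds")


# The numeric values from the "CSV" example in the description.
old_csv_date = date_from_epoch_days(16673)
old_csv_ts = timestamp_from_epoch_micros(1414459800000000)

print(old_csv_date.isoformat())   # 2015-08-26
print(to_iso8601(old_csv_ts))     # 2014-10-28T01:30:00.000+00:00

# The ISO 8601 timestamp example from the description, with a -01:00 offset.
example = datetime(1970, 1, 1, 2, 0, 1, tzinfo=timezone(timedelta(hours=-1)))
print(to_iso8601(example))        # 1970-01-01T02:00:01.000-01:00

# Python analogue of a custom date pattern such as yyyy/MM/dd.
print(old_csv_date.strftime("%Y/%m/%d"))  # 2015/08/26
```

The point of the sketch is that the ISO 8601 form carries the UTC offset explicitly, so a written value identifies a single instant when read back, which the previous JSON output (`1970-01-01 11:46:40.0`) did not.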
Changed files:
- python/pyspark/sql/readwriter.py (44 additions, 12 deletions)
- python/pyspark/sql/streaming.py (22 additions, 8 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (14 additions, 4 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala (12 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala (21 additions, 21 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala (9 additions, 6 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala (37 additions, 6 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JSONOptions.scala (9 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala (10 additions, 3 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonParser.scala (19 additions, 8 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala (3 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala (14 additions, 5 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala (2 additions, 2 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala (155 additions, 2 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVTypeCastSuite.scala (11 additions, 6 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala (62 additions, 5 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala (6 additions, 0 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/sources/JsonHadoopFsRelationSuite.scala (4 additions, 0 deletions)