[SPARK-18939][SQL] Timezone support in partition values.
## What changes were proposed in this pull request?

This is a follow-up PR of #16308 and #16750. It enables timezone support in partition values: the `timeZone` option introduced in #16750 is now used to parse/format partition values of the `TimestampType`.

For example, if you have the timestamp `"2016-01-01 00:00:00"` in `GMT` that will be used for partition values, the values written with the default timezone option (`"GMT"`, because the session local timezone is `"GMT"` here) are:

```scala
scala> spark.conf.set("spark.sql.session.timeZone", "GMT")

scala> val df = Seq((1, new java.sql.Timestamp(1451606400000L))).toDF("i", "ts")
df: org.apache.spark.sql.DataFrame = [i: int, ts: timestamp]

scala> df.show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2016-01-01 00:00:00|
+---+-------------------+

scala> df.write.partitionBy("ts").save("/path/to/gmtpartition")
```

```sh
$ ls /path/to/gmtpartition/
_SUCCESS  ts=2016-01-01 00%3A00%3A00
```

whereas with the option set to `"PST"`, they are:

```scala
scala> df.write.option("timeZone", "PST").partitionBy("ts").save("/path/to/pstpartition")
```

```sh
$ ls /path/to/pstpartition/
_SUCCESS  ts=2015-12-31 16%3A00%3A00
```

We can read the partition values properly if the session local timezone and the timezone of the partition values are the same:

```scala
scala> spark.read.load("/path/to/gmtpartition").show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2016-01-01 00:00:00|
+---+-------------------+
```

And even if the timezones differ, we can read the values properly by setting the correct timezone option:

```scala
// wrong result
scala> spark.read.load("/path/to/pstpartition").show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2015-12-31 16:00:00|
+---+-------------------+

// correct result
scala> spark.read.option("timeZone", "PST").load("/path/to/pstpartition").show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2016-01-01 00:00:00|
+---+-------------------+
```

## How was this patch tested?

Existing tests and added some tests.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #17053 from ueshin/issues/SPARK-18939.
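For illustration, here is a minimal, self-contained sketch of why the same instant yields `2016-01-01 00:00:00` under `GMT` but `2015-12-31 16:00:00` under `PST`, and of the `':'` to `%3A` escaping visible in the directory names above. It uses plain JVM APIs only and is not Spark's actual partition-path code; `formatInTimeZone` and `escapePartitionValue` are hypothetical helper names.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.util.TimeZone

object PartitionValueTimezoneSketch {
  // The same instant used above: 2016-01-01 00:00:00 in GMT.
  val ts = new Timestamp(1451606400000L)

  // Render the instant as a partition value string under a given timezone.
  def formatInTimeZone(ts: Timestamp, tz: String): String = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    fmt.setTimeZone(TimeZone.getTimeZone(tz))
    fmt.format(ts)
  }

  // Partition directory names percent-encode special characters; only ':' is
  // handled here, whereas Spark escapes a larger set of characters.
  def escapePartitionValue(value: String): String = value.replace(":", "%3A")

  def main(args: Array[String]): Unit = {
    val gmt = formatInTimeZone(ts, "GMT") // "2016-01-01 00:00:00"
    val pst = formatInTimeZone(ts, "PST") // "2015-12-31 16:00:00" (UTC-8 in winter)
    println(s"ts=${escapePartitionValue(gmt)}") // ts=2016-01-01 00%3A00%3A00
    println(s"ts=${escapePartitionValue(pst)}") // ts=2015-12-31 16%3A00%3A00
  }
}
```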
Showing 15 changed files with 175 additions and 59 deletions
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala: 3 additions, 1 deletion
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala: 2 additions, 1 deletion
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: 1 addition, 1 deletion
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala: 5 additions, 5 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala: 6 additions, 4 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CatalogFileIndex.scala: 2 additions, 1 deletion
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala: 11 additions, 7 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala: 11 additions, 5 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala: 32 additions, 10 deletions
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala: 10 additions, 5 deletions
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala: 48 additions, 14 deletions
- sql/core/src/test/scala/org/apache/spark/sql/sources/PartitionedWriteSuite.scala: 35 additions, 0 deletions (see the round-trip sketch after this list)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: 6 additions, 3 deletions
- sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala: 2 additions, 1 deletion
- sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala: 1 addition, 1 deletion
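The testing section above mentions added tests; the sketch below is a hypothetical round-trip check in the spirit of the PartitionedWriteSuite additions, not the actual test code. The `TimeZonePartitionRoundTrip` name and the `/tmp/pstpartition` scratch path are assumptions for illustration.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

// Hypothetical round-trip check, not the test actually added in PartitionedWriteSuite.
object TimeZonePartitionRoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("tz-partition").getOrCreate()
    import spark.implicits._

    val df = Seq((1, new Timestamp(1451606400000L))).toDF("i", "ts")
    val path = "/tmp/pstpartition" // assumed scratch location; must not already exist

    // Write partition values formatted in PST, then read them back with the same option.
    df.write.option("timeZone", "PST").partitionBy("ts").save(path)
    val readBack = spark.read.option("timeZone", "PST").load(path)

    // The round trip should preserve the original instant regardless of the
    // timezone used to render the partition directory name.
    val values = readBack.select("ts").as[Timestamp].collect().toSeq
    assert(values == Seq(new Timestamp(1451606400000L)))

    spark.stop()
  }
}
```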