[SPARK-12297][SQL] Hive compatibility for Parquet Timestamps
## What changes were proposed in this pull request?

This change allows timestamps in Parquet-based Hive tables to behave as a "floating time", without a timezone, as timestamps do for other file formats. The behavior is controlled by a new table property that gives the storage timezone; if the storage timezone is the same as the session timezone, the conversion is a no-op. When data is read from a Hive table, the table property is *always* respected. This lets Spark keep its behavior unchanged when reading old data while reading newly written data correctly (whatever the source of the data is).

Spark inherited the original behavior from Hive, but Hive is also updating its behavior to use the same scheme in HIVE-12767 / HIVE-16231.

The default for Spark remains unchanged; created tables do not include the new table property.

This applies only to Hive tables. Nothing is added to Parquet metadata to indicate the timezone, so data that is read or written directly from Parquet files will never have any conversions applied.

## How was this patch tested?

Added a unit test which creates tables and reads and writes data under a variety of permutations (different storage timezones, different session timezones, vectorized reading on and off).

Author: Imran Rashid <irashid@cloudera.com>

Closes #16781 from squito/SPARK-12297.
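The "floating time" adjustment described above can be illustrated with a small, self-contained sketch. This is not the code from this patch: the class `FloatingTimestampSketch` and the method `adjustOnRead` are hypothetical names, and the sketch assumes one plausible reading of the description, namely that a stored instant is rendered as a wall-clock time in the storage timezone and that the same wall-clock time is then re-anchored in the session timezone.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class FloatingTimestampSketch {

    // Hypothetical helper, not part of Spark: shifts a stored instant so that
    // its wall-clock reading in the session zone matches its wall-clock
    // reading in the storage zone.
    static Instant adjustOnRead(Instant stored, ZoneId storageZone, ZoneId sessionZone) {
        // Same zone on both sides: return the value untouched, so the
        // conversion is an exact no-op (even around DST transitions).
        if (storageZone.equals(sessionZone)) {
            return stored;
        }
        // Wall-clock time the writer saw in the storage zone.
        LocalDateTime wallClock = LocalDateTime.ofInstant(stored, storageZone);
        // Instant that shows the same wall-clock time in the session zone.
        return wallClock.atZone(sessionZone).toInstant();
    }

    public static void main(String[] args) {
        Instant stored = Instant.parse("2017-01-01T12:00:00Z");
        ZoneId storage = ZoneId.of("UTC");
        ZoneId session = ZoneId.of("America/Los_Angeles");

        Instant adjusted = adjustOnRead(stored, storage, session);
        // 12:00 wall-clock in UTC becomes 12:00 wall-clock in Los Angeles,
        // which is 20:00 UTC in January (UTC-8).
        System.out.println(adjusted);
        System.out.println(LocalDateTime.ofInstant(adjusted, session));
    }
}
```

The early return for equal zones mirrors the "no-op" statement in the description. Without it, a round trip through `LocalDateTime` could shift values that fall in a daylight-saving overlap.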
Showing 13 changed files:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala (2 additions, 2 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala (5 additions, 0 deletions)
- sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java (25 additions, 3 deletions)
- sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java (5 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (6 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala (2 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala (2 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRecordMaterializer.scala (7 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala (41 additions, 12 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala (24 additions, 1 deletion)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala (10 additions, 1 deletion)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala (10 additions, 2 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala (377 additions, 2 deletions)