-
- Downloads
[SPARK-17354] [SQL] Partitioning by dates/timestamps should work with Parquet vectorized reader
## What changes were proposed in this pull request? This PR fixes `ColumnVectorUtils.populate` so that Parquet vectorized reader can read partitioned table with dates/timestamps. This works fine with Parquet normal reader. This is being only called within [VectorizedParquetRecordReader.java#L185](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java#L185). When partition column types are explicitly given to `DateType` or `TimestampType` (rather than inferring the type of partition column), this fails with the exception below: ``` 16/09/01 10:30:07 ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 6) java.lang.ClassCastException: java.lang.Integer cannot be cast to java.sql.Date at org.apache.spark.sql.execution.vectorized.ColumnVectorUtils.populate(ColumnVectorUtils.java:89) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:185) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:204) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:362) ... ``` ## How was this patch tested? Unit tests in `SQLQuerySuite`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #14919 from HyukjinKwon/SPARK-17354.
Showing
- sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java 3 additions, 2 deletions...che/spark/sql/execution/vectorized/ColumnVectorUtils.java
- sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java 6 additions, 0 deletions.../apache/spark/sql/execution/vectorized/ColumnarBatch.java
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala 48 additions, 1 deletion...rk/sql/execution/datasources/parquet/ParquetIOSuite.scala
- sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 21 additions, 0 deletions...a/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
Loading
Please register or sign in to comment