-
- Downloads
[SPARK-6010] [SQL] Merging compatible Parquet schemas before computing splits
`ReadContext.init` calls `InitContext.getMergedKeyValueMetadata`, which doesn't know how to merge conflicting user defined key-value metadata and throws exception. In our case, when dealing with different but compatible schemas, we have different Spark SQL schema JSON strings in different Parquet part-files, thus causes this problem. Reading similar Parquet files generated by Hive doesn't suffer from this issue. In this PR, we manually merge the schemas before passing it to `ReadContext` to avoid the exception. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4768) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #4768 from liancheng/spark-6010 and squashes the following commits: 9002f0a [Cheng Lian] Fixes SPARK-6010
Showing
- sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala 19 additions, 1 deletion...org/apache/spark/sql/parquet/ParquetTableOperations.scala
- sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTest.scala 5 additions, 0 deletions...main/scala/org/apache/spark/sql/parquet/ParquetTest.scala
- sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetPartitionDiscoverySuite.scala 21 additions, 0 deletions...he/spark/sql/parquet/ParquetPartitionDiscoverySuite.scala
Please register or sign in to comment