[SPARK-10005] [SQL] Fixes schema merging for nested structs
In case of schema merging, only first-level fields were handled when converting Parquet groups to `InternalRow`s; nested struct fields were not handled properly. For example, the schema of a Parquet file to be read can be:

```
message individual {
  required group f1 {
    optional binary f11 (utf8);
  }
}
```

while the global schema is:

```
message global {
  required group f1 {
    optional binary f11 (utf8);
    optional int32 f12;
  }
}
```

This PR fixes the issue by padding missing fields when creating the actual converters.

Author: Cheng Lian <lian@databricks.com>

Closes #8228 from liancheng/spark-10005/nested-schema-merging.
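To illustrate the padding idea in isolation (the actual fix lives in the Scala converters listed below, not in this code), here is a minimal Python sketch. The schema model and the `pad_row` helper are hypothetical simplifications: schemas are dicts mapping field names to either a nested dict (a struct) or `None` (a leaf), which loosely mirrors how a file schema that is a subset of the global schema gets its missing fields filled with nulls.

```python
def pad_row(row, file_schema, global_schema):
    """Pad a row read with file_schema so it matches global_schema.

    Hypothetical model, not Spark's API: a schema is a dict mapping
    field name -> nested schema dict (struct) or None (leaf field).
    Fields absent from the file schema become None (null); nested
    structs are padded recursively, which is the nested case this
    PR fixes.
    """
    padded = {}
    for name, sub in global_schema.items():
        if name not in file_schema:
            padded[name] = None  # field missing from the file: pad with null
        elif isinstance(sub, dict) and sub:
            # nested struct: recurse so missing *nested* fields are padded too
            padded[name] = pad_row(row[name], file_schema[name], sub)
        else:
            padded[name] = row[name]  # leaf field present in the file
    return padded

# Mirrors the example above: the file only has f1.f11,
# while the global schema also declares f1.f12.
file_schema = {"f1": {"f11": None}}
global_schema = {"f1": {"f11": None, "f12": None}}
row = {"f1": {"f11": "hello"}}
print(pad_row(row, file_schema, global_schema))
# {'f1': {'f11': 'hello', 'f12': None}}
```

Before this fix, only the first level was padded, so a missing nested field like `f1.f12` would not be handled; the recursive call above is what the analogous change in `CatalystRowConverter` adds.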
Showing 4 changed files:
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystReadSupport.scala (15 additions, 4 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala (65 additions, 5 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystSchemaConverter.scala (3 additions, 12 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala (29 additions, 1 deletion)