Commit bca8c072 authored by Cheng Lian

[SPARK-10434] [SQL] Fixes Parquet schema of arrays that may contain null

To keep the Parquet write path fully compatible with Spark 1.4, the innermost field of arrays that may contain null should be renamed from "array_element" to "array".

Please refer to [SPARK-10434] [1] for more details.

[1]: https://issues.apache.org/jira/browse/SPARK-10434

Author: Cheng Lian <lian@databricks.com>

Closes #8586 from liancheng/spark-10434/fix-parquet-array-type.
parent 7a4f326c
@@ -426,13 +426,14 @@ private[parquet] class CatalystSchemaConverter(
       // ArrayType and MapType (for Spark versions <= 1.4.x)
       // ===================================================
 
-      // Spark 1.4.x and prior versions convert ArrayType with nullable elements into a 3-level
-      // LIST structure. This behavior mimics parquet-hive (1.6.0rc3). Note that this case is
-      // covered by the backwards-compatibility rules implemented in `isElementType()`.
+      // Spark 1.4.x and prior versions convert `ArrayType` with nullable elements into a 3-level
+      // `LIST` structure. This behavior is somewhat a hybrid of parquet-hive and parquet-avro
+      // (1.6.0rc3): the 3-level structure is similar to parquet-hive while the 3rd level element
+      // field name "array" is borrowed from parquet-avro.
       case ArrayType(elementType, nullable @ true) if !followParquetFormatSpec =>
         // <list-repetition> group <name> (LIST) {
         //   optional group bag {
-        //     repeated <element-type> element;
+        //     repeated <element-type> array;
         //   }
         // }
         ConversionPatterns.listType(
@@ -441,8 +442,8 @@ private[parquet] class CatalystSchemaConverter(
           Types
             .buildGroup(REPEATED)
             // "array_element" is the name chosen by parquet-hive (1.7.0 and prior version)
-            .addField(convertField(StructField("array_element", elementType, nullable)))
-            .named(CatalystConverter.ARRAY_CONTAINS_NULL_BAG_SCHEMA_NAME))
+            .addField(convertField(StructField("array", elementType, nullable)))
+            .named("bag"))
 
       // Spark 1.4.x and prior versions convert ArrayType with non-nullable elements into a 2-level
       // LIST structure. This behavior mimics parquet-avro (1.6.0rc3). Note that this case is
......
@@ -197,7 +197,7 @@ class ParquetSchemaInferenceSuite extends ParquetSchemaTest {
       |message root {
       |  optional group _1 (LIST) {
       |    repeated group bag {
-      |      optional int32 array_element;
+      |      optional int32 array;
       |    }
       |  }
       |}
@@ -266,7 +266,7 @@ class ParquetSchemaInferenceSuite extends ParquetSchemaTest {
       |  optional binary _1 (UTF8);
       |  optional group _2 (LIST) {
       |    repeated group bag {
-      |      optional group array_element {
+      |      optional group array {
       |        required int32 _1;
       |        required double _2;
       |      }
@@ -645,7 +645,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest {
       """message root {
         |  optional group f1 (LIST) {
         |    repeated group bag {
-        |      optional int32 array_element;
+        |      optional int32 array;
         |    }
         |  }
         |}
......
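
For readers without a Spark checkout, the shape of the legacy schema that the updated tests expect can be sketched with a small, dependency-free model. This is only an illustration: `Node`, `Primitive`, `Group`, and `LegacyListSchema` are hypothetical names invented here, not Spark or parquet-mr APIs, and the real conversion lives in `CatalystSchemaConverter`.

```scala
// Hypothetical, dependency-free model of a Parquet schema tree.
// None of these types exist in Spark or parquet-mr; they only mirror the
// textual schema shown in the test expectations of this commit.
sealed trait Node {
  def render(indent: Int): String
}

final case class Primitive(repetition: String, typeName: String, name: String) extends Node {
  // Renders e.g. "optional int32 array;" at the given nesting depth.
  def render(indent: Int): String =
    ("  " * indent) + s"$repetition $typeName $name;"
}

final case class Group(
    repetition: String,
    name: String,
    annotation: Option[String],
    fields: List[Node]) extends Node {
  // Renders the group header, its children one level deeper, and the closing brace.
  def render(indent: Int): String = {
    val pad = "  " * indent
    val ann = annotation.map(a => s" ($a)").getOrElse("")
    val body = fields.map(_.render(indent + 1)).mkString("\n")
    s"$pad$repetition group $name$ann {\n$body\n$pad}"
  }
}

object LegacyListSchema {
  // Legacy 3-level list for an array whose elements may be null:
  // an outer LIST group, a repeated "bag" group, and an optional element
  // field that this commit names "array" (previously "array_element").
  def nullableElementList(name: String, elementType: String): Group =
    Group(
      "optional",
      name,
      Some("LIST"),
      List(
        Group(
          "repeated",
          "bag",
          None,
          List(Primitive("optional", elementType, "array")))))
}
```

Calling `LegacyListSchema.nullableElementList("_1", "int32").render(0)` produces text with the same nesting as the `_1` field in the first `ParquetSchemaInferenceSuite` hunk, with the innermost field named `array`.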