Commit bc303514 authored by Budde, committed by Wenchen Fan

[SPARK-19611][SQL] Preserve metastore field order when merging inferred schema

## What changes were proposed in this pull request?

The ```HiveMetastoreCatalog.mergeWithMetastoreSchema()``` method added in #16944 may
not preserve the same field order as the metastore schema in some cases, which can cause
queries to fail. This change ensures that the metastore field order is preserved.
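As a rough illustration of the ordering issue, here is a minimal sketch that does not depend on Spark. `Field` is a hypothetical stand-in for `StructField`, and `mergeViaMap` / `mergeViaSeq` are illustrative names that only mirror the shape of the old and new code in the diff. The sketch lowercases the lookup key, which is an assumption of the sketch rather than what the patch does.

```scala
object MergeOrderSketch {
  // Hypothetical stand-in for Spark's StructField; only the parts the sketch needs.
  final case class Field(name: String, nullable: Boolean = true)

  // Inferred fields plus any nullable metastore fields missing from them,
  // keyed by lowercase name (a case-insensitive lookup table).
  private def inferredByLowerName(
      metastore: Seq[Field],
      inferred: Seq[Field]): Map[String, Field] = {
    val inferredNames = inferred.map(_.name.toLowerCase).toSet
    val missingNullables =
      metastore.filter(f => f.nullable && !inferredNames.contains(f.name.toLowerCase))
    (inferred ++ missingNullables).map(f => f.name.toLowerCase -> f).toMap
  }

  // Shape of the old code: iterate a Map, so the result follows the Map's iteration order.
  def mergeViaMap(metastore: Seq[Field], inferred: Seq[Field]): Seq[Field] = {
    val metastoreFields = metastore.map(f => f.name.toLowerCase -> f).toMap
    val inferredFields = inferredByLowerName(metastore, inferred)
    metastoreFields.map { case (name, field) =>
      field.copy(name = inferredFields(name).name)
    }.toSeq
  }

  // Shape of the new code: iterate the metastore sequence, so its order is kept.
  def mergeViaSeq(metastore: Seq[Field], inferred: Seq[Field]): Seq[Field] = {
    val inferredFields = inferredByLowerName(metastore, inferred)
    metastore.map(f => f.copy(name = inferredFields(f.name.toLowerCase).name))
  }

  def main(args: Array[String]): Unit = {
    val names = Seq("first_field", "second_field", "third_field", "fourth_field", "fifth_field")
    val metastore = names.map(n => Field(n))
    // Inferred schema: a subset, in a different order and with different casing.
    val inferred = Seq("Fifth_Field", "Third_Field", "Second_Field").map(n => Field(n))

    // Order follows the metastore; casing follows the inferred schema where present.
    println(mergeViaSeq(metastore, inferred).map(_.name).mkString(", "))
    // Same fields, but the order is whatever the Map happens to iterate in.
    println(mergeViaMap(metastore, inferred).map(_.name).mkString(", "))
  }
}
```

Both variants return the same set of fields. Only the sequence-based one ties the result order to the metastore schema, which is the property this change is after.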

## How was this patch tested?

A test ensuring that metastore order is preserved was added to ```HiveSchemaInferenceSuite```.
The particular failure use case from #16944 was also tested manually.
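The added test uses a five-field schema. One plausible reason (an inference, not something the commit states) is that Scala's immutable `Map` happens to iterate in insertion order for up to four entries. That is an implementation detail of the standard library's small-map classes rather than a documented guarantee, so a smaller schema might not expose the ordering problem. A minimal sketch, with `SmallMapOrderSketch` and `keysInMapOrder` as hypothetical names:

```scala
object SmallMapOrderSketch {
  // Keys of a Map built from the given names, in the Map's own iteration order.
  def keysInMapOrder(names: Seq[String]): Seq[String] =
    names.map(n => n -> n.length).toMap.keys.toSeq

  def main(args: Array[String]): Unit = {
    val four = Seq("first_field", "second_field", "third_field", "fourth_field")
    val five = four :+ "fifth_field"

    // Up to four entries: the specialized small-map classes iterate in insertion order.
    assert(keysInMapOrder(four) == four)

    // Five entries: a hash-based map, so only the set of keys can be relied on.
    assert(keysInMapOrder(five).toSet == five.toSet)
    println(keysInMapOrder(five).mkString(", "))
  }
}
```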

Author: Budde <budde@amazon.com>

Closes #17249 from budde/PreserveMetastoreFieldOrder.
parent 8f0490e2
@@ -356,13 +356,10 @@ private[hive] object HiveMetastoreCatalog {
       .filterKeys(!inferredSchema.map(_.name.toLowerCase).contains(_))
       .values
       .filter(_.nullable)
     // Merge missing nullable fields to inferred schema and build a case-insensitive field map.
     val inferredFields = StructType(inferredSchema ++ missingNullables)
       .map(f => f.name.toLowerCase -> f).toMap
-    StructType(metastoreFields.map { case(name, field) =>
-      field.copy(name = inferredFields(name).name)
-    }.toSeq)
+    StructType(metastoreSchema.map(f => f.copy(name = inferredFields(f.name).name)))
   } catch {
     case NonFatal(_) =>
       val msg = s"""Detected conflicting schemas when merging the schema obtained from the Hive
......
@@ -293,6 +293,27 @@ class HiveSchemaInferenceSuite
           StructField("firstField", StringType, nullable = true),
           StructField("secondField", StringType, nullable = true))))
     }.getMessage.contains("Detected conflicting schemas"))
+    // Schema merge should maintain metastore order.
+    assertResult(
+      StructType(Seq(
+        StructField("first_field", StringType, nullable = true),
+        StructField("second_field", StringType, nullable = true),
+        StructField("third_field", StringType, nullable = true),
+        StructField("fourth_field", StringType, nullable = true),
+        StructField("fifth_field", StringType, nullable = true)))) {
+      HiveMetastoreCatalog.mergeWithMetastoreSchema(
+        StructType(Seq(
+          StructField("first_field", StringType, nullable = true),
+          StructField("second_field", StringType, nullable = true),
+          StructField("third_field", StringType, nullable = true),
+          StructField("fourth_field", StringType, nullable = true),
+          StructField("fifth_field", StringType, nullable = true))),
+        StructType(Seq(
+          StructField("fifth_field", StringType, nullable = true),
+          StructField("third_field", StringType, nullable = true),
+          StructField("second_field", StringType, nullable = true))))
+    }
   }
 }
......