Commit 0bf605c2 authored by Wenchen Fan's avatar Wenchen Fan Committed by gatorsmile

[SPARK-19292][SQL] filter with partition columns should be case-insensitive on Hive tables

## What changes were proposed in this pull request?

When we query a table with a filter on partition columns, we push the partition filter down to the metastore to fetch the matched partitions directly.

In `HiveExternalCatalog.listPartitionsByFilter`, we assume the column names in partition filter are already normalized and we don't need to consider case sensitivity. However, `HiveTableScanExec` doesn't follow this assumption. This PR fixes it.
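The normalization idea can be illustrated with a hypothetical, simplified sketch in plain Scala (no Spark classes; `Attr` and `normalize` are made-up stand-ins for Catalyst's `AttributeReference` and the `transform` rule in the diff below): before a partition filter is handed to the metastore, every attribute name in the predicate is rewritten to the exact-case name from the table schema, so a case-insensitive `WHERE J = 10` resolves to partition column `j`.

```scala
// Hypothetical sketch, not Spark code: Attr stands in for AttributeReference.
final case class Attr(name: String)

// Resolve an attribute against the schema case-insensitively and return the
// schema's canonical spelling; names that do not resolve are left untouched.
def normalize(schema: Seq[Attr], a: Attr): Attr =
  schema.find(_.name.equalsIgnoreCase(a.name)).getOrElse(a)

val schema      = Seq(Attr("i"), Attr("j"))
val filterAttrs = Seq(Attr("J"))              // user wrote `WHERE J = 10`
val normalized  = filterAttrs.map(normalize(schema, _))
// normalized contains Attr("j"): now safe to push down to the metastore,
// which matches partition column names exactly.
```

The real patch does the same thing with `e transform { case a: AttributeReference => a.withName(...) }` over the pruning predicates, as shown in the diff.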

## How was this patch tested?

new regression test

Author: Wenchen Fan <wenchen@databricks.com>

Closes #16647 from cloud-fan/bug.
parent 148a84b3
@@ -62,7 +62,7 @@ object FileSourceStrategy extends Strategy with Logging {
     val filterSet = ExpressionSet(filters)
     // The attribute name of predicate could be different than the one in schema in case of
-    // case insensitive, we should change them to match the one in schema, so we donot need to
+    // case insensitive, we should change them to match the one in schema, so we do not need to
     // worry about case sensitivity anymore.
     val normalizedFilters = filters.map { e =>
       e transform {
@@ -146,9 +146,19 @@ case class HiveTableScanExec(
         hadoopReader.makeRDDForTable(relation.hiveQlTable)
       }
     } else {
+      // The attribute name of predicate could be different than the one in schema in case of
+      // case insensitive, we should change them to match the one in schema, so we do not need to
+      // worry about case sensitivity anymore.
+      val normalizedFilters = partitionPruningPred.map { e =>
+        e transform {
+          case a: AttributeReference =>
+            a.withName(relation.output.find(_.semanticEquals(a)).get.name)
+        }
+      }
       Utils.withDummyCallSite(sqlContext.sparkContext) {
         hadoopReader.makeRDDForPartitionedTable(
-          prunePartitions(relation.getHiveQlPartitions(partitionPruningPred)))
+          prunePartitions(relation.getHiveQlPartitions(normalizedFilters)))
       }
     }
     val numOutputRows = longMetric("numOutputRows")
@@ -2014,4 +2014,17 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       )
     }
   }
+
+  test("SPARK-19292: filter with partition columns should be case-insensitive on Hive tables") {
+    withTable("tbl") {
+      withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
+        sql("CREATE TABLE tbl(i int, j int) USING hive PARTITIONED BY (j)")
+        sql("INSERT INTO tbl PARTITION(j=10) SELECT 1")
+        checkAnswer(spark.table("tbl"), Row(1, 10))
+        checkAnswer(sql("SELECT i, j FROM tbl WHERE J=10"), Row(1, 10))
+        checkAnswer(spark.table("tbl").filter($"J" === 10), Row(1, 10))
+      }
+    }
+  }
 }