Commit a6cfa3f3 authored by Eric Liang, committed by Reynold Xin

[SPARK-17673][SQL] Incorrect exchange reuse with RowDataSourceScan

## What changes were proposed in this pull request?

The equality check used to decide whether `RowDataSourceScanExec` nodes can be reused does not appear to respect the output schema. As a result, self-joins or unions over the same underlying data source can return incorrect results when the two sides select different fields.
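
The failure mode can be illustrated without Spark. The following is a minimal sketch, not Spark's actual implementation: `ScanNode`, `metadataFor`, and the `struct<...>` string format are hypothetical stand-ins, and the only assumption is that two scans are treated as interchangeable when their relation and metadata map compare equal.

```scala
// Hypothetical stand-in for a scan plan node; this is NOT RowDataSourceScanExec.
// Case-class equality compares the relation name and the metadata map.
final case class ScanNode(relation: String, metadata: Map[String, String])

object ReuseSketch {
  // Builds the string metadata used for equality checking.
  // `includeSchema` toggles whether the projected columns are recorded.
  def metadataFor(
      pushedFilters: Seq[String],
      columns: Seq[String],
      includeSchema: Boolean): Map[String, String] = {
    val base = Map("PushedFilters" -> pushedFilters.mkString("[", ", ", "]"))
    if (includeSchema) base + ("ReadSchema" -> columns.mkString("struct<", ",", ">"))
    else base
  }

  // Two scans over the same relation and filters, projecting different columns.
  def scans(includeSchema: Boolean): (ScanNode, ScanNode) = {
    val filters = Seq("IsNotNull(a)")
    val left = ScanNode("inttypes", metadataFor(filters, Seq("a", "c"), includeSchema))
    val right = ScanNode("inttypes", metadataFor(filters, Seq("a", "d"), includeSchema))
    (left, right)
  }

  def main(args: Array[String]): Unit = {
    val (l1, r1) = scans(includeSchema = false)
    println(s"equal without schema: ${l1 == r1}") // scans look identical, so one could be reused
    val (l2, r2) = scans(includeSchema = true)
    println(s"equal with schema: ${l2 == r2}")    // projections differ, so no reuse
  }
}
```

Without the schema entry the two nodes compare equal even though they read different columns, which is the condition under which a reused exchange would feed the wrong data to one side of the union.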

## How was this patch tested?

New unit test passes after the fix.

Author: Eric Liang <ekl@databricks.com>

Closes #15273 from ericl/spark-17673.
parent 46d1203b
@@ -340,6 +340,8 @@ object DataSourceStrategy extends Strategy with Logging {
    // `Filter`s or cannot be handled by `relation`.
    val filterCondition = unhandledPredicates.reduceLeftOption(expressions.And)
    // These metadata values make scan plans uniquely identifiable for equality checking.
    // TODO(SPARK-17701) using strings for equality checking is brittle
    val metadata: Map[String, String] = {
      val pairs = ArrayBuffer.empty[(String, String)]
@@ -350,6 +352,8 @@ object DataSourceStrategy extends Strategy with Logging {
        }
        pairs += ("PushedFilters" -> markedFilters.mkString("[", ", ", "]"))
      }
      pairs += ("ReadSchema" ->
        StructType.fromAttributes(projects.map(_.toAttribute)).catalogString)
      pairs.toMap
    }
@@ -791,4 +791,12 @@ class JDBCSuite extends SparkFunSuite
    val schema = JdbcUtils.schemaString(df, "jdbc:mysql://localhost:3306/temp")
    assert(schema.contains("`order` TEXT"))
  }

  test("SPARK-17673: Exchange reuse respects differences in output schema") {
    val df = sql("SELECT * FROM inttypes WHERE a IS NOT NULL")
    val df1 = df.groupBy("a").agg("c" -> "min")
    val df2 = df.groupBy("a").agg("d" -> "min")
    val res = df1.union(df2)
    assert(res.distinct().count() == 2)  // would be 1 if the exchange was incorrectly reused
  }
}