Skip to content
Snippets Groups Projects
Commit 608bf30f authored by Koert Kuipers's avatar Koert Kuipers Committed by Wenchen Fan
Browse files

[SPARK-20359][SQL] Avoid unnecessary execution in EliminateOuterJoin...

[SPARK-20359][SQL] Avoid unnecessary execution in EliminateOuterJoin optimization that can lead to NPE

Avoid necessary execution that can lead to NPE in EliminateOuterJoin and add test in DataFrameSuite to confirm NPE is no longer thrown

## What changes were proposed in this pull request?
Change leftHasNonNullPredicate and rightHasNonNullPredicate to lazy so they are only executed when needed.

## How was this patch tested?

Added test in DataFrameSuite that failed before this fix and now succeeds. Note that a test in catalyst project would be better but i am unsure how to do this.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Koert Kuipers <koert@tresata.com>

Closes #17660 from koertkuipers/feat-catch-npe-in-eliminate-outer-join.
parent 702d85af
No related branches found
No related tags found
No related merge requests found
......@@ -134,8 +134,8 @@ case class EliminateOuterJoin(conf: SQLConf) extends Rule[LogicalPlan] with Pred
val leftConditions = conditions.filter(_.references.subsetOf(join.left.outputSet))
val rightConditions = conditions.filter(_.references.subsetOf(join.right.outputSet))
val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)
lazy val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
lazy val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)
join.joinType match {
case RightOuter if leftHasNonNullPredicate => Inner
......
......@@ -1722,4 +1722,14 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
"Cannot have map type columns in DataFrame which calls set operations"))
}
}
test("SPARK-20359: catalyst outer join optimization should not throw npe") {
val df1 = Seq("a", "b", "c").toDF("x")
.withColumn("y", udf{ (x: String) => x.substring(0, 1) + "!" }.apply($"x"))
val df2 = Seq("a", "b").toDF("x1")
df1
.join(df2, df1("x") === df2("x1"), "left_outer")
.filter($"x1".isNotNull || !$"y".isin("a!"))
.count
}
}
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment