[SPARK-19017][SQL] NOT IN subquery with more than one column may return incorrect results
## What changes were proposed in this pull request?

This PR fixes the code in the Optimizer phase where the NULL-aware expression of a NOT IN query is expanded in the rule `RewritePredicateSubquery`.

Example: the query

`select a1,b1 from t1 where (a1,b1) not in (select a2,b2 from t2);`

has its condition (a1, b1) = (a2, b2) rewritten from (before this fix):

`Join LeftAnti, ((isnull((_1#2 = a2#16)) || isnull((_2#3 = b2#17))) || ((_1#2 = a2#16) && (_2#3 = b2#17)))`

to (after this fix):

`Join LeftAnti, (((_1#2 = a2#16) || isnull((_1#2 = a2#16))) && ((_2#3 = b2#17) || isnull((_2#3 = b2#17))))`

The old condition matches a t2 row as soon as any single column comparison is NULL, even when another column comparison is definitely FALSE, so the anti-join wrongly removed rows that NOT IN should return. The new condition matches only when every column comparison is either TRUE or NULL.

## How was this patch tested?

sql/test, catalyst/test and new test cases in SQLQueryTestSuite.

Author: Nattavut Sutyanyong <nsy.can@gmail.com>

Closes #16467 from nsyca/19017.

(cherry picked from commit cdb691eb)
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
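To illustrate the difference between the two join conditions, here is a minimal Scala sketch that models SQL three-valued logic with `Option` (`None` stands for NULL/UNKNOWN) and evaluates both conditions on plain collections. It is not Spark code: `NotInDemo`, `eq3`, `isNull`, `and3`, `or3`, `oldCond`, `newCond` and `leftAnti` are hypothetical helpers, and the sketch assumes the left anti-join drops a left row only when some right row makes the condition TRUE (a NULL condition counts as no match).

```scala
// Hypothetical model, not Spark's Catalyst expressions.
// SQL values are Option[Int] (None = NULL); predicates are Option[Boolean] (None = UNKNOWN).
object NotInDemo {
  type Row = (Option[Int], Option[Int])

  // a = b under three-valued logic: UNKNOWN if either side is NULL.
  def eq3(a: Option[Int], b: Option[Int]): Option[Boolean] =
    for { x <- a; y <- b } yield x == y

  // isnull(e) is never NULL itself.
  def isNull(e: Option[Boolean]): Option[Boolean] = Some(e.isEmpty)

  def and3(a: Option[Boolean], b: Option[Boolean]): Option[Boolean] = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  def or3(a: Option[Boolean], b: Option[Boolean]): Option[Boolean] = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // Shape of the condition before the fix:
  // (isnull(e1) || isnull(e2)) || (e1 && e2)
  def oldCond(l: Row, r: Row): Option[Boolean] = {
    val e1 = eq3(l._1, r._1)
    val e2 = eq3(l._2, r._2)
    or3(or3(isNull(e1), isNull(e2)), and3(e1, e2))
  }

  // Shape of the condition after the fix:
  // (e1 || isnull(e1)) && (e2 || isnull(e2))
  def newCond(l: Row, r: Row): Option[Boolean] = {
    val e1 = eq3(l._1, r._1)
    val e2 = eq3(l._2, r._2)
    and3(or3(e1, isNull(e1)), or3(e2, isNull(e2)))
  }

  // LeftAnti: keep a left row only if no right row makes the condition TRUE.
  def leftAnti(left: Seq[Row], right: Seq[Row],
               cond: (Row, Row) => Option[Boolean]): Seq[Row] =
    left.filterNot(l => right.exists(r => cond(l, r).contains(true)))

  def main(args: Array[String]): Unit = {
    val t1: Seq[Row] = Seq((Some(1), None))
    val t2: Seq[Row] = Seq((Some(2), Some(3)))
    println(leftAnti(t1, t2, oldCond)) // row is dropped
    println(leftAnti(t1, t2, newCond)) // row is kept
  }
}
```

With t1 = {(1, NULL)} and t2 = {(2, 3)}, the tuple comparison is FALSE because 1 = 2 is FALSE, so NOT IN should return the row. The old condition still matches because isnull(NULL = 3) is true, and the row is dropped; the new condition evaluates to FALSE and the row is kept.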
Showing 5 changed files with 131 additions and 6 deletions:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: 8 additions, 2 deletions
- sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/not-in-multiple-columns.sql: 55 additions, 0 deletions
- sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-multiple-columns.sql.out: 59 additions, 0 deletions
- sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala: 6 additions, 1 deletion
- sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala: 3 additions, 3 deletions