[SPARK-14781] [SQL] support nested predicate subquery
## What changes were proposed in this pull request?

To support nested predicate subqueries, this PR introduces an internal join type, ExistenceJoin, which emits all rows from the left side plus an additional column indicating whether any row from the right side matched (it is not null-aware right now). This additional column can be used to replace the subquery in a Filter.

In theory, every predicate subquery could use this join type, but it is slower than LeftSemi and LeftAnti, so it is only used for nested subqueries (subqueries inside OR). For example, the following SQL:

```sql
SELECT a FROM t WHERE EXISTS (select 0) OR EXISTS (select 1)
```

This PR also fixes a bug where predicate subqueries were pushed down through joins (they should not be).

Nested null-aware subqueries are still not supported. For example, `a > 3 OR b NOT IN (select bb from t)`.

After this, we can run TPCDS queries Q10, Q35, and Q45.

## How was this patch tested?

Added unit tests.

Author: Davies Liu <davies@databricks.com>

Closes #12820 from davies/or_exists.
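For readers unfamiliar with the idea, the semantics of ExistenceJoin described above can be modelled on plain Scala collections. This is only an illustrative sketch, not the code in this patch: `ExistenceJoinSketch`, `existenceJoin`, and `leftSemiJoin` are hypothetical names, rows are plain values, and, like the join in this PR, the model is not null-aware.

```scala
// Illustrative model only; these helpers are hypothetical and are NOT Spark's implementation.
object ExistenceJoinSketch {
  // Keeps every left row and pairs it with a flag telling whether any right row matched.
  def existenceJoin[L, R](left: Seq[L], right: Seq[R])(cond: (L, R) => Boolean): Seq[(L, Boolean)] =
    left.map(l => (l, right.exists(r => cond(l, r))))

  // Left semi join for comparison: non-matching left rows are dropped immediately,
  // so they cannot be rescued by another branch of an OR.
  def leftSemiJoin[L, R](left: Seq[L], right: Seq[R])(cond: (L, R) => Boolean): Seq[L] =
    left.filter(l => right.exists(r => cond(l, r)))

  // Models: SELECT a FROM t WHERE EXISTS (match in s1) OR EXISTS (match in s2)
  def existsOrExists(t: Seq[Int], s1: Seq[Int], s2: Seq[Int]): Seq[Int] = {
    // First join adds the flag for the first subquery.
    val withFirst = existenceJoin(t, s1)((a, b) => a == b)
    // Second join adds the flag for the second subquery, keeping the first flag.
    val withBoth = existenceJoin(withFirst, s2)((row, b) => row._1 == b)
    // The Filter now refers to the two flag columns instead of the subqueries.
    withBoth.collect { case ((a, e1), e2) if e1 || e2 => a }
  }

  def main(args: Array[String]): Unit = {
    println(existsOrExists(Seq(1, 2, 3, 4), Seq(1, 2), Seq(4)))
  }
}
```

Because each join preserves all left rows, the two flags can be combined with OR in a later filter. With LeftSemi, a row that fails the first subquery would already be gone before the second one is evaluated.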
Changed files:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (3 additions, 2 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala (14 additions, 1 deletion)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (34 additions, 7 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala (10 additions, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (4 additions, 0 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala (7 additions, 4 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala (1 addition, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala (64 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala (69 additions, 25 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala (27 additions, 4 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala (1 addition, 12 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala (40 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala (25 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/joins/ExistenceJoinSuite.scala (46 additions, 4 deletions)