[SPARK-12656] [SQL] Implement Intersect with Left-semi Join
Our current Intersect physical operator simply delegates to RDD.intersect. We should remove the Intersect physical operator and instead transform a logical Intersect into a left-semi join followed by a distinct. This way, we can take advantage of all the benefits of the join implementations (e.g. managed memory, code generation, broadcast joins).

After a search, I found that one mainstream RDBMS does the same: in its query explain output, Intersect is replaced by a Left-semi Join. A Left-semi Join can also help outer-join elimination in the Optimizer, as shown in the PR: https://github.com/apache/spark/pull/10566

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes #10630 from gatorsmile/IntersectBySemiJoin.
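The rewrite described above (Intersect becomes a left-semi join followed by a distinct) can be modelled on plain Scala collections. This is only an illustrative sketch, not the Catalyst rule from this change: the `IntersectAsSemiJoin` object, the `Row` alias, and the helper names are hypothetical, and rows are assumed to be sequences of `Option[Int]` so that `None` stands in for SQL NULL. The join condition is assumed to be null-safe column-wise equality, because INTERSECT treats two NULLs as the same value.

```scala
object IntersectAsSemiJoin {
  // Hypothetical row model: one Option per column, None standing in for NULL.
  type Row = Seq[Option[Int]]

  // Left-semi join: keep each left row that has at least one matching
  // right row. No right-side columns are returned, and a left row is not
  // duplicated when several right rows match it.
  def leftSemiJoin(left: Seq[Row], right: Seq[Row])(cond: (Row, Row) => Boolean): Seq[Row] =
    left.filter(l => right.exists(r => cond(l, r)))

  // Null-safe, column-by-column equality. Option's == already treats
  // None == None as true, which is the behavior INTERSECT needs.
  def nullSafeEq(l: Row, r: Row): Boolean =
    l.length == r.length && l.zip(r).forall { case (a, b) => a == b }

  // Intersect expressed as distinct(left-semi join on all columns).
  def intersect(left: Seq[Row], right: Seq[Row]): Seq[Row] =
    leftSemiJoin(left, right)(nullSafeEq).distinct
}
```

The distinct step is needed because a semi-join alone keeps duplicate rows from the left side, whereas INTERSECT has set semantics.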
Changed files:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (62 additions, 51 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (11 additions, 3 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (26 additions, 19 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala (23 additions, 9 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala (5 additions, 0 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/AggregateOptimizeSuite.scala (0 additions, 12 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceOperatorSuite.scala (59 additions, 0 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SetOperationSuite.scala (1 addition, 14 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala (3 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala (0 additions, 12 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala (21 additions, 0 deletions)