-
- Downloads
[SPARK-2205] [SQL] Avoid unnecessary exchange operators in multi-way joins
This PR adds `PartitioningCollection`, which is used to represent the `outputPartitioning` for SparkPlans with multiple children (e.g. `ShuffledHashJoin`). So, a `SparkPlan` can have multiple descriptions of its partitioning schemes. Taking `ShuffledHashJoin` as an example, it has two descriptions of its partitioning schemes, i.e. `left.outputPartitioning` and `right.outputPartitioning`. So when we have a query like `select * from t1 join t2 on (t1.x = t2.x) join t3 on (t2.x = t3.x)` will only have three Exchange operators (when shuffled joins are needed) instead of four. The code in this PR was authored by yhuai; I'm opening this PR to factor out this change from #7685, a larger pull request which contains two other optimizations. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/7773) <!-- Reviewable:end --> Author: Yin Huai <yhuai@databricks.com> Author: Josh Rosen <joshrosen@databricks.com> Closes #7773 from JoshRosen/multi-way-join-planning-improvements and squashes the following commits: 5c45924 [Josh Rosen] Merge remote-tracking branch 'origin/master' into multi-way-join-planning-improvements cd8269b [Josh Rosen] Refactor test to use SQLTestUtils 2963857 [Yin Huai] Revert unnecessary SqlConf change. 73913f7 [Yin Huai] Add comments and test. Also, revert the change in ShuffledHashOuterJoin for now. 4a99204 [Josh Rosen] Delete unrelated expression change 884ab95 [Josh Rosen] Carve out only SPARK-2205 changes. 247e5fa [Josh Rosen] Merge remote-tracking branch 'origin/master' into multi-way-join-planning-improvements c57a954 [Yin Huai] Bug fix. d3d2e64 [Yin Huai] First round of cleanup. f9516b0 [Yin Huai] Style c6667e7 [Yin Huai] Add PartitioningCollection. e616d3b [Yin Huai] wip 7c2d2d8 [Yin Huai] Bug fix and refactoring. 69bb072 [Yin Huai] Introduce NullSafeHashPartitioning and NullUnsafePartitioning. d5b84c3 [Yin Huai] Do not add unnessary filters. 2201129 [Yin Huai] Filter out rows that will not be joined in equal joins early.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala 77 additions, 10 deletions...ache/spark/sql/catalyst/plans/physical/partitioning.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/DistributionSuite.scala 1 addition, 1 deletion...ala/org/apache/spark/sql/catalyst/DistributionSuite.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala 1 addition, 1 deletion.../main/scala/org/apache/spark/sql/execution/Exchange.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashOuterJoin.scala 3 additions, 1 deletion...he/spark/sql/execution/joins/BroadcastHashOuterJoin.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala 0 additions, 9 deletions.../org/apache/spark/sql/execution/joins/HashOuterJoin.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/LeftSemiJoinHash.scala 4 additions, 2 deletions...g/apache/spark/sql/execution/joins/LeftSemiJoinHash.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoin.scala 4 additions, 3 deletions...g/apache/spark/sql/execution/joins/ShuffledHashJoin.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashOuterJoin.scala 9 additions, 1 deletion...che/spark/sql/execution/joins/ShuffledHashOuterJoin.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoin.scala 2 additions, 1 deletion.../org/apache/spark/sql/execution/joins/SortMergeJoin.scala
- sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala 47 additions, 2 deletions...t/scala/org/apache/spark/sql/execution/PlannerSuite.scala
Loading
Please register or sign in to comment