[SPARK-17080][SQL] join reorder
## What changes were proposed in this pull request?

Reorder the joins using a dynamic programming algorithm (Selinger paper):
First we put all items (basic joined nodes) into level 1, then we build all two-way joins at level 2 from plans at level 1 (single items), then build all 3-way joins from plans at previous levels (two-way joins and single items), then 4-way joins ... etc, until we build all n-way joins and pick the best plan among them.

When building m-way joins, we only keep the best plan (with the lowest cost) for the same set of m items. E.g., for 3-way joins, we keep only the best plan for items {A, B, C} among plans (A J B) J C, (A J C) J B and (B J C) J A.

Thus, the plans maintained for each level when reordering four items A, B, C, D are as follows:
```
level 1: p({A}), p({B}), p({C}), p({D})
level 2: p({A, B}), p({A, C}), p({A, D}), p({B, C}), p({B, D}), p({C, D})
level 3: p({A, B, C}), p({A, B, D}), p({A, C, D}), p({B, C, D})
level 4: p({A, B, C, D})
```
where p({A, B, C, D}) is the final output plan.

For cost evaluation, since physical costs for operators are not available currently, we use cardinalities and sizes to compute costs.

## How was this patch tested?

add test cases

Author: wangzhenhua <wangzhenhua@huawei.com>
Author: Zhenhua Wang <wzh_zju@163.com>

Closes #17138 from wzhfy/joinReorder.
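
The level-by-level procedure described in the commit message can be illustrated with a small, self-contained Python sketch. This is not the code from `CostBasedJoinReorder.scala`. The function name `reorder_joins`, the per-pair `selectivity` map, and the cost model (sum of intermediate row counts) are all assumptions made for illustration; the commit message only says that cardinalities and sizes are used to compute costs. Unlike a real optimizer, the sketch also allows joins between items that share no join condition and treats them as Cartesian products.

```python
def reorder_joins(cardinality, selectivity):
    """Level-by-level dynamic programming over sets of join items.

    cardinality: dict mapping item name -> row count (hypothetical input).
    selectivity: dict mapping frozenset({a, b}) -> fraction of rows kept
                 when a and b are joined (hypothetical; defaults to 1.0).
    Returns a list of levels; levels[m - 1] maps each set of m items to
    its best plan, stored as a tuple (tree, output_rows, cost).
    """
    items = sorted(cardinality)
    # Level 1: every single item is its own plan with zero join cost.
    levels = [{frozenset([i]): (i, float(cardinality[i]), 0.0) for i in items}]

    for m in range(2, len(items) + 1):
        best = {}
        # An m-way join combines a k-way plan with an (m - k)-way plan
        # taken from previous levels.
        for k in range(1, m // 2 + 1):
            for left_set, left in levels[k - 1].items():
                for right_set, right in levels[m - k - 1].items():
                    if left_set & right_set:
                        continue  # the two sides must not share items
                    joined = left_set | right_set
                    # Assumed model: multiply the selectivities of all
                    # item pairs that cross the two sides.
                    sel = 1.0
                    for a in left_set:
                        for b in right_set:
                            sel *= selectivity.get(frozenset([a, b]), 1.0)
                    rows = left[1] * right[1] * sel
                    # Assumed cost: cost of both inputs plus rows produced.
                    cost = left[2] + right[2] + rows
                    # Keep only the cheapest plan per set of items.
                    if joined not in best or cost < best[joined][2]:
                        best[joined] = ((left[0], right[0]), rows, cost)
        levels.append(best)
    return levels
```

Calling `reorder_joins` with four items yields four levels containing 4, 6, 4 and 1 plans, matching the listing in the commit message; the single entry in the last level plays the role of p({A, B, C, D}).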
Changed files:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystConf.scala (8 additions, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala (297 additions, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (2 additions, 0 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinReorderSuite.scala (194 additions, 0 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala (1 addition, 1 deletion)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/StatsEstimationTestBase.scala (2 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (16 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala (1 addition, 1 deletion)