-
- Downloads
[SPARK-13840][SQL] Split Optimizer Rule ColumnPruning to ColumnPruning and EliminateOperator
#### What changes were proposed in this pull request? Before this PR, two Optimizer rules `ColumnPruning` and `PushPredicateThroughProject` reverse each other's effects. Optimizer always reaches the max iteration when optimizing some queries. Extra `Project` are found in the plan. For example, below is the optimized plan after reaching 100 iterations: ``` Join Inner, Some((cast(id1#16 as bigint) = id1#18L)) :- Project [id1#16] : +- Filter isnotnull(cast(id1#16 as bigint)) : +- Project [id1#16] : +- Relation[id1#16,newCol#17] JSON part: struct<>, data: struct<id1:int,newCol:int> +- Filter isnotnull(id1#18L) +- Relation[id1#18L] JSON part: struct<>, data: struct<id1:bigint> ``` This PR splits the optimizer rule `ColumnPruning` to `ColumnPruning` and `EliminateOperators` The issue becomes worse when having another rule `NullFiltering`, which could add extra Filters for `IsNotNull`. We have to be careful when introducing extra `Filter` if the benefit is not large enough. Another PR will be submitted by sameeragarwal to handle this issue. cc sameeragarwal marmbrus In addition, `ColumnPruning` should not push `Project` through non-deterministic `Filter`. This could cause wrong results. This will be put in a separate PR. cc davies cloud-fan yhuai #### How was this patch tested? Modified the existing test cases. Author: gatorsmile <gatorsmile@gmail.com> Closes #11682 from gatorsmile/viewDuplicateNames.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala 20 additions, 11 deletions...a/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala 3 additions, 2 deletions...che/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CombiningLimitsSuite.scala 2 additions, 1 deletion...e/spark/sql/catalyst/optimizer/CombiningLimitsSuite.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala 1 addition, 0 deletions.../spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala
Loading
Please register or sign in to comment