-
- Downloads
[SPARK-13466] [SQL] Remove projects that become redundant after column pruning rule
JIRA: https://issues.apache.org/jira/browse/SPARK-13466 ## What changes were proposed in this pull request? With column pruning rule in optimizer, some Project operators will become redundant. We should remove these redundant Projects. For an example query: val input = LocalRelation('key.int, 'value.string) val query = Project(Seq($"x.key", $"y.key"), Join( SubqueryAlias("x", input), BroadcastHint(SubqueryAlias("y", input)), Inner, None)) After the first run of column pruning, it would like: Project(Seq($"x.key", $"y.key"), Join( Project(Seq($"x.key"), SubqueryAlias("x", input)), Project(Seq($"y.key"), <-- inserted by the rule BroadcastHint(SubqueryAlias("y", input))), Inner, None)) Actually we don't need the outside Project now. This patch will remove it: Join( Project(Seq($"x.key"), SubqueryAlias("x", input)), Project(Seq($"y.key"), BroadcastHint(SubqueryAlias("y", input))), Inner, None) ## How was the this patch tested? Unit test is added into ColumnPruningSuite. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #11341 from viirya/remove-redundant-project.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala 5 additions, 1 deletion...a/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala 22 additions, 1 deletion...che/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala 4 additions, 5 deletions.../spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala
- sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala 4 additions, 4 deletions...g/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
Loading
Please register or sign in to comment