-
- Downloads
[SPARK-17142][SQL] Complex query triggers binding error in HashAggregateExec
## What changes were proposed in this pull request? In `ReorderAssociativeOperator` rule, we extract foldable expressions with Add/Multiply arithmetics, and replace with eval literal. For example, `(a + 1) + (b + 2)` is optimized to `(a + b + 3)` by this rule. For aggregate operator, output expressions should be derived from groupingExpressions, current implemenation of `ReorderAssociativeOperator` rule may break this promise. A instance could be: ``` SELECT ((t1.a + 1) + (t2.a + 2)) AS out_col FROM testdata2 AS t1 INNER JOIN testdata2 AS t2 ON (t1.a = t2.a) GROUP BY (t1.a + 1), (t2.a + 2) ``` `((t1.a + 1) + (t2.a + 2))` is optimized to `(t1.a + t2.a + 3)`, which could not be derived from `ExpressionSet((t1.a +1), (t2.a + 2))`. Maybe we should improve the rule of `ReorderAssociativeOperator` by adding a GroupingExpressionSet to keep Aggregate.groupingExpressions, and respect these expressions during the optimize stage. ## How was this patch tested? Add new test case in `ReorderAssociativeOperatorSuite`. Author: jiangxingbo <jiangxb1987@gmail.com> Closes #14917 from jiangxb1987/rao.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala 24 additions, 7 deletions...org/apache/spark/sql/catalyst/optimizer/expressions.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReorderAssociativeOperatorSuite.scala 15 additions, 1 deletion.../catalyst/optimizer/ReorderAssociativeOperatorSuite.scala
Please register or sign in to comment