-
- Downloads
[SPARK-12705][SPARK-10777][SQL] Analyzer Rule ResolveSortReferences
JIRA: https://issues.apache.org/jira/browse/SPARK-12705 **Scope:** This PR is a general fix for sorting reference resolution when the child's `outputSet` does not have the order-by attributes (called, *missing attributes*): - UnaryNode support is limited to `Project`, `Window`, `Aggregate`, `Distinct`, `Filter`, `RepartitionByExpression`. - We will not try to resolve the missing references inside a subquery, unless the outputSet of this subquery contains it. **General Reference Resolution Rules:** - Jump over the nodes with the following types: `Distinct`, `Filter`, `RepartitionByExpression`. Do not need to add missing attributes. The reason is their `outputSet` is decided by their `inputSet`, which is the `outputSet` of their children. - Group-by expressions in `Aggregate`: missing order-by attributes are not allowed to be added into group-by expressions since it will change the query result. Thus, in RDBMS, it is not allowed. - Aggregate expressions in `Aggregate`: if the group-by expressions in `Aggregate` contains the missing attributes but aggregate expressions do not have it, just add them into the aggregate expressions. This can resolve the analysisExceptions thrown by the three TCPDS queries. - `Project` and `Window` are special. We just need to add the missing attributes to their `projectList`. **Implementation:** 1. Traverse the whole tree in a pre-order manner to find all the resolvable missing order-by attributes. 2. Traverse the whole tree in a post-order manner to add the found missing order-by attributes to the node if their `inputSet` contains the attributes. 3. If the origins of the missing order-by attributes are different nodes, each pass only resolves the missing attributes that are from the same node. **Risk:** Low. This rule will be trigger iff ```!s.resolved && child.resolved``` is true. Thus, very few cases are affected. Author: gatorsmile <gatorsmile@gmail.com> Closes #10678 from gatorsmile/sortWindows.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala 80 additions, 21 deletions...ala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala 83 additions, 0 deletions...rg/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TestRelations.scala 6 additions, 0 deletions...rg/apache/spark/sql/catalyst/analysis/TestRelations.scala
- sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala 16 additions, 0 deletions.../test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
- sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 6 additions, 0 deletions.../src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
- sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 83 additions, 1 deletion...a/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
Loading
Please register or sign in to comment