-
- Downloads
[SPARK-9241][SQL] Supporting multiple DISTINCT columns (2) - Rewriting Rule
The second PR for SPARK-9241, this adds support for multiple distinct columns to the new aggregation code path. This PR solves the multiple DISTINCT column problem by rewriting these Aggregates into an Expand-Aggregate-Aggregate combination. See the [JIRA ticket](https://issues.apache.org/jira/browse/SPARK-9241) for some information on this. The advantages over the - competing - [first PR](https://github.com/apache/spark/pull/9280) are: - This can use the faster TungstenAggregate code path. - It is impossible to OOM due to an ```OpenHashSet``` allocating to much memory. However, this will multiply the number of input rows by the number of distinct clauses (plus one), and puts a lot more memory pressure on the aggregation code path itself. The location of this Rule is a bit funny, and should probably change when the old aggregation path is changed. cc yhuai - Could you also tell me where to add tests for this? Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #9406 from hvanhovell/SPARK-9241-rewriter.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Count.scala 2 additions, 0 deletions...ache/spark/sql/catalyst/expressions/aggregate/Count.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Utils.scala 182 additions, 4 deletions...ache/spark/sql/catalyst/expressions/aggregate/Utils.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala 6 additions, 0 deletions...spark/sql/catalyst/expressions/aggregate/interfaces.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala 3 additions, 3 deletions...a/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala 44 additions, 36 deletions...che/spark/sql/catalyst/plans/logical/basicOperators.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala 1 addition, 1 deletion...cala/org/apache/spark/sql/execution/SparkStrategies.scala
Loading
Please register or sign in to comment