-
- Downloads
[SPARK-17616][SQL] Support a single distinct aggregate combined with a non-partial aggregate
## What changes were proposed in this pull request? We currently cannot execute an aggregate that contains a single distinct aggregate function and an one or more non-partially plannable aggregate functions, for example: ```sql select grp, collect_list(col1), count(distinct col2) from tbl_a group by 1 ``` This is a regression from Spark 1.6. This is caused by the fact that the single distinct aggregation code path assumes that all aggregates can be planned in two phases (is partially aggregatable). This PR works around this issue by triggering the `RewriteDistinctAggregates` in such cases (this is similar to the approach taken in 1.6). ## How was this patch tested? Created `RewriteDistinctAggregatesSuite` which checks if the aggregates with distinct aggregate functions get rewritten into two `Aggregates` and an `Expand`. Added a regression test to `DataFrameAggregateSuite`. Author: Herman van Hovell <hvanhovell@databricks.com> Closes #15187 from hvanhovell/SPARK-17616.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala 9 additions, 9 deletions...rk/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala 94 additions, 0 deletions...l/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala
- sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala 8 additions, 0 deletions.../scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
Loading
Please register or sign in to comment