[SPARK-12213][SQL] use multiple partitions for single distinct query
Currently, we can generate two different plans for a query with a single distinct aggregate (depending on `spark.sql.specializeSingleDistinctAggPlanning`): one works better on low-cardinality columns, the other (the default) works better on high-cardinality columns. This PR changes planning to generate a single plan (three aggregations and two exchanges) that works better in both cases, so we can safely remove the flag `spark.sql.specializeSingleDistinctAggPlanning` (introduced in 1.6).

For a query like `SELECT COUNT(DISTINCT a) FROM table`, the plan will be:

```
AGG-4 (count distinct)
  Shuffle to a single reducer
    Partial-AGG-3 (count distinct, no grouping)
      Partial-AGG-2 (grouping on a)
        Shuffle by a
          Partial-AGG-1 (grouping on a)
```

This PR also includes a large refactoring of the aggregation code (removing 500+ lines of code).

cc yhuai nongli marmbrus

Author: Davies Liu <davies@databricks.com>

Closes #10228 from davies/single_distinct.
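To make the data flow of the plan above concrete, here is a minimal sketch in plain Python that simulates it over in-memory lists. It is not Spark code and does not reflect how the operators are implemented; the function names (`dedup_partition`, `shuffle_by_value`, `count_distinct`) and the `num_reducers` parameter are hypothetical, and the handling of `None` assumes the usual SQL rule that `COUNT(DISTINCT a)` ignores NULLs.

```python
def dedup_partition(values):
    """Partial-AGG-1 / Partial-AGG-2: group on `a`, keeping one row per value."""
    return set(values)


def shuffle_by_value(partitions, num_reducers):
    """Shuffle by `a`: every copy of a given value lands on the same reducer."""
    reducers = [[] for _ in range(num_reducers)]
    for partition in partitions:
        for value in partition:
            reducers[hash(value) % num_reducers].append(value)
    return reducers


def count_distinct(input_partitions, num_reducers=4):
    # Partial-AGG-1: deduplicate inside each input partition (map side).
    stage1 = [dedup_partition(p) for p in input_partitions]
    # Shuffle by a: spread the distinct values over several reducers.
    shuffled = shuffle_by_value(stage1, num_reducers)
    # Partial-AGG-2: deduplicate again; each value now lives on one reducer only.
    stage2 = [dedup_partition(p) for p in shuffled]
    # Partial-AGG-3: each reducer counts its own values (no grouping).
    # Assumption: NULL (None) is not counted, as in SQL COUNT(DISTINCT a).
    stage3 = [sum(1 for v in p if v is not None) for p in stage2]
    # Shuffle to a single reducer + AGG-4: add up the partial counts.
    return sum(stage3)


if __name__ == "__main__":
    data = [[1, 2, 2], [2, 3], [3, 3, 1, 4]]
    print(count_distinct(data))
```

In this sketch the single final reducer only receives one partial count per upstream reducer, while the deduplication work is spread across `num_reducers` partitions, which is the intuition behind using multiple partitions for a single distinct query.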
Showing 10 changed files:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystConf.scala (0 additions, 7 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala (2 additions, 9 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala (0 additions, 15 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala (132 additions, 285 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortBasedAggregate.scala (9 additions, 20 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortBasedAggregationIterator.scala (22 additions, 25 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala (8 additions, 17 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala (55 additions, 384 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala (127 additions, 153 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala (67 additions, 75 deletions)