[SPARK-11840][SQL] Restore the 1.5's behavior of planning a single distinct aggregation.

The impact of this change is for a query that has a single distinct column and does not have any grouping expression like `SELECT COUNT(DISTINCT a) FROM table` The plan will be changed from ``` AGG-2 (count distinct) Shuffle to a single reducer Partial-AGG-2 (count distinct) AGG-1 (grouping on a) Shuffle by a Partial-AGG-1 (grouping on 1) ``` to the following one (1.5 uses this) ``` AGG-2 AGG-1 (grouping on a) Shuffle to a single reducer Partial-AGG-1(grouping on a) ``` The first plan is more robust. However, to better benchmark the impact of this change, we should use 1.5's plan and use the conf of `spark.sql.specializeSingleDistinctAggPlanning` to control the plan. Author: Yin Huai <yhuai@databricks.com> Closes #9828 from yhuai/distinctRewriter.

[SPARK-11840][SQL] Restore the 1.5's behavior of planning a single distinct aggregation.
96287884 · Yin Huai · f4499920 · 96287884
Commit 96287884 authored 9 years ago by Yin Huai
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala
@@ -126,8 +126,8 @@ case class DistinctAggregationRewriter(conf: CatalystConf) extends Rule[LogicalP
    val shouldRewrite = if (conf.specializeSingleDistinctAggPlanning) {
      // When the flag is set to specialize single distinct agg planning,
      // we will rely on our Aggregation strategy to handle queries with a single
-      // distinct column and this aggregate operator does have grouping expressions.
-      distinctAggGroups.size > 1 || (distinctAggGroups.size == 1 && a.groupingExpressions.isEmpty)
+      // distinct column.
+      distinctAggGroups.size > 1
    } else {
      distinctAggGroups.size >= 1
    }