From 962878843b611fa6229e3ee67bb22e2a4bc283cd Mon Sep 17 00:00:00 2001
From: Yin Huai <yhuai@databricks.com>
Date: Thu, 19 Nov 2015 11:02:17 -0800
Subject: [PATCH] [SPARK-11840][SQL] Restore the 1.5's behavior of planning a
 single distinct aggregation.

The impact of this change is for a query that has a single distinct column and does not have any grouping expression like
`SELECT COUNT(DISTINCT a) FROM table`
The plan will be changed from
```
AGG-2 (count distinct)
  Shuffle to a single reducer
    Partial-AGG-2 (count distinct)
      AGG-1 (grouping on a)
        Shuffle by a
          Partial-AGG-1 (grouping on 1)
```
to the following one (1.5 uses this)
```
AGG-2
  AGG-1 (grouping on a)
    Shuffle to a single reducer
      Partial-AGG-1(grouping on a)
```
The first plan is more robust. However, to better benchmark the impact of this change, we should use 1.5's plan and use the conf of `spark.sql.specializeSingleDistinctAggPlanning` to control the plan.

Author: Yin Huai <yhuai@databricks.com>

Closes #9828 from yhuai/distinctRewriter.
---
 .../sql/catalyst/analysis/DistinctAggregationRewriter.scala   | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala
index c0c960471a..9c78f6d4cc 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala
@@ -126,8 +126,8 @@ case class DistinctAggregationRewriter(conf: CatalystConf) extends Rule[LogicalP
     val shouldRewrite = if (conf.specializeSingleDistinctAggPlanning) {
       // When the flag is set to specialize single distinct agg planning,
       // we will rely on our Aggregation strategy to handle queries with a single
-      // distinct column and this aggregate operator does have grouping expressions.
-      distinctAggGroups.size > 1 || (distinctAggGroups.size == 1 && a.groupingExpressions.isEmpty)
+      // distinct column.
+      distinctAggGroups.size > 1
     } else {
       distinctAggGroups.size >= 1
     }
-- 
GitLab