[SPARK-2663] [SQL] Support the Grouping Set
Add support for `GROUPING SETS`, `ROLLUP`, `CUBE` and the virtual column `GROUPING__ID`.

More details on how to use `GROUPING SETS` can be found at:
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
https://issues.apache.org/jira/secure/attachment/12676811/grouping_set.pdf

The general idea of the implementation is:
1. Replace `ROLLUP` and `CUBE` with `GROUPING SETS`.
2. Explode each input row, then feed the exploded rows to `Aggregate`:
  * Each grouping set is represented as a bit mask over the `GroupBy Expression List`; for each bit, `1` means the expression is selected, otherwise `0` (the leftmost expression in the `GroupBy Expression List` is the lowest bit, and the rightmost is the highest bit).
  * Several projections are constructed according to the grouping sets, and within each projection (`Seq[Expression]`), an expression is replaced with `Literal(null)` if it is not selected in the grouping set (based on the bit mask).
  * The output schema of `Explode` is `child.output :+ grouping__id`.
  * The GroupBy expressions of `Aggregate` are `GroupBy Expression List :+ grouping__id`.
  * The `Aggregation expressions` of the `Aggregate` are kept unchanged.

The expression substitutions happen during Logical Plan analysis, so we benefit from the Logical Plan optimizations (e.g. expression constant folding and map-side aggregation). Only an `Explosive` operator is added to the Physical Plan, which explodes the rows according to the pre-set projections.

A known issue, to be addressed in a follow-up PR:
* The `ColumnPruning` optimization is not yet supported for the `Explosive` node.
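
The rewrite described above can be illustrated with a small, self-contained Python sketch. This is not the code in this commit (which is Scala, inside Catalyst); the function names `rollup_masks`, `cube_masks`, `expand` and `aggregate_count` are hypothetical, rows are modelled as plain dicts, and `COUNT(*)` stands in for the aggregation expressions. It only assumes the bit convention stated in the description: the leftmost group-by expression is the lowest bit, and `1` means the expression is selected.

```python
from collections import defaultdict


def rollup_masks(n):
    """Bit masks for ROLLUP over n expressions: every prefix of the list.

    Bit i corresponds to the i-th (leftmost = lowest bit) expression.
    """
    return [(1 << k) - 1 for k in range(n, -1, -1)]


def cube_masks(n):
    """Bit masks for CUBE over n expressions: every subset of the list."""
    return list(range((1 << n) - 1, -1, -1))


def expand(rows, group_cols, masks):
    """Emit one copy of each input row per grouping set.

    Group-by columns not selected by the mask are replaced with None
    (standing in for Literal(null)), and the mask itself is appended
    as the extra column grouping__id.
    """
    for row in rows:
        for mask in masks:
            out = dict(row)
            for i, col in enumerate(group_cols):
                if not (mask >> i) & 1:
                    out[col] = None
            out["grouping__id"] = mask
            yield out


def aggregate_count(rows, group_cols):
    """Plain GROUP BY (group_cols :+ grouping__id) with COUNT(*)."""
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in group_cols) + (row["grouping__id"],)
        counts[key] += 1
    return dict(counts)


if __name__ == "__main__":
    data = [{"a": 1, "b": "x"}, {"a": 1, "b": "y"}, {"a": 2, "b": "x"}]
    cols = ["a", "b"]
    # GROUP BY a, b WITH ROLLUP  ==  GROUPING SETS ((a, b), (a), ())
    result = aggregate_count(expand(data, cols, rollup_masks(len(cols))), cols)
    for key, count in sorted(result.items(), key=lambda kv: -kv[0][-1]):
        print(key, count)
```

Because the aggregate itself stays an ordinary group-by over the original keys plus `grouping__id`, only the row-expansion step is new; this mirrors why the description needs just one additional physical operator.
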
Author: Cheng Hao <hao.cheng@intel.com>

Closes #1567 from chenghao-intel/grouping_sets and squashes the following commits:

fe65fcc [Cheng Hao] Remove the extra space
3547056 [Cheng Hao] Add more doc and Simplify the Expand
a7c869d [Cheng Hao] update code as feedbacks
d23c672 [Cheng Hao] Add GroupingExpression to replace the Seq[Expression]
414b165 [Cheng Hao] revert the unnecessary changes
ec276c6 [Cheng Hao] Support Rollup/Cube/GroupingSets
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala 95 additions, 0 deletions
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala 12 additions, 1 deletion
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala 5 additions, 0 deletions
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala 83 additions, 0 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala 79 additions, 0 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala 2 additions, 0 deletions
- sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala 7 additions, 0 deletions
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala 69 additions, 10 deletions
- sql/hive/src/test/resources/golden/groupby_grouping_id1-0-caf1c5fd299fdbdb655234d01d44caf2 0 additions, 0 deletions
- sql/hive/src/test/resources/golden/groupby_grouping_id1-1-b76bf9f6c92f83c9a5f351f8460d1e3b 0 additions, 0 deletions
- sql/hive/src/test/resources/golden/groupby_grouping_id1-1-c0f14def6a135cc50cba364e810ce28e 0 additions, 0 deletions
- sql/hive/src/test/resources/golden/groupby_grouping_id1-2-3dfbff77a9b56d44312814d3fb0d07fa 18 additions, 0 deletions
- sql/hive/src/test/resources/golden/groupby_grouping_id1-3-d113d984a30ad7b98c50a46158605a51 12 additions, 0 deletions
- sql/hive/src/test/resources/golden/groupby_grouping_id1-4-903a499840102e2cb722dd8b83820391 18 additions, 0 deletions
- sql/hive/src/test/resources/golden/groupby_grouping_id2-0-20539c642c514a590caca2f11395007e 0 additions, 0 deletions
- sql/hive/src/test/resources/golden/groupby_grouping_id2-1-d1f3215e349f056c8de60b87a6a9855e 0 additions, 0 deletions
- sql/hive/src/test/resources/golden/groupby_grouping_id2-1-d6fb6c164cda6a13a71290dbf95fcc6e 0 additions, 0 deletions
- sql/hive/src/test/resources/golden/groupby_grouping_id2-2-a00d1791b7fa7ac5a0505d95c3d12257 1 addition, 0 deletions
- sql/hive/src/test/resources/golden/groupby_grouping_id2-3-b3cb95405b1200603f40aaab24be7586 11 additions, 0 deletions
- sql/hive/src/test/resources/golden/groupby_grouping_id2-4-9bdcf67baa5d7cf70cd2eb1a3ec6de74 3 additions, 0 deletions