    f728e0fe
    [SPARK-2663] [SQL] Support the Grouping Set
    Cheng Hao authored
    Add support for `GROUPING SETS`, `ROLLUP`, `CUBE` and the virtual column `GROUPING__ID`.
    
    More details on how to use `GROUPING SETS` can be found at: https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
    https://issues.apache.org/jira/secure/attachment/12676811/grouping_set.pdf
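Following the equivalences documented on the Hive wiki page linked above, `ROLLUP` over n columns is the same as the n + 1 grouping sets formed by the prefixes of the column list, and `CUBE` is the same as all 2^n subsets of it. The Scala sketch below models grouping sets as plain lists of column names; `GroupingSetExpansion`, `rollup` and `cube` are hypothetical names used only for illustration, not part of Spark.

```scala
// Hypothetical helper: grouping sets modeled as lists of column names.
object GroupingSetExpansion {
  // ROLLUP(a, b, c) == GROUPING SETS ((a, b, c), (a, b), (a), ())
  def rollup(cols: Seq[String]): Seq[Seq[String]] =
    (cols.length to 0 by -1).map(n => cols.take(n))

  // CUBE(a, b, c) == GROUPING SETS over every subset of (a, b, c)
  def cube(cols: Seq[String]): Seq[Seq[String]] = cols match {
    case Seq() => Seq(Seq.empty[String])
    case head +: tail =>
      val rest = cube(tail)
      // subsets that keep `head`, followed by subsets that drop it
      rest.map(head +: _) ++ rest
  }
}
```

For example, `rollup(Seq("a", "b", "c"))` yields the four sets `(a, b, c)`, `(a, b)`, `(a)` and `()`, matching `GROUP BY a, b, c WITH ROLLUP`.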
    
    The general idea of the implementation is:
    1. Replace `ROLLUP` and `CUBE` with the equivalent `GROUPING SETS`.
    2. Explode each input row, then feed the resulting rows to `Aggregate`:
      * Each grouping set is represented as a bit mask over the `GroupBy Expression List`: for each bit, `1` means the expression is selected, otherwise `0` (the leftmost expression in the `GroupBy Expression List` is the lowest bit and the rightmost is the highest bit).
      * One projection (`Seq[Expression]`) is constructed per grouping set; within each projection, every expression that is not selected in the grouping set (according to its bit mask) is replaced with `Literal(null)`.
      * The output schema of `Explode` is `child.output :+ grouping__id`.
      * The GroupBy expressions of `Aggregate` are `GroupBy Expression List :+ grouping__id`.
      * The aggregation expressions of `Aggregate` are kept unchanged.
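Step 2 can be sketched as follows, assuming a toy expression model (`Expr`, `Col`, `Lit`) in place of Catalyst's `Expression` and `Literal`; all names here are hypothetical. Under the bit layout described above, bit i of a mask selects the i-th GroupBy expression, so `ROLLUP` over n expressions corresponds to masks 0, 1, 3, ..., 2^n - 1 and `CUBE` to every mask from 0 to 2^n - 1. The sketch also assumes the mask itself is used as the `grouping__id` value.

```scala
// Hypothetical toy model standing in for Catalyst expressions.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Lit(value: Any) extends Expr // Lit(null) plays the role of Literal(null)

object GroupingProjections {
  // Bit i corresponds to groupByExprs(i): the leftmost expression is the lowest bit.
  def isSelected(mask: Int, i: Int): Boolean = ((mask >> i) & 1) == 1

  // ROLLUP selects growing prefixes: (), (e0), (e0, e1), ... => masks 0, 1, 3, 7, ...
  def rollupMasks(n: Int): Seq[Int] = (0 to n).map(i => (1 << i) - 1)

  // CUBE selects every subset => masks 0 until 2^n
  def cubeMasks(n: Int): Seq[Int] = 0 until (1 << n)

  // One projection per mask: child.output with unselected GroupBy columns nulled out,
  // followed by the mask as the grouping__id value (an assumption of this sketch).
  def projections(childOutput: Seq[Col], groupByExprs: Seq[Col], masks: Seq[Int]): Seq[Seq[Expr]] =
    masks.map { mask =>
      val nulled = groupByExprs.indices
        .filterNot(i => isSelected(mask, i))
        .map(i => groupByExprs(i))
        .toSet
      val projected: Seq[Expr] = childOutput.map(c => if (nulled.contains(c)) Lit(null) else c)
      projected :+ Lit(mask)
    }
}
```

For instance, with `child.output = (a, b, c)` and the GroupBy list `(a, b)`, the `ROLLUP` masks 0, 1 and 3 yield the projections `(null, null, c, 0)`, `(a, null, c, 1)` and `(a, b, c, 3)`.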
    
    The expression substitutions happen during logical plan analysis, so the feature benefits from the existing logical plan optimizations (e.g. constant folding and map-side aggregation). Only one new physical operator, `Explosive`, is added; it explodes the rows according to the pre-built projections.
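The runtime side can be sketched in the same spirit: the exploding operator emits one output row per input row and projection, and an ordinary aggregation then groups by the (possibly nulled) GroupBy columns plus `grouping__id`. In this hypothetical Scala model a row is a `Seq[Any]`, a projection is a `Row => Row` function, the bitmask is assumed to double as the `grouping__id` value, and `sumBy` stands in for `Aggregate` with a single `SUM`; none of these names come from Spark.

```scala
// Hypothetical row-level model of the explode-then-aggregate plan (not Spark's API).
object ExplodeSketch {
  type Row = Seq[Any]
  type Projection = Row => Row

  // Projection for one grouping-set mask: GroupBy column groupByIndices(bit)
  // is replaced by null when that bit is 0, and the mask is appended as grouping__id.
  def projectionFor(mask: Int, groupByIndices: Seq[Int]): Projection = {
    val nulled = groupByIndices.zipWithIndex.collect {
      case (col, bit) if ((mask >> bit) & 1) == 0 => col
    }.toSet
    row => row.indices.map(i => if (nulled.contains(i)) null else row(i)) :+ mask
  }

  // Emits one output row per (input row, projection) pair, lazily.
  def explode(rows: Iterator[Row], projections: Seq[Projection]): Iterator[Row] =
    rows.flatMap(row => projections.iterator.map(p => p(row)))

  // Stand-in for Aggregate with a single SUM: groups by keyIndices
  // (the GroupBy columns plus the grouping__id column) and sums an Int column.
  def sumBy(rows: Iterator[Row], keyIndices: Seq[Int], valueIndex: Int): Map[Row, Int] =
    rows.toList
      .groupBy(r => keyIndices.map(i => r(i)))
      .map { case (key, group) => key -> group.map(r => r(valueIndex).asInstanceOf[Int]).sum }
}
```

Keeping `grouping__id` in the grouping key matters because a GroupBy value that is genuinely null and one that a projection nulled out would otherwise fall into the same group.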
    
    One known issue will be addressed in a follow-up PR:
    * The `ColumnPruning` optimization is not yet supported for the `Explosive` node.
    
    Author: Cheng Hao <hao.cheng@intel.com>
    
    Closes #1567 from chenghao-intel/grouping_sets and squashes the following commits:
    
    fe65fcc [Cheng Hao] Remove the extra space
    3547056 [Cheng Hao] Add more doc and Simplify the Expand
    a7c869d [Cheng Hao] update code as feedbacks
    d23c672 [Cheng Hao] Add GroupingExpression to replace the Seq[Expression]
    414b165 [Cheng Hao] revert the unnecessary changes
    ec276c6 [Cheng Hao] Support Rollup/Cube/GroupingSets