[SPARK-16391][SQL] Support partial aggregation for reduceGroups
## What changes were proposed in this pull request?

This patch introduces a new private ReduceAggregator interface that is a subclass of Aggregator. ReduceAggregator only requires a single associative and commutative reduce function. ReduceAggregator is also used to implement KeyValueGroupedDataset.reduceGroups in order to support partial aggregation.

Note that the pull request was initially done by viirya.

## How was this patch tested?

Covered by the original tests for reduceGroups, as well as a new test suite for ReduceAggregator.

Author: Reynold Xin <rxin@databricks.com>
Author: Liang-Chi Hsieh <simonh@tw.ibm.com>

Closes #14576 from rxin/reduceAggregator.
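The idea behind ReduceAggregator (an aggregator built from nothing but one associative and commutative reduce function, which makes partial aggregation possible) can be illustrated with a minimal standalone sketch. It does not depend on Spark and is not the code from this patch: `SimpleAggregator`, `SimpleReduceAggregator`, and `aggregatePartitioned` are hypothetical names. Using `Option[T]` as the buffer type is likewise an assumption made here so that "no input seen yet" can be represented, because an arbitrary reduce function has no natural zero element.

```scala
// Hypothetical, Spark-free sketch of a reduce-only aggregator.
// Mirrors the zero/reduce/merge/finish shape of an aggregator in general;
// not the actual org.apache.spark.sql.expressions API.
trait SimpleAggregator[IN, BUF, OUT] {
  def zero: BUF
  def reduce(b: BUF, a: IN): BUF
  def merge(b1: BUF, b2: BUF): BUF
  def finish(b: BUF): OUT
}

// Assumption: the buffer is Option[T]; None means "no input row seen yet".
class SimpleReduceAggregator[T](func: (T, T) => T)
    extends SimpleAggregator[T, Option[T], T] {
  def zero: Option[T] = None

  // Fold one input into the buffer; the first input simply becomes the buffer.
  def reduce(b: Option[T], a: T): Option[T] = b match {
    case Some(acc) => Some(func(acc, a))
    case None      => Some(a)
  }

  // Combine two partial buffers; an empty buffer is an identity.
  def merge(b1: Option[T], b2: Option[T]): Option[T] = (b1, b2) match {
    case (Some(x), Some(y)) => Some(func(x, y))
    case (Some(_), None)    => b1
    case (None, _)          => b2
  }

  // A reduction over zero inputs has no defined result.
  def finish(b: Option[T]): T =
    b.getOrElse(throw new IllegalStateException("reduce requires at least one input"))
}

// Partial aggregation: reduce each partition locally into a buffer,
// then merge the per-partition buffers and finish once.
def aggregatePartitioned[IN, BUF, OUT](
    agg: SimpleAggregator[IN, BUF, OUT],
    partitions: Seq[Seq[IN]]): OUT = {
  val partials = partitions.map(_.foldLeft(agg.zero)(agg.reduce))
  agg.finish(partials.foldLeft(agg.zero)(agg.merge))
}

val sumAgg = new SimpleReduceAggregator[Int](_ + _)
val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(), Seq(4, 5))
println(aggregatePartitioned(sumAgg, partitions)) // 15
```

Because the reduce function is associative and commutative, the per-partition buffers can be merged in any order and still yield the same result as a single sequential reduce. This property is what lets the work happen before the shuffle rather than after it.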
- sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala (5 additions, 5 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala (68 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/expressions/ReduceAggregatorSuite.scala (73 additions, 0 deletions)