[SPARK-9240] [SQL] Hybrid aggregate operator using unsafe row
This PR adds a base aggregation iterator, `AggregationIterator`, which is used to create `SortBasedAggregationIterator` (for sort-based aggregation) and `UnsafeHybridAggregationIterator`. The hybrid iterator first tries hash-based aggregation and falls back to sort-based aggregation, using an external sorter, if we cannot allocate memory for the map. With these two iterators, the existing iterators are no longer needed, so I am removing them. We can also use a single physical `Aggregate` operator, which internally determines which iterator to use.

https://issues.apache.org/jira/browse/SPARK-9240

Author: Yin Huai <yhuai@databricks.com>

Closes #7813 from yhuai/AggregateOperator and squashes the following commits:

e317e2b [Yin Huai] Remove unnecessary change.
74d93c5 [Yin Huai] Merge remote-tracking branch 'upstream/master' into AggregateOperator
ba6afbc [Yin Huai] Add a little bit more comments.
c9cf3b6 [Yin Huai] update
0f1b06f [Yin Huai] Remove unnecessary code.
21fd15f [Yin Huai] Remove unnecessary change.
964f88b [Yin Huai] Implement fallback strategy.
b1ea5cf [Yin Huai] wip
7fcbd87 [Yin Huai] Add a flag to control what iterator to use.
533d5b2 [Yin Huai] Prepare for fallback!
33b7022 [Yin Huai] wip
bd9282b [Yin Huai] UDAFs now supports UnsafeRow.
f52ee53 [Yin Huai] wip
3171f44 [Yin Huai] wip
d2c45a0 [Yin Huai] wip
f60cc83 [Yin Huai] Also check input schema.
af32210 [Yin Huai] Check iter.hasNext before we create an iterator because the constructor of the iterato will read at least one row from a non-empty input iter.
299008c [Yin Huai] First round cleanup.
3915bac [Yin Huai] Create a base iterator class for aggregation iterators and add the initial version of the hybrid iterator.
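The hash-first, sort-on-fallback strategy described in the commit message can be illustrated with a small, self-contained sketch. This is not the code from this PR: the object `HybridAggregationSketch`, the methods `sortBasedSum` and `hybridSum`, and the `maxGroups` parameter are all hypothetical names. It assumes plain `(String, Long)` pairs instead of `UnsafeRow`, a fixed group limit standing in for a failed memory allocation, and an in-memory sort standing in for the external sorter.

```scala
import scala.collection.mutable

// Hypothetical sketch of hybrid aggregation; not Spark's implementation.
object HybridAggregationSketch {

  /** Sort-based aggregation: sort rows by key, then sum each run of equal keys. */
  def sortBasedSum(rows: Iterator[(String, Long)]): Iterator[(String, Long)] = {
    // Assumption: an in-memory sort stands in for an external (spilling) sorter.
    val sorted = rows.toArray.sortBy(_._1).iterator.buffered
    new Iterator[(String, Long)] {
      def hasNext: Boolean = sorted.hasNext
      def next(): (String, Long) = {
        val (key, first) = sorted.next()
        var sum = first
        // Consume every following row that belongs to the same group.
        while (sorted.hasNext && sorted.head._1 == key) {
          sum += sorted.next()._2
        }
        (key, sum)
      }
    }
  }

  /**
   * Hybrid aggregation: aggregate into a hash map until it holds `maxGroups`
   * groups, then fall back to sort-based aggregation.
   * Assumption: `maxGroups` simulates "cannot allocate memory for the map".
   */
  def hybridSum(rows: Iterator[(String, Long)], maxGroups: Int): Iterator[(String, Long)] = {
    val buffers = mutable.LinkedHashMap.empty[String, Long]
    while (rows.hasNext) {
      val (key, value) = rows.next()
      if (buffers.contains(key)) {
        buffers(key) += value            // existing group: update in place
      } else if (buffers.size < maxGroups) {
        buffers(key) = value             // new group still fits in the map
      } else {
        // Fallback: treat the partial sums as ordinary rows and re-aggregate
        // them together with the current row and the rest of the input.
        val pending = buffers.iterator ++ Iterator.single((key, value)) ++ rows
        return sortBasedSum(pending)
      }
    }
    buffers.iterator                     // everything fit in the hash map
  }
}
```

Calling `HybridAggregationSketch.hybridSum(rows, maxGroups)` with a large limit stays on the hash path, while a small limit forces the sort-based fallback; both paths should produce the same per-key sums. Feeding the partial sums back in as rows works here only because summation is associative; a general aggregate would need a separate merge step for partial buffers.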
Showing 13 changed files:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala: 12 additions, 7 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/Aggregate.scala: 182 additions, 0 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala: 490 additions, 0 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortBasedAggregationIterator.scala: 236 additions, 0 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/UnsafeHybridAggregationIterator.scala: 398 additions, 0 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/aggregateOperators.scala: 0 additions, 175 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/sortBasedIterators.scala: 0 additions, 664 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala: 233 additions, 36 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala: 58 additions, 41 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala: 0 additions, 1 deletion
- sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala: 5 additions, 5 deletions
- sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlSerializer2Suite.scala: 8 additions, 1 deletion
- sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala: 75 additions, 43 deletions