[SPARK-12978][SQL] Skip unnecessary final group-by when input data already clustered with group-by keys

This ticket targets an optimization that skips the unnecessary group-by operation shown below. When the input is already hash-partitioned on the group-by keys, the two-step partial/final aggregation is not needed, and a single aggregate can compute the result directly.

Without the optimization:

```
== Physical Plan ==
TungstenAggregate(key=[col0#159], functions=[(sum(col1#160),mode=Final,isDistinct=false),(avg(col2#161),mode=Final,isDistinct=false)], output=[col0#159,sum(col1)#177,avg(col2)#178])
+- TungstenAggregate(key=[col0#159], functions=[(sum(col1#160),mode=Partial,isDistinct=false),(avg(col2#161),mode=Partial,isDistinct=false)], output=[col0#159,sum#200,sum#201,count#202L])
   +- TungstenExchange hashpartitioning(col0#159,200), None
      +- InMemoryColumnarTableScan [col0#159,col1#160,col2#161], InMemoryRelation [col0#159,col1#160,col2#161], true, 10000, StorageLevel(true, true, false, true, 1), ConvertToUnsafe, None
```

With the optimization:

```
== Physical Plan ==
TungstenAggregate(key=[col0#159], functions=[(sum(col1#160),mode=Complete,isDistinct=false),(avg(col2#161),mode=Final,isDistinct=false)], output=[col0#159,sum(col1)#177,avg(col2)#178])
+- TungstenExchange hashpartitioning(col0#159,200), None
   +- InMemoryColumnarTableScan [col0#159,col1#160,col2#161], InMemoryRelation [col0#159,col1#160,col2#161], true, 10000, StorageLevel(true, true, false, true, 1), ConvertToUnsafe, None
```

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #10896 from maropu/SkipGroupbySpike.
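The planning decision described above can be illustrated with a small, self-contained Scala sketch. Every name in it (`Partitioning`, `HashPartitioned`, `Plan`, `Scan`, `Exchange`, `Aggregate`, `clusteredBy`, `planAggregate`, `explain`, the `skipPartial` flag) is hypothetical and is not Spark's API; it is a toy model, not the code from this commit. It assumes that hash partitioning on a non-empty subset of the grouping keys already clusters rows by the full key set, since rows with equal grouping keys hash to the same partition.

```scala
// Toy model of the SPARK-12978 planning decision.
// All names are hypothetical; these are NOT Spark's classes.

sealed trait Partitioning
final case class HashPartitioned(keys: Seq[String], numPartitions: Int) extends Partitioning
case object UnknownPartitioning extends Partitioning

sealed trait Plan { def partitioning: Partitioning }

final case class Scan(table: String, partitioning: Partitioning = UnknownPartitioning) extends Plan

// Shuffle that hash-partitions the child's rows on `keys`.
final case class Exchange(keys: Seq[String], numPartitions: Int, child: Plan) extends Plan {
  val partitioning: Partitioning = HashPartitioned(keys, numPartitions)
}

// mode is "Partial", "Final" or "Complete"; the aggregate keeps its child's partitioning.
final case class Aggregate(mode: String, groupKeys: Seq[String], child: Plan) extends Plan {
  def partitioning: Partitioning = child.partitioning
}

// Assumption: hashing on a non-empty subset of the grouping keys clusters the input,
// because rows with equal grouping keys also agree on that subset.
def clusteredBy(p: Partitioning, groupKeys: Seq[String]): Boolean = p match {
  case HashPartitioned(keys, _) => keys.nonEmpty && keys.forall(groupKeys.contains)
  case UnknownPartitioning      => false
}

// skipPartial = true models the optimization; false models the old two-step plan.
def planAggregate(
    groupKeys: Seq[String],
    child: Plan,
    numPartitions: Int = 200,
    skipPartial: Boolean = true): Plan = {
  val clustered = clusteredBy(child.partitioning, groupKeys)
  if (clustered && skipPartial) {
    // Every row of a group is already in one partition: a single aggregate suffices.
    Aggregate("Complete", groupKeys, child)
  } else {
    val partial = Aggregate("Partial", groupKeys, child)
    // Shuffle between the two steps only when the input is not already clustered.
    val merged = if (clustered) partial else Exchange(groupKeys, numPartitions, partial)
    Aggregate("Final", groupKeys, merged)
  }
}

// Renders a plan tree using the same "+- " nesting style as Spark's plan output.
def explain(plan: Plan, depth: Int = 0): String = {
  val prefix = if (depth == 0) "" else "   " * (depth - 1) + "+- "
  val (label, children) = plan match {
    case Scan(t, _)         => (s"Scan $t", Nil)
    case Exchange(k, n, c)  => (s"Exchange hashpartitioning(${k.mkString(",")},$n)", List(c))
    case Aggregate(m, k, c) => (s"Aggregate(mode=$m, key=[${k.mkString(",")}])", List(c))
  }
  (prefix + label :: children.map(explain(_, depth + 1))).mkString("\n")
}

// Usage: input already repartitioned on col0, planned with and without the optimization.
val repartitioned = Exchange(Seq("col0"), 200, Scan("t"))
println(explain(planAggregate(Seq("col0"), repartitioned)))
println(explain(planAggregate(Seq("col0"), repartitioned, skipPartial = false)))
```

With `skipPartial = false` the repartitioned input yields a `Final` aggregate directly over a `Partial` one with no extra shuffle, mirroring the shape of the unoptimized plan; with the flag on, the pair collapses into one `Complete` aggregate. Input that is not clustered still gets the usual partial aggregate, shuffle, and final aggregate in this model.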
Changed files:
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala (5 additions, 12 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala (115 additions, 135 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregateExec.scala (56 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala (1 addition, 21 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortAggregateExec.scala (2 additions, 22 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala (26 additions, 12 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala (8 additions, 7 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala (44 additions, 15 deletions)