[SQL] SPARK-1371 Hash Aggregation Improvements
Given:

```scala
case class Data(a: Int, b: Int)
val rdd =
  sparkContext
    .parallelize(1 to 200)
    .flatMap(_ => (1 to 50000).map(i => Data(i % 100, i)))
rdd.registerAsTable("data")
cacheTable("data")
```

Before:

```
SELECT COUNT(*) FROM data:[10000000] 16795.567ms
SELECT a, SUM(b) FROM data GROUP BY a 7536.436ms
SELECT SUM(b) FROM data 10954.1ms
```

After:

```
SELECT COUNT(*) FROM data:[10000000] 1372.175ms
SELECT a, SUM(b) FROM data GROUP BY a 2070.446ms
SELECT SUM(b) FROM data 958.969ms
```

Author: Michael Armbrust <michael@databricks.com>

Closes #295 from marmbrus/hashAgg and squashes the following commits:

ec63575 [Michael Armbrust] Add comment.
d0495a9 [Michael Armbrust] Use scaladoc instead.
b4a6887 [Michael Armbrust] Address review comments.
a2d90ba [Michael Armbrust] Capture child output statically to avoid issues with generators and serialization.
7c13112 [Michael Armbrust] Rewrite Aggregate operator to stream input and use projections. Remove unused local RDD functions implicits.
5096f99 [Michael Armbrust] Make HiveUDAF fields transient since object inspectors are not serializable.
6a4b671 [Michael Armbrust] Add option to avoid binding operators expressions automatically.
92cca08 [Michael Armbrust] Always include serialization debug info when running tests.
1279df2 [Michael Armbrust] Increase default number of partitions.
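For readers unfamiliar with the approach named in commit 7c13112 ("stream input"), the following is a minimal sketch of streaming hash aggregation in plain Scala, not the operator code from this change. It assumes a single grouping key and a single `SUM` aggregate, mirroring `SELECT a, SUM(b) FROM data GROUP BY a`. The names `HashAggSketch` and `sumByKey` are hypothetical, and `Data` is redeclared here only so the block is self-contained.

```scala
import scala.collection.mutable

// Redeclared stand-in for the row type used in the benchmark above.
case class Data(a: Int, b: Int)

// Hypothetical helper object; not part of Spark.
object HashAggSketch {
  // Consumes the input iterator exactly once, keeping one running sum
  // per distinct key instead of first materializing the rows of each group.
  def sumByKey(rows: Iterator[Data]): Map[Int, Long] = {
    val buffers = mutable.HashMap.empty[Int, Long]
    while (rows.hasNext) {
      val row = rows.next()
      // Update the aggregation buffer for this row's group in place.
      buffers(row.a) = buffers.getOrElse(row.a, 0L) + row.b
    }
    buffers.toMap
  }

  def main(args: Array[String]): Unit = {
    // One slice of the benchmark data: 50000 rows spread over 100 keys.
    val rows = (1 to 50000).iterator.map(i => Data(i % 100, i))
    val sums = sumByKey(rows)
    println(s"groups: ${sums.size}")
  }
}
```

In this sketch, memory use grows with the number of distinct keys rather than the number of input rows, which is the usual motivation for streaming the input through a hash table of aggregation buffers.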
Changed files:
- project/SparkBuild.scala (1 addition, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala (6 additions, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Projection.scala (3 additions, 3 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala (8 additions, 8 deletions)
- sql/core/src/main/scala/org/apache/spark/rdd/PartitionLocalRDDFunctions.scala (0 additions, 100 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregates.scala (135 additions, 48 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala (3 additions, 0 deletions)