[SPARK-7718] [SQL] Speed up partitioning by avoiding closure cleaning
According to yhuai, we spent 6-7 seconds cleaning closures in a partitioning job that takes 12 seconds. Since we provide these closures in Spark, we know for sure they are serializable, so we can bypass the cleaning.

Author: Andrew Or <andrew@databricks.com>

Closes #6256 from andrewor14/sql-partition-speed-up and squashes the following commits:

- a82b451 [Andrew Or] Fix style
- 10f7e3e [Andrew Or] Avoid getting call sites and cleaning closures
- 17e2943 [Andrew Or] Merge branch 'master' of github.com:apache/spark into sql-partition-speed-up
- 523f042 [Andrew Or] Skip unnecessary Utils.getCallSites too
- f7fe143 [Andrew Or] Avoid unnecessary closure cleaning
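The idea behind the optimization can be sketched in plain Scala: closure cleaning amounts to an expensive serializability check performed before a closure is shipped to executors, and that check can be skipped for closures the framework constructed itself and therefore already knows are serializable. The `clean` and `runJob` names below are hypothetical illustrations, not Spark's actual `ClosureCleaner` API.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical sketch of the trade-off this PR exploits (not Spark's real API).
object ClosureCleaningSketch {

  // Expensive path: prove the closure is serializable by actually serializing
  // it, loosely analogous to the work Spark's closure cleaner does per closure.
  def clean[T](f: T): T = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    out.writeObject(f) // throws NotSerializableException if f captures bad state
    f
  }

  // Fast path: when the framework built the closure itself, it is known to be
  // serializable, so the check can be bypassed entirely.
  def runJob(body: Int => Int, trusted: Boolean): Int = {
    val prepared = if (trusted) body else clean(body)
    prepared(21)
  }

  def main(args: Array[String]): Unit = {
    val double: Int => Int = _ * 2        // a serializable function literal
    println(runJob(double, trusted = true))   // skips the cleaning step
    println(runJob(double, trusted = false))  // pays for the check first
  }
}
```

Both paths return the same result; the trusted path simply avoids the per-closure serialization pass, which is where the 6-7 seconds were going in the partitioning job described above.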
Showing 4 changed files:

- core/src/main/scala/org/apache/spark/util/Utils.scala: 18 additions, 0 deletions
- sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala: 50 additions, 48 deletions
- sql/core/src/main/scala/org/apache/spark/sql/sources/DataSourceStrategy.scala: 15 additions, 3 deletions
- sql/core/src/main/scala/org/apache/spark/sql/sources/SqlNewHadoopRDD.scala: 0 additions, 4 deletions