-
- Downloads
[SPARK-12837][CORE] Do not send the name of internal accumulator to executor side
## What changes were proposed in this pull request? When sending accumulator updates back to driver, the network overhead is pretty big as there are a lot of accumulators, e.g. `TaskMetrics` will send about 20 accumulators everytime, there may be a lot of `SQLMetric` if the query plan is complicated. Therefore, it's critical to reduce the size of serialized accumulator. A simple way is to not send the name of internal accumulators to executor side, as it's unnecessary. When executor sends accumulator updates back to driver, we can look up the accumulator name in `AccumulatorContext` easily. Note that, we still need to send names of normal accumulators, as the user code run at executor side may rely on accumulator names. In the future, we should reimplement `TaskMetrics` to not rely on accumulators and use custom serialization. Tried on the example in https://issues.apache.org/jira/browse/SPARK-12837, the size of serialized accumulator has been cut down by about 40%. ## How was this patch tested? existing tests. Author: Wenchen Fan <wenchen@databricks.com> Closes #17596 from cloud-fan/oom.
Showing
- core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala 13 additions, 16 deletions...rc/main/scala/org/apache/spark/executor/TaskMetrics.scala
- core/src/main/scala/org/apache/spark/scheduler/Task.scala 5 additions, 8 deletionscore/src/main/scala/org/apache/spark/scheduler/Task.scala
- core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala 15 additions, 13 deletions.../src/main/scala/org/apache/spark/util/AccumulatorV2.scala
- core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala 1 addition, 1 deletion...t/scala/org/apache/spark/scheduler/TaskContextSuite.scala
- core/src/test/scala/org/apache/spark/ui/jobs/JobProgressListenerSuite.scala 1 addition, 1 deletion...a/org/apache/spark/ui/jobs/JobProgressListenerSuite.scala
- core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala 1 addition, 1 deletion.../test/scala/org/apache/spark/util/JsonProtocolSuite.scala
- sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java 6 additions, 6 deletions.../datasources/parquet/SpecificParquetRecordReaderBase.java
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala 34 additions, 8 deletions...ql/execution/datasources/parquet/ParquetFilterSuite.scala
Loading
Please register or sign in to comment