[SPARK-20776] Fix perf. problems in JobProgressListener caused by TaskMetrics construction
## What changes were proposed in this pull request?

In

```
./bin/spark-shell --master=local[64]
```

I ran

```
sc.parallelize(1 to 100000, 100000).count()
```

and profiled the time spent in the LiveListenerBus event processing thread. I discovered that the majority of the time was being spent in `TaskMetrics.empty` calls in `JobProgressListener.onTaskStart`. It turns out that we can slightly refactor to remove the need to construct one empty instance per call, greatly improving the performance of this code.

The performance gains here help to avoid an issue where listener events would be dropped because the JobProgressListener couldn't keep up with the throughput.

**Before:** (image not shown)

**After:** (image not shown)

## How was this patch tested?

Benchmarks described above.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #18008 from JoshRosen/nametoaccums-improvements.
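The idea of not constructing one empty instance per `onTaskStart` call can be illustrated with a small, self-contained sketch. This is not the code from this patch: `Sketch`, `Metrics`, `Metrics.Empty`, `TaskData`, and the `constructions` counter are hypothetical stand-ins, and the sketch assumes the metrics object is immutable so that a single empty instance can safely be shared.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
object Sketch {
  // Counts how many Metrics objects have been constructed, so the effect
  // of sharing a single empty instance can be observed.
  var constructions = 0

  // Immutable metrics holder; every construction bumps the counter.
  final class Metrics(val runTime: Long, val bytesRead: Long) {
    constructions += 1
  }

  object Metrics {
    // A single shared empty instance, built once on first access.
    val Empty: Metrics = new Metrics(0L, 0L)
  }

  // Per-task UI record. It starts out pointing at the shared empty
  // instance instead of allocating a fresh empty Metrics per task.
  final class TaskData(val taskId: Long) {
    private var metrics: Metrics = Metrics.Empty
    def update(m: Metrics): Unit = metrics = m
    def current: Metrics = metrics
  }

  // Analogue of a task-start handler: no Metrics allocation happens here.
  def onTaskStart(taskId: Long): TaskData = new TaskData(taskId)
}
```

In this sketch, starting any number of tasks constructs only the one shared `Metrics.Empty`; a real metrics object is allocated only when a task actually reports values through `update`.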
Changed files:
- core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala: 2 additions, 3 deletions
- core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala: 28 additions, 26 deletions
- core/src/test/scala/org/apache/spark/status/api/v1/AllStagesResourceSuite.scala: 1 addition, 1 deletion