-
- Downloads
[SPARK-14269][SCHEDULER] Eliminate unnecessary submitStage() call.
## What changes were proposed in this pull request? Currently a method `submitStage()` for waiting stages is called on every iteration of the event loop in `DAGScheduler` to submit all waiting stages, but most of them are not necessary because they are not related to Stage status. The case we should try to submit waiting stages is only when their parent stages are successfully completed. This elimination can improve `DAGScheduler` performance. ## How was this patch tested? Added some checks and other existing tests, and our projects. We have a project bottle-necked by `DAGScheduler`, having about 2000 stages. Before this patch the almost all execution time in `Driver` process was spent to process `submitStage()` of `dag-scheduler-event-loop` thread but after this patch the performance was improved as follows: | | total execution time | `dag-scheduler-event-loop` thread time | `submitStage()` | |--------|---------------------:|---------------------------------------:|----------------:| | Before | 760 sec | 710 sec | 667 sec | | After | 440 sec | 14 sec | 10 sec | Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #12060 from ueshin/issues/SPARK-14269.
Showing
- core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 10 additions, 25 deletions.../main/scala/org/apache/spark/scheduler/DAGScheduler.scala
- core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala 7 additions, 0 deletions.../scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
Loading
Please register or sign in to comment