Skip to content
Snippets Groups Projects
Commit 698ef762 authored by Takuya UESHIN's avatar Takuya UESHIN Committed by Kay Ousterhout
Browse files

[SPARK-14269][SCHEDULER] Eliminate unnecessary submitStage() call.

## What changes were proposed in this pull request?

Currently a method `submitStage()` for waiting stages is called on every iteration of the event loop in `DAGScheduler` to submit all waiting stages, but most of them are not necessary because they are not related to Stage status.
The case we should try to submit waiting stages is only when their parent stages are successfully completed.

This elimination can improve `DAGScheduler` performance.

## How was this patch tested?

Added some checks and other existing tests, and our projects.

We have a project bottle-necked by `DAGScheduler`, having about 2000 stages.

Before this patch the almost all execution time in `Driver` process was spent to process `submitStage()` of `dag-scheduler-event-loop` thread but after this patch the performance was improved as follows:

|        | total execution time | `dag-scheduler-event-loop` thread time | `submitStage()` |
|--------|---------------------:|---------------------------------------:|----------------:|
| Before |              760 sec |                                710 sec |         667 sec |
| After  |              440 sec |                                 14 sec |          10 sec |

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #12060 from ueshin/issues/SPARK-14269.
parent c875d81a
No related branches found
No related tags found
No related merge requests found
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment