Skip to content
Snippets Groups Projects
Commit fac6085c authored by Kay Ousterhout's avatar Kay Ousterhout Committed by Patrick Wendell
Browse files

[SPARK-1397] Notify SparkListeners when stages fail or are cancelled.

[I wanted to post this for folks to comment but it depends on (and thus includes the changes in) a currently outstanding PR, #305.  You can look at just the second commit: https://github.com/kayousterhout/spark-1/commit/93f08baf731b9eaf5c9792a5373560526e2bccac to see just the changes relevant to this PR]

Previously, when stages fail or get cancelled, the SparkListener is only notified
indirectly through the SparkListenerJobEnd, where we sometimes pass in a single
stage that failed.  This worked before job cancellation, because jobs would only fail
due to a single stage failure.  However, with job cancellation, multiple running stages
can fail when a job gets cancelled.  Right now, this is not handled correctly, which
results in stages that get stuck in the “Running Stages” window in the UI even
though they’re dead.

This PR changes the SparkListenerStageCompleted event to a SparkListenerStageEnded
event, and uses this event to tell SparkListeners when stages fail in addition to when
they complete successfully.  This change is NOT publicly backward compatible for two
reasons.  First, it changes the SparkListener interface.  We could alternately add a new event,
SparkListenerStageFailed, and keep the existing SparkListenerStageCompleted.  However,
this is less consistent with the listener events for tasks / jobs ending, and will result in some
code duplication for listeners (because failed and completed stages are handled in similar
ways).  Note that I haven’t finished updating the JSON code to correctly handle the new event
because I’m waiting for feedback on whether this is a good or bad idea (hence the “WIP”).

It is also not backwards compatible because it changes the publicly visible JobWaiter.jobFailed()
method to no longer include a stage that caused the failure.  I think this change should definitely
stay, because with cancellation (as described above), a failure isn’t necessarily caused by a
single stage.

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #309 from kayousterhout/stage_cancellation and squashes the following commits:

5533ecd [Kay Ousterhout] Fixes in response to Mark's review
320c7c7 [Kay Ousterhout] Notify SparkListeners when stages fail or are cancelled.
parent e25b5934
No related branches found
No related tags found
No related merge requests found
Showing
with 151 additions and 78 deletions
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment