[SPARK-20832][CORE] Standalone master should explicitly inform drivers of worker deaths and invalidate external shuffle service outputs

## What changes were proposed in this pull request?

In standalone mode, the master should explicitly inform each active driver of any worker deaths, so that the invalid external shuffle service outputs on the lost host are removed from the shuffle mapStatus. This avoids future `FetchFailure`s.

## How was this patch tested?

Manually tested with the following steps:

1. Start a standalone Spark cluster with one driver node and two worker nodes;
2. Run a job with a ShuffleMapStage, and ensure the outputs are distributed across both workers;
3. Run another job that makes all executors exit while the workers all stay alive;
4. Kill one of the workers;
5. Run `rdd.collect()`. Before this change, we should see `FetchFailure`s and failed stages; after the change, the job should complete without failure.

Before the change: [screenshot]

After the change: [screenshot]

Author: Xingbo Jiang <xingbo.jiang@databricks.com>

Closes #18362 from jiangxb1987/removeWorker.
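
The idea described in this commit message can be illustrated with a toy model. The sketch below is not Spark code: `MapOutput`, `ToyMapOutputRegistry`, `ToyDriver`, and `ToyMaster` are hypothetical names invented for illustration, and the real change is spread across the files listed in this commit. It only shows the general shape, assuming the master notifies every active driver and each driver drops the map outputs recorded for the lost host.

```scala
// Toy model only: these types are hypothetical and are not Spark's classes.
final case class MapOutput(shuffleId: Int, mapId: Int, host: String)

final class ToyMapOutputRegistry {
  private var outputs = Vector.empty[MapOutput]

  // Record where a map task's shuffle output lives.
  def register(output: MapOutput): Unit = outputs :+= output

  // Drop every map output recorded for the lost host and return the dropped entries.
  def removeOutputsOnHost(host: String): Vector[MapOutput] = {
    val (lost, kept) = outputs.partition(_.host == host)
    outputs = kept
    lost
  }

  // Hosts that still hold outputs for the given shuffle.
  def hostsFor(shuffleId: Int): Set[String] =
    outputs.filter(_.shuffleId == shuffleId).map(_.host).toSet
}

final class ToyDriver(val registry: ToyMapOutputRegistry) {
  // Called when the toy master reports a dead worker.
  def workerRemoved(host: String): Vector[MapOutput] =
    registry.removeOutputsOnHost(host)
}

final class ToyMaster(drivers: Seq[ToyDriver]) {
  // Explicitly inform every active driver that the worker on `host` is gone.
  def removeWorker(host: String): Unit =
    drivers.foreach(_.workerRemoved(host))
}
```

In this model, a later stage that asks the registry for output locations no longer sees the dead host, so it would recompute the missing outputs instead of trying to fetch them and failing.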
Changed files:

- core/src/main/scala/org/apache/spark/deploy/DeployMessage.scala: 2 additions, 0 deletions
- core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala: 4 additions, 0 deletions
- core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClientListener.scala: 5 additions, 3 deletions
- core/src/main/scala/org/apache/spark/deploy/master/Master.scala: 10 additions, 5 deletions
- core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: 30 additions, 0 deletions
- core/src/main/scala/org/apache/spark/scheduler/DAGSchedulerEvent.scala: 3 additions, 0 deletions
- core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala: 5 additions, 0 deletions
- core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala: 5 additions, 0 deletions
- core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala: 3 additions, 0 deletions
- core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala: 21 additions, 4 deletions
- core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala: 5 additions, 0 deletions
- core/src/test/scala/org/apache/spark/deploy/client/AppClientSuite.scala: 2 additions, 0 deletions
- core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: 2 additions, 0 deletions
- core/src/test/scala/org/apache/spark/scheduler/ExternalClusterManagerSuite.scala: 1 addition, 0 deletions