[SPARK-8167] Make tasks that fail from YARN preemption not fail job
In YARN mode, when the driver detects that an executor has disconnected, it asks the ApplicationMaster why the executor died. If the ApplicationMaster knows that the executor died because of preemption, the tasks associated with that executor are not marked as failed. The executor is still removed from the driver's list of available executors, however.

There are a few open questions:

1. Should standalone mode have a similar "get executor loss reason" mechanism as well? I localized this change as much as possible to affect only YARN, but there could be a valid case for differentiating executor losses in standalone mode too.
2. I make a pretty strong assumption in YarnAllocator that getExecutorLossReason(executorId) will only be called once per executor id; I do this so that I can remove the metadata from the in-memory map to avoid object accumulation. It's not clear whether I'm being overly zealous to save space, however.

cc vanzin specifically for review because it collided with some earlier YARN scheduling work.
cc JoshRosen because it's similar to the output commit coordination we did in the past.
cc andrewor14 for our discussion on how to get executor exit codes and loss reasons.

Author: mcheah <mcheah@palantir.com>

Closes #8007 from mccheah/feature/preemption-handling.
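The flow described in the commit message can be illustrated with a minimal, self-contained sketch. It is not the code from this patch: `LossReason`, `Preempted`, `Crashed`, `Unknown`, `AllocatorSketch`, and `DriverSketch` are hypothetical names invented for illustration, and the signature given to `getExecutorLossReason` is an assumption. The sketch only shows the two ideas under discussion: preemption does not count against task failures, and the loss reason is removed from the in-memory map on its first (and assumed only) lookup.

```scala
import scala.collection.mutable

// Hypothetical types for illustration; not the classes touched by this patch.
sealed trait LossReason
case object Preempted extends LossReason
final case class Crashed(exitCode: Int) extends LossReason
case object Unknown extends LossReason

// Stand-in for the allocator side: remembers why each executor went away.
class AllocatorSketch {
  private val lossReasons = mutable.Map.empty[String, LossReason]

  def recordLoss(executorId: String, reason: LossReason): Unit =
    lossReasons(executorId) = reason

  // Assumes one call per executor id: the entry is removed on lookup so the
  // map does not accumulate objects. A second call returns Unknown.
  def getExecutorLossReason(executorId: String): LossReason =
    lossReasons.remove(executorId).getOrElse(Unknown)
}

// Stand-in for the driver side. runningTasks maps executor id -> task count.
class DriverSketch(allocator: AllocatorSketch, runningTasks: Map[String, Int]) {
  private val executors = mutable.Set.empty[String]
  executors ++= runningTasks.keys

  var failedTaskCount: Int = 0

  def onExecutorDisconnected(executorId: String): Unit = {
    // The executor is removed from the available set regardless of the reason.
    if (executors.remove(executorId)) {
      allocator.getExecutorLossReason(executorId) match {
        case Preempted =>
          // Preemption: the executor's tasks are not counted as failures.
          ()
        case _ =>
          failedTaskCount += runningTasks.getOrElse(executorId, 0)
      }
    }
  }

  def availableExecutors: Set[String] = executors.toSet
}
```

The remove-on-read lookup is the trade-off raised in the second open question: it keeps the map small, but any caller that asks twice for the same executor id gets no useful answer the second time.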
Changed files:
- core/src/main/scala/org/apache/spark/TaskEndReason.scala (16 additions, 2 deletions)
- core/src/main/scala/org/apache/spark/scheduler/ExecutorLossReason.scala (9 additions, 5 deletions)
- core/src/main/scala/org/apache/spark/scheduler/Pool.scala (2 additions, 2 deletions)
- core/src/main/scala/org/apache/spark/scheduler/Schedulable.scala (1 addition, 1 deletion)
- core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala (5 additions, 4 deletions)
- core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala (14 additions, 7 deletions)
- core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala (7 additions, 1 deletion)
- core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala (15 additions, 9 deletions)
- core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala (3 additions, 3 deletions)
- core/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala (71 additions, 6 deletions)
- core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala (2 additions, 2 deletions)
- core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala (1 addition, 1 deletion)
- core/src/main/scala/org/apache/spark/util/JsonProtocol.scala (6 additions, 3 deletions)
- core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala (28 additions, 5 deletions)
- core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala (6 additions, 4 deletions)
- yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala (7 additions, 0 deletions)
- yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala (68 additions, 24 deletions)