Standalone Scheduler fault recovery
Implements a basic form of Standalone Scheduler fault recovery. In particular, this allows a fault to be recovered from manually by restarting the Master process on the same machine. This is the majority of the code necessary for general fault tolerance, which will first elect a leader and then recover the Master state.

To enable fault recovery, the Master persists to disk a small amount of state related to the registration of Workers and Applications. If the Master starts and finds that this state still exists, it enters Recovery mode, during which it does not schedule any new Executors on Workers (though it still accepts the registration of new Clients and Workers).

While in Recovery mode, the Master attempts to reconnect to all Workers and Client applications that were registered at the time of failure. After confirming either the existence or nonexistence of all such nodes (within a certain timeout), the Master exits Recovery mode and resumes normal scheduling.
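The recovery flow described in this message can be pictured as a small state machine. The following is only a minimal sketch under stated assumptions, not the code from this commit: `RecoverySketch`, `SketchMaster`, and all method names are hypothetical, persisted state is reduced to a list of node IDs, and the choice to drop nodes that never answer before the timeout is an assumption about what "confirming nonexistence" means.

```scala
import scala.collection.mutable

// Hypothetical names throughout; this does not mirror the classes in the commit.
object RecoverySketch {
  sealed trait MasterMode
  case object Alive extends MasterMode
  case object Recovering extends MasterMode

  sealed trait NodeState
  case object Unknown extends NodeState   // restored from disk, not yet heard from
  case object Confirmed extends NodeState // registered or answered after the restart

  final class SketchMaster(persistedNodeIds: Seq[String]) {
    // Nodes read back from persisted state start out as Unknown.
    private val nodes = mutable.LinkedHashMap.empty[String, NodeState]
    persistedNodeIds.foreach(id => nodes(id) = Unknown)

    // Leftover persisted state implies a previous Master failed: enter recovery.
    private var currentMode: MasterMode =
      if (persistedNodeIds.isEmpty) Alive else Recovering

    def mode: MasterMode = currentMode

    // Executors are scheduled only outside recovery.
    def canSchedule: Boolean = currentMode == Alive

    // New registrations are accepted in either mode.
    def register(id: String): Unit = {
      nodes(id) = Confirmed
    }

    // A previously registered node answered the reconnect attempt.
    def confirm(id: String): Unit = {
      if (nodes.contains(id)) nodes(id) = Confirmed
      val allAccountedFor = !nodes.values.exists(_ == Unknown)
      if (currentMode == Recovering && allAccountedFor) completeRecovery()
    }

    // Called when the timeout fires, or once every node is accounted for.
    // Assumption: nodes still Unknown at this point are treated as gone.
    def completeRecovery(): Unit = {
      val lost = nodes.collect { case (id, Unknown) => id }.toList
      nodes --= lost
      currentMode = Alive
    }

    def knownNodes: Set[String] = nodes.keySet.toSet
  }
}
```

In this sketch a Master constructed with an empty list starts in `Alive` and schedules immediately, while one constructed with leftover IDs refuses to schedule until every restored node has been confirmed or `completeRecovery()` is invoked by the timeout.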
Changed files
- bin/stop-slaves.sh (1 addition, 1 deletion)
- core/src/main/scala/org/apache/spark/deploy/DeployMessage.scala (12 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/deploy/ExecutorDescription.scala (29 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/deploy/client/Client.scala (11 additions, 6 deletions)
- core/src/main/scala/org/apache/spark/deploy/master/ApplicationInfo.scala (44 additions, 21 deletions)
- core/src/main/scala/org/apache/spark/deploy/master/ApplicationState.scala (2 additions, 2 deletions)
- core/src/main/scala/org/apache/spark/deploy/master/ExecutorInfo.scala (6 additions, 1 deletion)
- core/src/main/scala/org/apache/spark/deploy/master/FileSystemPersistenceEngine.scala (90 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/deploy/master/Master.scala (135 additions, 19 deletions)
- core/src/main/scala/org/apache/spark/deploy/master/MasterState.scala (26 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/deploy/master/PersistenceEngine.scala (50 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/deploy/master/WorkerInfo.scala (22 additions, 6 deletions)
- core/src/main/scala/org/apache/spark/deploy/master/WorkerState.scala (4 additions, 2 deletions)
- core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala (8 additions, 5 deletions)
- core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala (18 additions, 11 deletions)
- core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala (1 addition, 2 deletions)