Skip to content
Snippets Groups Projects
Commit 61ca7742 authored by Andrew Or's avatar Andrew Or Committed by Kay Ousterhout
Browse files

[SPARK-4020] Do not rely on timeouts to remove failed block managers

If an executor fails without being scheduled to run any tasks, then `DAGScheduler` won't notify `BlockManagerMasterActor` that the associated block manager should be removed. Instead, the associated block manager will be expired only after a few rounds of heartbeat timeouts. In terms of removal treatment, there should really be no distinction between executors that have been scheduled tasks and those that have not.

The fix, then, is to add all known executors to `TaskSchedulerImpl`'s `activeExecutorIds` whether or not it has been scheduled a task. In fact, the existing comment above `activeExecutorIds` is
```
// Which executor IDs we have executors on
val activeExecutorIds = new HashSet[String]
```
not  "Which executors have been scheduled tasks thus far."

Author: Andrew Or <andrewor14@gmail.com>

Closes #2865 from andrewor14/active-executors and squashes the following commits:

ff3172b [Andrew Or] Add all known executors to `activeExecutorIds`
parent c262cd5d
No related branches found
No related tags found
No related merge requests found
......@@ -221,6 +221,7 @@ private[spark] class TaskSchedulerImpl(
var newExecAvail = false
for (o <- offers) {
executorIdToHost(o.executorId) = o.host
activeExecutorIds += o.executorId
if (!executorsByHost.contains(o.host)) {
executorsByHost(o.host) = new HashSet[String]()
executorAdded(o.executorId, o.host)
......@@ -261,7 +262,6 @@ private[spark] class TaskSchedulerImpl(
val tid = task.taskId
taskIdToTaskSetId(tid) = taskSet.taskSet.id
taskIdToExecutorId(tid) = execId
activeExecutorIds += execId
executorsByHost(host) += execId
availableCpus(i) -= CPUS_PER_TASK
assert(availableCpus(i) >= 0)
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment