Skip to content
Snippets Groups Projects
Commit 9bcb33c5 authored by Shixiong Zhu's avatar Shixiong Zhu Committed by Marcelo Vanzin
Browse files

[SPARK-17316][CORE] Make CoarseGrainedSchedulerBackend.removeExecutor non-blocking

## What changes were proposed in this pull request?

StandaloneSchedulerBackend.executorRemoved is a blocking call right now. It may cause some deadlock since it's called inside StandaloneAppClient.ClientEndpoint.

This PR just changed CoarseGrainedSchedulerBackend.removeExecutor to be non-blocking. It's safe since the only two usages (StandaloneSchedulerBackend and YarnSchedulerEndpoint) don't need the return value).

## How was this patch tested?

Jenkins unit tests.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #14882 from zsxwing/SPARK-17316.
parent 0611b3a2
No related branches found
No related tags found
No related merge requests found
......@@ -406,14 +406,15 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))
}
// Called by subclasses when notified of a lost worker
def removeExecutor(executorId: String, reason: ExecutorLossReason) {
try {
driverEndpoint.askWithRetry[Boolean](RemoveExecutor(executorId, reason))
} catch {
case e: Exception =>
throw new SparkException("Error notifying standalone scheduler's driver endpoint", e)
}
/**
* Called by subclasses when notified of a lost worker. It just fires the message and returns
* at once.
*/
protected def removeExecutor(executorId: String, reason: ExecutorLossReason): Unit = {
// Only log the failure since we don't care about the result.
driverEndpoint.ask(RemoveExecutor(executorId, reason)).onFailure { case t =>
logError(t.getMessage, t)
}(ThreadUtils.sameThread)
}
def sufficientResourcesRegistered(): Boolean = true
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment