[SPARK-11718][YARN][CORE] Fix explicitly killed executor dies silently issue
Currently, if dynamic allocation is enabled, the driver never receives a response for an explicitly killed executor, so the executor metadata on the driver side becomes wrong. This makes dynamic allocation on YARN stop working. The problem is that `disableExecutor` returns false for executors that are pending to be killed when `onDisconnect` is detected, so nothing further is done for them.

One solution is to bypass these explicitly killed executors and call `super.onDisconnect` to remove the executor. This is simple. Another solution is to keep querying the loss reason for these explicitly killed executors. Because an executor may be killed and its completion reported in the same AM-RM communication, the current approach of adding a pending loss-reason request does not work (the container completion has already been processed by the time the request arrives); instead, the loss reason should be stored for a later query. This PR chooses solution 2.

Please help to review. vanzin, I think this part was changed by you previously; would you please help to review? Thanks a lot.

Author: jerryshao <sshao@hortonworks.com>

Closes #9684 from jerryshao/SPARK-11718.
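A minimal Scala sketch of solution 2 as described above, assuming hypothetical names throughout: `shouldQueryLossReason`, `LossReasonTracker`, `LossReason` and their methods are illustrative and are not Spark's actual API. On the driver side, an explicitly killed executor still triggers a loss-reason query instead of being dropped. On the AM side, the exit reason of an explicitly released container that completes before the driver's query arrives is stored and used to answer that query later.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the exit reason the AM learns from a completed container.
final case class LossReason(message: String)

// Driver side (hypothetical, simplified). `alive` holds executors that are
// registered and not already pending removal; `pendingToRemove` holds executors
// the driver explicitly asked to kill. Returning true means "disable the executor
// and ask the AM for its loss reason"; false means nothing further is done.
def shouldQueryLossReason(
    executorId: String,
    alive: Set[String],
    pendingToRemove: Set[String]): Boolean =
  if (alive.contains(executorId)) true
  else pendingToRemove.contains(executorId) // explicitly killed: still query the reason

// AM side (hypothetical, simplified): keeps the loss reason of an explicitly
// released container so a query arriving after its completion can be answered.
class LossReasonTracker {
  private val running = mutable.Set.empty[String]
  private val released = mutable.Set.empty[String]
  private val pendingRequests =
    mutable.Map.empty[String, mutable.ArrayBuffer[LossReason => Unit]]
  private val releasedLossReasons = mutable.Map.empty[String, LossReason]

  def executorStarted(executorId: String): Unit = running += executorId

  // The driver explicitly asked to kill this executor.
  def killExecutor(executorId: String): Unit = released += executorId

  // The RM reported that the executor's container has completed.
  def containerCompleted(executorId: String, reason: LossReason): Unit = {
    running -= executorId
    val wasReleased = released.remove(executorId)
    pendingRequests.remove(executorId) match {
      case Some(replies) => replies.foreach(reply => reply(reason)) // a query was already waiting
      case None if wasReleased => releasedLossReasons(executorId) = reason // keep for a later query
      case None => () // unexpected loss: not modeled in this sketch
    }
  }

  // The driver asks why an executor was lost.
  def requestLossReason(executorId: String)(reply: LossReason => Unit): Unit =
    if (running.contains(executorId)) {
      // Container still running: wait for its completion to be reported.
      pendingRequests.getOrElseUpdate(executorId, mutable.ArrayBuffer.empty) += reply
    } else {
      // Container already completed: answer from the stored reason, if any.
      releasedLossReasons.remove(executorId).foreach(reply)
    }
}
```

In the scenario from the description, `killExecutor` and `containerCompleted` both happen before the driver's `requestLossReason` call, which is then answered from `releasedLossReasons` rather than waiting for a completion event that has already been processed. Storing reasons only for explicitly released containers keeps the map from accumulating entries that nobody will query.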
- core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala: 1 addition, 0 deletions
- core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala: 4 additions, 2 deletions
- yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala: 24 additions, 6 deletions