[SPARK-20079][YARN] Fix client AM not allocating executors after restart.
The main goal of this change is to avoid the situation described in the bug, where an AM restart in the middle of a job may cause no new executors to be allocated because of faulty logic in the reset path.

The change does two things:

- fixes the executor alloc manager's reset() so that it does not stop allocation after a reset() in the middle of a job
- re-orders the initialization of the YarnAllocator class so that it fetches the current executor ID before triggering the reset() above.

This ensures both that the new allocator gets new requests for executors, and that it starts from the correct executor id.

Tested with unit tests and by manually causing AM restarts while running jobs using spark-shell in YARN mode.

Closes #17882

Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: Guoqiang Li <witgo@qq.com>

Closes #18663 from vanzin/SPARK-20079.
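
The two points in the message can be illustrated with a toy model. This is a minimal sketch and not the code in this commit: `ToyAllocationManager`, `ToyAllocator`, the `initializing` flag, and every method name below are hypothetical stand-ins, and the exact nature of the "faulty logic" is an assumption made for illustration. It shows one way a reset in the middle of a job could stop allocation, and why an allocator should learn the last executor ID before it starts handing out new ones.

```scala
// Toy model only; these are NOT Spark's real classes or method names.
final class ToyAllocationManager(initialTarget: Int) {
  // Hypothetical flag: stays true until the first stage is submitted.
  private var initializing = true
  private var target = initialTarget

  def onStageSubmitted(): Unit = initializing = false

  /** Raises the target to match demand, unless still initializing. */
  def sync(needed: Int): Int = {
    if (!initializing) target = math.max(target, needed)
    target
  }

  // Assumed faulty behaviour: re-enters the initializing state, so a job
  // that is already running (no new stage submission) never ramps up again.
  def faultyReset(): Unit = { initializing = true; target = initialTarget }

  // Assumed fixed behaviour: restores the target but keeps allocation live.
  def fixedReset(): Unit = { target = initialTarget }
}

// Toy allocator that continues numbering from the last known executor ID,
// which it must be given before any reset triggers new requests.
final class ToyAllocator(lastAllocatedId: Int) {
  private var nextId = lastAllocatedId
  def allocate(): Int = { nextId += 1; nextId }
}

object ResetSketch {
  /** Executor target seen after a simulated AM restart in mid-job. */
  def targetAfterRestart(useFixedReset: Boolean): Int = {
    val manager = new ToyAllocationManager(initialTarget = 0)
    manager.onStageSubmitted()   // job starts
    manager.sync(needed = 4)     // ramps up to 4 executors
    if (useFixedReset) manager.fixedReset() else manager.faultyReset()
    manager.sync(needed = 4)     // the same job still needs 4 executors
  }
}
```

In this sketch, `ResetSketch.targetAfterRestart(useFixedReset = false)` leaves the target stuck at the initial value, while the fixed variant climbs back to the demand. Seeding `ToyAllocator` with the last allocated ID keeps new executor IDs from colliding with those issued before the restart.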
Showing 3 changed files:
- core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala (18 additions, 8 deletions)
- resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala (22 additions, 36 deletions)
- resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala (1 addition, 10 deletions)