[SPARK-17511] Yarn Dynamic Allocation: Avoid marking released container as Failed
## What changes were proposed in this pull request?

Due to race conditions, the `assert(numExecutorsRunning <= targetNumExecutors)` check can fail, causing an `AssertionError`:

```
java.lang.AssertionError: assertion failed
	at scala.Predef$.assert(Predef.scala:156)
	at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1.org$apache$spark$deploy$yarn$YarnAllocator$$anonfun$$updateInternalState$1(YarnAllocator.scala:489)
	at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1$$anon$1.run(YarnAllocator.scala:519)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
```

This change removes the assertion and instead performs the conditional check before launching a new container.

## How was this patch tested?

This was manually tested using a large ForkAndJoin job with Dynamic Allocation enabled, to validate that the previously failing job succeeds without any such exception.

Author: Kishor Patil <kpatil@yahoo-inc.com>

Closes #15069 from kishorvpatil/SPARK-17511.
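
The idea of checking the condition before launching, rather than asserting it afterwards, can be illustrated with a minimal sketch. This is not the actual `YarnAllocator` code: `SketchAllocator`, `tryLaunch`, and `SketchDemo` are hypothetical names, and only `numExecutorsRunning` and `targetNumExecutors` are taken from the commit message. The sketch only shows why a guard tolerates a concurrently lowered target where an assertion would crash; its check-then-increment step is not atomic, so it does not on its own enforce the limit under concurrent launches.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical, simplified stand-in for an allocator; not Spark's YarnAllocator.
class SketchAllocator(initialTarget: Int) {
  // The target may be lowered by another thread (e.g. dynamic allocation).
  @volatile var targetNumExecutors: Int = initialTarget
  val numExecutorsRunning = new AtomicInteger(0)

  /** Returns true if an executor was launched in the container, false if it was skipped. */
  def tryLaunch(containerId: String): Boolean = {
    // Guard up front instead of asserting the invariant after updating state.
    if (numExecutorsRunning.get() < targetNumExecutors) {
      numExecutorsRunning.incrementAndGet()
      true
    } else {
      // Target already reached: skip this container rather than fail.
      false
    }
  }
}

object SketchDemo {
  def main(args: Array[String]): Unit = {
    val allocator = new SketchAllocator(initialTarget = 2)
    // Three containers arrive, but only two executors are wanted.
    val launched = Seq("c1", "c2", "c3").map(allocator.tryLaunch)
    println(launched)
    println(allocator.numExecutorsRunning.get())
  }
}
```

In this single-threaded demo the third container is skipped instead of triggering a failure, which is the behaviour the description above attributes to the moved check.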
Changed files:
- yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala: 33 additions, 29 deletions
- yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala: 19 additions, 0 deletions