[SPARK-11131][CORE] Fix race in worker registration protocol.
Because the registration RPC was not really an RPC, but a bunch of disconnected messages, it was possible for other messages to be sent before the reply to the registration arrived, and that would confuse the Worker. Especially in local-cluster mode, the worker was susceptible to receiving an executor request before it received the message from the master saying that registration had succeeded.

In addition, the change fixes a ClassCastException thrown when registration fails, which also affects the executor registration protocol. Because the `ask` is issued with a specific return type, if the error message (of a different type) was returned instead, the code would just die with an exception. This is fixed by having a common base trait for these reply messages.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #9138 from vanzin/SPARK-11131.
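The base-trait fix described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Spark code: the names `RegisterResponse`, `Registered`, `RegisterFailed`, `askRegister`, and the master URL are hypothetical, and a plain `Future` stands in for the RPC `ask`. The point it shows is that when the reply is typed as a common base trait, both the success and the failure message fit the declared return type, so the caller can pattern match instead of failing on a cast.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical, simplified reply types (assumed names, not Spark's definitions).
// Both outcomes share one base trait, so an ask typed to that trait accepts either.
sealed trait RegisterResponse
final case class Registered(masterUrl: String) extends RegisterResponse
final case class RegisterFailed(reason: String) extends RegisterResponse

object RegistrationSketch {
  // Stand-in for an RPC ask: the reply travels back as the result of the
  // request itself, rather than as a separate, unordered message.
  def askRegister(workerId: String, knownWorkers: Set[String]): Future[RegisterResponse] =
    Future {
      if (knownWorkers.contains(workerId)) RegisterFailed(s"Duplicate worker ID: $workerId")
      else Registered("spark://master:7077") // assumed URL, for illustration only
    }

  // The sealed trait lets the compiler check that every reply type is handled.
  def handle(reply: RegisterResponse): String = reply match {
    case Registered(url)        => s"registered with $url"
    case RegisterFailed(reason) => s"registration failed: $reason"
  }

  // Blocks for the reply here only to keep the sketch short.
  def register(workerId: String, knownWorkers: Set[String]): String =
    handle(Await.result(askRegister(workerId, knownWorkers), 5.seconds))
}
```

Calling `RegistrationSketch.register("w1", Set("w1"))` takes the failure branch through the same typed path as a success, which is the behavior the common base trait is meant to guarantee.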
Showing
- core/src/main/scala/org/apache/spark/deploy/DeployMessage.scala (6 additions, 1 deletion)
- core/src/main/scala/org/apache/spark/deploy/master/Master.scala (25 additions, 25 deletions)
- core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala (47 additions, 26 deletions)
- core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala (2 additions, 2 deletions)
- core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala (4 additions, 0 deletions)
- core/src/test/scala/org/apache/spark/HeartbeatReceiverSuite.scala (2 additions, 2 deletions)