[SPARK-16379][CORE][MESOS] Spark on mesos is broken due to race condition in Logging
## What changes were proposed in this pull request?

The commit https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec introduced a lazy val to simplify code in Logging. Simple enough, though one side effect is that accessing log now means grabbing the instance's lock. This in turn exposed a form of deadlock in the Mesos code. Arguably the underlying problem is how that code is structured, but in any event the safest thing to do seems to be to revert the commit, and that's 90% of the change here; it's just not worth the risk of similar, more subtle issues.

What I didn't revert here was the removal of this odd override of log in the Mesos code. In retrospect it might have been put in place at some stage as a defense against this type of problem. After all, the Logging code still involved a lock at initialization before the change in question. Even after the revert, the override doesn't seem to do anything, given how Logging works now, so I left it removed.

However, I also removed the particular log message that ended up playing a part in this problem anyway, maybe being paranoid, to make sure this type of problem can't happen even with how the current locking works in logging initialization.

## How was this patch tested?

Jenkins tests

Author: Sean Owen <sowen@cloudera.com>

Closes #14069 from srowen/SPARK-16379.
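
For readers unfamiliar with the locking side effect described above, here is a minimal sketch of the two shapes of logger field. It is not Spark's actual Logging code: `SimpleLogger`, `LazyLogging`, `PlainLogging`, and `accessibleWhileLocked` are hypothetical names made up for illustration, and the lazy val behaviour noted in the comments is that of the Scala 2 compiler (other compiler versions may initialize lazy vals differently).

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

object LoggingLockSketch {

  // Hypothetical stand-in for a real logger, so the sketch has no dependencies.
  final class SimpleLogger(val name: String)

  // Variant A: the lazy val form. Under Scala 2, the first access initializes
  // the field inside a block synchronized on `this`.
  class LazyLogging {
    lazy val log: SimpleLogger = new SimpleLogger(getClass.getName)
  }

  // Variant B: a plain field with an explicit null check. No monitor is taken;
  // two racing threads may each create a logger, which is harmless here.
  class PlainLogging {
    @transient private var log_ : SimpleLogger = null

    def log: SimpleLogger = {
      if (log_ == null) {
        log_ = new SimpleLogger(getClass.getName)
      }
      log_
    }
  }

  // Holds `target`'s monitor on the calling thread, runs `access` on a second
  // thread, and reports whether `access` finished while the monitor was held.
  def accessibleWhileLocked(target: AnyRef, timeoutMs: Long = 1000)(access: => Unit): Boolean = {
    val done = new CountDownLatch(1)
    val worker = new Thread(new Runnable {
      def run(): Unit = {
        access
        done.countDown()
      }
    })
    worker.setDaemon(true)
    target.synchronized {
      worker.start()
      done.await(timeoutMs, TimeUnit.MILLISECONDS)
    }
  }

  def main(args: Array[String]): Unit = {
    val plain = new PlainLogging
    val lazyOne = new LazyLogging
    // The plain field never needs the instance monitor, so this succeeds.
    println(s"plain field reachable under lock: ${accessibleWhileLocked(plain)(plain.log)}")
    // The result for the lazy val depends on how the compiler implements it.
    println(s"lazy val reachable under lock: ${accessibleWhileLocked(lazyOne)(lazyOne.log)}")
  }
}
```

If the second thread in turn held a lock that the first thread was waiting for, a blocked lazy val access would become a real deadlock rather than a timeout, which is the kind of interaction the commit message describes.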
Showing 2 changed files
- core/src/main/scala/org/apache/spark/internal/Logging.scala (10 additions, 4 deletions)
- core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala (0 additions, 1 deletion)