[SPARK-21456][MESOS] Make the driver failover_timeout configurable
## What changes were proposed in this pull request?

Current behavior: in Mesos cluster mode, the driver failover_timeout is set to zero. If the driver temporarily loses connectivity with the Mesos master, the framework will be torn down and all executors killed.

Proposed change: make the failover_timeout configurable via a new option, spark.mesos.driver.failoverTimeout. The default value is still zero.

Note: with a non-zero failover_timeout, an explicit teardown is needed in some cases. This is captured in https://issues.apache.org/jira/browse/SPARK-21458

## How was this patch tested?

Added a unit test to make sure the config option is set while creating the scheduler driver.

Ran an integration test with mesosphere/spark showing that with a non-zero failover_timeout the Spark job finishes after a driver is disconnected from the master.

Author: Susan X. Huynh <xhuynh@mesosphere.com>

Closes #18674 from susanxhuynh/sh-mesos-failover-timeout.
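For illustration, a minimal sketch of how an application might set the new option through `SparkConf`. The object name, app name, and the 120-second value are hypothetical; the sketch also assumes the value is forwarded unchanged to the Mesos framework's failover_timeout, which Mesos interprets in seconds.

```scala
import org.apache.spark.SparkConf

// Hypothetical example object; not part of this patch.
object FailoverTimeoutExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("failover-timeout-example") // illustrative name
      // Option introduced by this change. The default remains zero, meaning
      // the framework is torn down as soon as the driver disconnects.
      // 120.0 is an arbitrary illustrative value (assumed to be seconds).
      .set("spark.mesos.driver.failoverTimeout", "120.0")

    // Print the configured value to confirm it was recorded.
    println(conf.get("spark.mesos.driver.failoverTimeout"))
  }
}
```

The same option could presumably be passed on the command line with `--conf spark.mesos.driver.failoverTimeout=120.0`. As the note above says, a non-zero timeout may require an explicit framework teardown in some cases (SPARK-21458).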
Showing 4 changed files:
- docs/running-on-mesos.md (11 additions, 0 deletions)
- resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/config.scala (8 additions, 1 deletion)
- resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala (2 additions, 1 deletion)
- resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala (36 additions, 0 deletions)