    [SPARK-21456][MESOS] Make the driver failover_timeout configurable · c42ef953
    Susan X. Huynh authored
    ## What changes were proposed in this pull request?
    
    Current behavior: in Mesos cluster mode, the driver failover_timeout is set to zero. If the driver temporarily loses connectivity with the Mesos master, the framework will be torn down and all executors killed.
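The behavior described above can be pictured with a deliberately simplified model. This is a sketch, not Mesos or Spark code: the function name `framework_survives` and its parameters are hypothetical, and it assumes the master tears the framework down once a disconnect has lasted at least as long as the failover timeout (both in seconds).

```python
def framework_survives(disconnect_seconds: float,
                       failover_timeout_seconds: float = 0.0) -> bool:
    """Simplified model of the master's decision after a driver disconnect.

    Assumption: the framework is kept only if the driver reconnects
    before the failover timeout elapses. With the default of zero,
    any disconnect leads to teardown.
    """
    if disconnect_seconds < 0 or failover_timeout_seconds < 0:
        raise ValueError("durations must be non-negative")
    return disconnect_seconds < failover_timeout_seconds


if __name__ == "__main__":
    # Default timeout of zero: even a short disconnect loses the framework.
    print(framework_survives(5))
    # Non-zero timeout: a short disconnect is tolerated.
    print(framework_survives(5, failover_timeout_seconds=60))
```

Under this model, a zero timeout leaves no room for the driver to reconnect, which is the situation this change addresses.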
    
    Proposed change: make the failover_timeout configurable via a new option, spark.mesos.driver.failoverTimeout. The default value is still zero.
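As an illustration of how the new option might be passed at submission time, the sketch below assembles a `spark-submit` command line. The helper `build_submit_args`, the dispatcher URL, and the application path are hypothetical placeholders; only the option name `spark.mesos.driver.failoverTimeout` comes from this change, and the value is assumed to be in seconds.

```python
from typing import List, Optional

# Option name introduced by this change.
FAILOVER_TIMEOUT_KEY = "spark.mesos.driver.failoverTimeout"


def build_submit_args(master_url: str,
                      app_resource: str,
                      failover_timeout_seconds: Optional[int] = None) -> List[str]:
    """Build a spark-submit argument list for cluster deploy mode.

    If no timeout is given, the option is omitted and the default
    (zero) applies.
    """
    args = ["spark-submit", "--master", master_url, "--deploy-mode", "cluster"]
    if failover_timeout_seconds is not None:
        if failover_timeout_seconds < 0:
            raise ValueError("failover timeout must be non-negative")
        args += ["--conf", f"{FAILOVER_TIMEOUT_KEY}={failover_timeout_seconds}"]
    args.append(app_resource)
    return args


if __name__ == "__main__":
    # Hypothetical dispatcher URL and application jar.
    print(" ".join(build_submit_args("mesos://dispatcher.example:7077",
                                     "app.jar",
                                     failover_timeout_seconds=120)))
```

Leaving the option out keeps the existing behavior, since the default is still zero.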
    
    Note: with a non-zero failover_timeout, an explicit teardown is needed in some cases. This is tracked in https://issues.apache.org/jira/browse/SPARK-21458
    
    ## How was this patch tested?
    
    Added a unit test to make sure the config option is set while creating the scheduler driver.
    
    Ran an integration test with mesosphere/spark showing that with a non-zero failover_timeout the Spark job finishes after a driver is disconnected from the master.
    
    Author: Susan X. Huynh <xhuynh@mesosphere.com>
    
    Closes #18674 from susanxhuynh/sh-mesos-failover-timeout.