  1. Jan 18, 2014
  2. Jan 17, 2014
  3. Jan 16, 2014
    • Prashant Sharma
      fcb4fc65
    • Qiuzhuang Lian
    • Merge pull request #445 from kayousterhout/exec_lost · c06a307c
      Reynold Xin authored
      Fail rather than hanging if a task crashes the JVM.
      
      Prior to this commit, if a task crashes the JVM, the task (and
      all other tasks running on that executor) is marked as KILLED rather
      than FAILED.  As a result, the TaskSetManager will retry the task
      indefinitely rather than failing the job after maxFailures. Eventually,
      this makes the job hang, because the Standalone Scheduler removes
      the application after 10 workers have failed, and then the app is left
      in a state where it's disconnected from the master and waiting to reconnect.
      This commit fixes that problem by marking tasks as FAILED rather than
      KILLED when an executor is lost.
      
      The downside of this commit is that if task A fails because another
      task running on the same executor caused the VM to crash, the failure
      will incorrectly be counted as a failure of task A. This should not
      be an issue because we typically set maxFailures to 3, and it is
      unlikely that a task will be co-located with a JVM-crashing task
      multiple times.
      c06a307c
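
The retry behaviour described in the commit message above can be illustrated with a small, self-contained sketch. This is not Spark's code: `ToyTaskSetManager`, `handle_lost_task`, and `attempts_until_abort` are hypothetical names, and the abort rule (give up once a task has `max_failures` counted failures) is an assumption based on the message's description of maxFailures. The only point is that a KILLED task is never counted against the limit, while a FAILED task is.

```python
from enum import Enum


class TaskState(Enum):
    FAILED = "FAILED"
    KILLED = "KILLED"


class ToyTaskSetManager:
    """Hypothetical stand-in for the retry bookkeeping described above."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = {}  # task id -> number of counted failures
        self.aborted = False

    def handle_lost_task(self, task_id, state):
        """Return True if the task should be retried, False if the job is aborted."""
        if self.aborted:
            return False
        if state is TaskState.FAILED:
            # Only FAILED attempts count toward the limit (assumed rule).
            self.failures[task_id] = self.failures.get(task_id, 0) + 1
            if self.failures[task_id] >= self.max_failures:
                self.aborted = True
                return False
        # KILLED attempts are not counted, so the task is simply retried.
        return True


def attempts_until_abort(state_on_executor_loss, max_failures=3, attempt_cap=100):
    """Simulate one task that crashes its executor on every attempt.

    Returns the attempt number at which the job is aborted, or None if the
    task is still being retried when attempt_cap is reached.
    """
    manager = ToyTaskSetManager(max_failures)
    for attempt in range(1, attempt_cap + 1):
        if not manager.handle_lost_task(0, state_on_executor_loss):
            return attempt
    return None
```

Under these assumptions, `attempts_until_abort(TaskState.FAILED)` gives up on the third attempt, while `attempts_until_abort(TaskState.KILLED)` is still retrying when the cap is reached, which corresponds to the indefinite retry the commit removes. The sketch also shows the stated downside: a co-located task that loses its executor once gets one counted failure but stays below the limit.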
    • Updated unit test comment · 718a13c1
      Kay Ousterhout authored
      718a13c1
  4. Jan 15, 2014
  5. Jan 14, 2014