    [SPARK-19276][CORE] Fetch Failure handling robust to user error handling · 8417a7ae
    Imran Rashid authored
    ## What changes were proposed in this pull request?
    
    Fault tolerance in Spark requires special handling of shuffle fetch
    failures.  The Executor catches FetchFailedException and sends a
    special message back to the driver.
    
    However, intervening user code can intercept that exception and wrap
    it in something else.  This even happens in Spark SQL.  So rather than
    checking only the thrown exception, we store the fetch failure directly
    in the TaskContext, where user code can't touch it.
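
The idea above can be illustrated with a minimal, self-contained sketch. It is not Spark's actual code: the classes below are simplified, hypothetical stand-ins that only borrow the names `FetchFailedException` and `TaskContext`, and the assumption that the failure is recorded at the point where the fetch fails is made for illustration. The strings `"FetchFailed"` and `"ExceptionFailure"` are labels for the two outcomes the executor might report.

```java
import java.util.Optional;

public class FetchFailureSketch {

    /** Hypothetical stand-in for Spark's FetchFailedException. */
    static class FetchFailedException extends RuntimeException {
        FetchFailedException(String message) {
            super(message);
        }
    }

    /** Hypothetical, simplified task context that remembers a fetch failure. */
    static class TaskContext {
        private FetchFailedException fetchFailed; // stays null until a fetch fails

        void setFetchFailed(FetchFailedException e) {
            this.fetchFailed = e;
        }

        Optional<FetchFailedException> fetchFailed() {
            return Optional.ofNullable(fetchFailed);
        }
    }

    /** A shuffle read that fails: record the failure in the context, then throw. */
    static void fetchShuffleBlock(TaskContext ctx) {
        FetchFailedException e = new FetchFailedException("could not fetch shuffle block");
        ctx.setFetchFailed(e); // assumption: recorded before user code can see the exception
        throw e;
    }

    /** User code that catches everything and rethrows a wrapper exception. */
    static void userCode(TaskContext ctx) {
        try {
            fetchShuffleBlock(ctx);
        } catch (Exception e) {
            throw new IllegalStateException("user-level wrapper", e);
        }
    }

    /**
     * Executor-side classification of a task failure.
     * With checkContext == false only the thrown exception is inspected;
     * with checkContext == true the task context is consulted as well.
     */
    static String runTask(boolean checkContext) {
        TaskContext ctx = new TaskContext();
        try {
            userCode(ctx);
            return "Success";
        } catch (Throwable t) {
            if (t instanceof FetchFailedException) {
                return "FetchFailed";
            }
            if (checkContext && ctx.fetchFailed().isPresent()) {
                return "FetchFailed";
            }
            return "ExceptionFailure";
        }
    }

    public static void main(String[] args) {
        System.out.println(runTask(false)); // ExceptionFailure: the wrapper hides the fetch failure
        System.out.println(runTask(true));  // FetchFailed: the context still holds it
    }
}
```

In this sketch, the wrapped exception is misclassified as an ordinary failure when only the thrown exception is checked. Consulting the context recovers the fetch failure regardless of what user code does with the exception.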
    
    ## How was this patch tested?
    
    Added a test case that failed before the fix.  Ran the full test suite via Jenkins.
    
    Author: Imran Rashid <irashid@cloudera.com>
    
    Closes #16639 from squito/SPARK-19276.