    SPARK-1579: Clean up PythonRDD and avoid swallowing IOExceptions · 18caa8cb
    Aaron Davidson authored
    This patch includes several cleanups to PythonRDD, focused around fixing [SPARK-1579](https://issues.apache.org/jira/browse/SPARK-1579) cleanly. Listed in order of approximate importance:
    
    - The Python daemon waits for Spark to close the socket before exiting,
      in order to avoid causing spurious IOExceptions in Spark's
      `PythonRDD::WriterThread`.
    - Removes the Python Monitor Thread, which polled for task cancellations
      in order to kill the Python worker. Instead, we do this in the
      onCompleteCallback, since this is guaranteed to be called during
      cancellation.
    - Adds a "completed" variable to TaskContext to avoid the issue noted in
      [SPARK-1019](https://issues.apache.org/jira/browse/SPARK-1019), where onCompleteCallbacks may be execution-order dependent.
      Along with this, I removed the "context.interrupted = true" flag in
      the onCompleteCallback.
    - Extracts PythonRDD::WriterThread to its own class.
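
The "completed" flag in the third bullet can be illustrated with a minimal sketch. This is a Python analogue of the Scala `TaskContext` change, not Spark's actual API; the class and method names here are hypothetical:

```python
class TaskContext:
    """Minimal analogue of the 'completed' flag added to Spark's TaskContext."""

    def __init__(self):
        self.completed = False
        self._callbacks = []

    def add_on_complete_callback(self, fn):
        # If the task has already finished, run the callback immediately,
        # so behavior no longer depends on registration order.
        if self.completed:
            fn()
        else:
            self._callbacks.append(fn)

    def mark_task_completed(self):
        self.completed = True
        for fn in self._callbacks:
            fn()  # e.g. kill the Python worker on cancellation


ctx = TaskContext()
events = []
ctx.add_on_complete_callback(lambda: events.append("early"))
ctx.mark_task_completed()
ctx.add_on_complete_callback(lambda: events.append("late"))  # still runs
```

With the flag, a callback registered after completion still fires, which is what lets the worker-kill logic move safely into the onCompleteCallback.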
    
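The handshake in the first bullet amounts to the daemon blocking until it sees EOF on the socket, since `recv()` returning an empty byte string signals that the peer has closed. A minimal sketch (using a local socket pair to stand in for the Spark-to-daemon connection; not the actual daemon code):

```python
import socket


def wait_for_peer_close(sock):
    # Block until the other side closes the socket; recv() returns b'' at EOF.
    # Returns the number of leftover bytes drained before EOF.
    drained = 0
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return drained
        drained += len(chunk)


# Demo: "Spark" closes its end first, then the daemon may exit.
daemon_end, spark_end = socket.socketpair()
spark_end.sendall(b"final bytes")
spark_end.close()                # Spark closes the socket when it is done
wait_for_peer_close(daemon_end)  # daemon exits only after observing EOF
daemon_end.close()
```

Exiting only after observing EOF is what prevents the daemon from racing ahead of the `WriterThread` and triggering the spurious IOExceptions.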
    Since this patch provides an alternative solution to [SPARK-1019](https://issues.apache.org/jira/browse/SPARK-1019), I tested it by running
    
    ```python
    sc.textFile("latlon.tsv").take(5)
    ```
    
    many times without error.
    
    Additionally, to verify that exceptions are no longer swallowed, I ran
    
    ```python
    sc.textFile("s3n://<big file>").count()
    ```
    
    and cut my internet during execution. Prior to this patch, we got the "stdin writer exited early" message, which was unhelpful. Now, we get the SocketExceptions propagated through Spark to the user and get proper (though unsuccessful) task retries.
    
    Author: Aaron Davidson <aaron@databricks.com>
    
    Closes #640 from aarondav/pyspark-io and squashes the following commits:
    
    b391ff8 [Aaron Davidson] Detect "clean socket shutdowns" and stop waiting on the socket
    c0c49da [Aaron Davidson] SPARK-1579: Clean up PythonRDD and avoid swallowing IOExceptions
    (cherry picked from commit 3308722c)
    
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>