    44281ca8
    [SPARK-19348][PYTHON] PySpark keyword_only decorator is not thread-safe · 44281ca8
    Bryan Cutler authored
    ## What changes were proposed in this pull request?
    The `keyword_only` decorator in PySpark is not thread-safe.  It writes kwargs to a static class variable in the decorator, which the decorated class method later reads back as `_input_kwargs`.  If multiple threads construct the same class with different kwargs, they race: one thread can overwrite the static class variable before another thread has read its own kwargs from it.  See [SPARK-19348](https://issues.apache.org/jira/browse/SPARK-19348) for reproduction code.
    
    This change writes the kwargs to a member variable instead, so that multiple threads can operate on separate instances without the race condition.  It does not protect against multiple threads operating on a single instance; that synchronization is better left to the user.
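
For illustration, here is a minimal sketch of a decorator that stores the kwargs on the instance, as described above. It is not copied from the patch: the `Example` class and its `params` attribute are hypothetical, and the exact error message and argument check are assumptions.

```python
from functools import wraps


def keyword_only(func):
    """Sketch: force keyword-only calls and save the kwargs on the instance."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        if len(args) > 0:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        # Stored per instance rather than on shared static state, so threads
        # constructing separate instances cannot overwrite each other's kwargs.
        self._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper


class Example(object):
    """Hypothetical class used only to show how the decorator is consumed."""

    @keyword_only
    def __init__(self, a=None, b=None):
        # Only the kwargs the caller actually passed are recorded.
        kwargs = self._input_kwargs
        self.params = dict(kwargs)
```

With this shape, `Example(a=1)` and `Example(b=2)` each keep their own `_input_kwargs`, while a positional call such as `Example(1)` raises `TypeError`.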
    
    ## How was this patch tested?
    Added new unit tests for using the keyword_only decorator and a regression test that verifies `_input_kwargs` can be overwritten from different class instances.
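
A regression check along these lines might look like the sketch below. It is not the test added by this patch: `Holder`, `construct_concurrently`, and the thread and iteration counts are hypothetical, and the decorator is redefined here, under the same assumptions as the description, only to keep the example self-contained.

```python
import threading
from functools import wraps


def keyword_only(func):
    """Sketch of the instance-level variant described in this change."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        if len(args) > 0:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        self._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper


class Holder(object):
    """Hypothetical class that records the kwargs it was constructed with."""

    @keyword_only
    def __init__(self, value=None):
        self.seen = self._input_kwargs["value"]


def construct_concurrently(n_threads=8, per_thread=200):
    """Build instances from several threads; return any kwargs mismatches."""
    mismatches = []

    def worker(tid):
        for i in range(per_thread):
            expected = (tid, i)
            obj = Holder(value=expected)
            if obj.seen != expected:
                mismatches.append((expected, obj.seen))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mismatches
```

An empty list from `construct_concurrently()` means every instance saw its own kwargs. Because the race in the original decorator was timing-dependent, a check like this can demonstrate the fix but would not reliably fail against the old behaviour on every run.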
    
    Author: Bryan Cutler <cutlerb@gmail.com>
    
    Closes #16782 from BryanCutler/pyspark-keyword_only-threadsafe-SPARK-19348.