    [SPARK-21070][PYSPARK] Attempt to update cloudpickle again · 751f5133
    Kyle Kelley authored
    ## What changes were proposed in this pull request?
    
    Based on https://github.com/apache/spark/pull/18282 by rgbkrk, this PR attempts to update to the currently released cloudpickle and to minimize the differences between Spark's cloudpickle and "stock" cloudpickle, with the goal of eventually using stock cloudpickle.
    
    Some notable changes:
    * Import submodules accessed by pickled functions (cloudpipe/cloudpickle#80)
    * Support recursive functions inside closures (cloudpipe/cloudpickle#89, cloudpipe/cloudpickle#90)
    * Fix ResourceWarnings and DeprecationWarnings (cloudpipe/cloudpickle#88)
    * Assume modules with __file__ attribute are not dynamic (cloudpipe/cloudpickle#85)
    * Make cloudpickle Python 3.6 compatible (cloudpipe/cloudpickle#72)
    * Allow pickling of builtin methods (cloudpipe/cloudpickle#57)
    * Add ability to pickle dynamically created modules (cloudpipe/cloudpickle#52)
    * Support method descriptor (cloudpipe/cloudpickle#46)
    * No more pickling of closed files, was broken on Python 3 (cloudpipe/cloudpickle#32)
    * **Remove non-standard __transient__ check (cloudpipe/cloudpickle#110)** -- we don't use this internally and have no tests or documentation for it, but downstream code may rely on __transient__ even though it has never been part of the API. If we merge this, we should include a note about it in the release notes.
    * Support for pickling loggers (yay!) (cloudpipe/cloudpickle#96)
    * BUG: Fix crash when pickling dynamic class cycles. (cloudpipe/cloudpickle#102)
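
To make the "recursive functions inside closures" item concrete, here is a minimal sketch of the kind of object involved. The helper names (`make_factorial`, `stdlib_can_pickle`) are hypothetical and not part of Spark or cloudpickle. The standard-library pickler rejects the function because it is defined locally; the cloudpickle round trip only runs if the standalone `cloudpickle` package happens to be installed, which is an assumption about the environment rather than something this PR guarantees.

```python
import pickle


def make_factorial():
    """Return a recursive function defined inside a closure."""
    def fact(n):
        # `fact` refers to itself through a cell of the enclosing scope.
        return 1 if n <= 1 else n * fact(n - 1)
    return fact


def stdlib_can_pickle(obj):
    """Report whether the standard-library pickler accepts `obj`."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


fact = make_factorial()
print("stdlib pickle accepts fact:", stdlib_can_pickle(fact))

try:
    import cloudpickle  # assumption: standalone package is installed
except ImportError:
    cloudpickle = None

if cloudpickle is not None:
    # cloudpickle serializes the function by value; plain pickle can load it.
    restored = pickle.loads(cloudpickle.dumps(fact))
    print("cloudpickle round trip, fact(5):", restored(5))
```

Inside PySpark the vendored copy lives under `pyspark.cloudpickle`, so user code normally never calls it directly; it is used when closures are shipped to executors.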
    
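For downstream code affected by the __transient__ removal, one possible migration is the standard pickle protocol's `__getstate__`/`__setstate__` hooks. The sketch below assumes that __transient__ was used to list instance attributes that should be skipped when pickling; the `Connection` class and its attributes are hypothetical and only illustrate the pattern.

```python
import pickle


class Connection:
    """Hypothetical class with one attribute that should not be pickled."""

    def __init__(self, url):
        self.url = url
        self.handle = object()  # stand-in for a socket or file handle

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable attribute.
        state = self.__dict__.copy()
        state.pop("handle", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.handle = None  # left for the caller to re-create


restored = pickle.loads(pickle.dumps(Connection("spark://host:7077")))
print(restored.url, restored.handle)
```

Because these hooks are part of the pickle protocol itself, they behave the same with the standard pickler and with cloudpickle, so nothing depends on a cloudpickle-specific attribute.
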
    ## How was this patch tested?
    
    Existing PySpark unit tests + the unit tests from the cloudpickle project on their own.
    
    Author: Holden Karau <holden@us.ibm.com>
    Author: Kyle Kelley <rgbkrk@gmail.com>
    
    Closes #18734 from holdenk/holden-rgbkrk-cloudpickle-upgrades.