    [SPARK-6194] [SPARK-677] [PySpark] fix memory leak in collect() · 8767565c
    Davies Liu authored
    Because of a circular reference between JavaObject and JavaMember, a Java object cannot be released until the Python GC kicks in. This causes a memory leak in collect(), which may consume a lot of memory in the JVM.
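
To illustrate why a reference cycle delays release, here is a minimal sketch in plain Python. The classes `Owner` and `Member` are hypothetical stand-ins for the JavaObject/JavaMember pair and are not Py4J code; the behaviour shown assumes CPython, where reference counting alone cannot free a cycle.

```python
import gc
import weakref


class Member:
    """Hypothetical stand-in for a member that points back at its owner."""

    def __init__(self, owner):
        self.owner = owner  # back-reference that closes the cycle


class Owner:
    """Hypothetical stand-in for an object that holds a member."""

    def __init__(self):
        self.member = Member(self)


def cycle_survives_del():
    """Return (alive before gc.collect(), alive after gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        obj = Owner()
        ref = weakref.ref(obj)
        del obj  # drops the last external reference; the cycle remains
        alive_before_gc = ref() is not None
        gc.collect()  # the cycle collector finally frees both objects
        alive_after_gc = ref() is not None
        return alive_before_gc, alive_after_gc
    finally:
        if was_enabled:
            gc.enable()
```

Until the cycle collector runs, anything the cyclic objects hold on to (here, by analogy, the JVM-side object) stays alive as well.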
    
    This PR changes the way collected data is sent back to Python, from a local file to a socket. This avoids any disk I/O during collect and also avoids leaving any referrers to the Java object in Python.
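
The general idea of streaming results over a local socket instead of a temporary file can be sketched as follows. This is not the code from this PR: the function names `serve_once` and `read_all`, the length-prefixed pickle framing, and the use of a Python thread in place of the JVM are all assumptions made for illustration.

```python
import pickle
import socket
import struct
import threading


def serve_once(items):
    """Serve `items` to a single local client; return the chosen port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def run():
        conn, _ = server.accept()
        with conn, conn.makefile("wb") as out:
            for item in items:
                payload = pickle.dumps(item)
                out.write(struct.pack("!i", len(payload)))  # 4-byte length
                out.write(payload)
        server.close()

    threading.Thread(target=run, daemon=True).start()
    return port


def read_all(port):
    """Yield each item sent by the server until the stream ends."""
    with socket.create_connection(("127.0.0.1", port)) as sock, \
            sock.makefile("rb") as stream:
        while True:
            header = stream.read(4)
            if len(header) < 4:  # end of stream
                return
            (length,) = struct.unpack("!i", header)
            yield pickle.loads(stream.read(length))
```

For example, `list(read_all(serve_once([1, "two", [3.0]])))` round-trips the three items without touching the disk, and the reader only ever holds an integer port number rather than a reference to the producer's objects.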
    
    cc JoshRosen
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #4923 from davies/fix_collect and squashes the following commits:
    
    d730286 [Davies Liu] address comments
    24c92a4 [Davies Liu] fix style
    ba54614 [Davies Liu] use socket to transfer data from JVM
    9517c8f [Davies Liu] fix memory leak in collect()