  • Davies Liu's avatar
    e0e64ba4
    [SPARK-6055] [PySpark] fix incorrect __eq__ of DataType · e0e64ba4
    Davies Liu authored
    The __eq__ of DataType is not correct, so the class cache is not used correctly (a created class cannot be found by its dataType). As a result, lots of classes are created (saved in _cached_cls) and never released.
    
    Also, all instances of the same DataType have the same hash code, so a dict can end up holding many objects with the same hash code. This is effectively a hash collision attack, and accessing such a dict is very slow (depending on the implementation of CPython).
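
The two problems described above can be reproduced outside Spark. The following is a minimal sketch, not the actual PySpark code: `BrokenType`, `FixedType`, and `cache_size_after` are hypothetical names, and the local `cache` dict only stands in for a class cache such as `_cached_cls`. It shows how an `__eq__` that never matches an equivalent instance makes every lookup miss, so the cache grows without bound while all keys share one hash code.

```python
class BrokenType:
    """Hypothetical type: equivalent instances share a hash but never compare equal."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        # Identity-only comparison: a freshly built equivalent type never matches.
        return self is other


class FixedType:
    """Hypothetical type: equality is based on the class and the field values."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, self.__class__) and self.__dict__ == other.__dict__

    def __ne__(self, other):
        return not self.__eq__(other)


def cache_size_after(type_cls, n):
    """Look up n equivalent type objects in a cache, creating a class on each miss."""
    cache = {}  # stand-in for a class cache keyed by data type
    for _ in range(n):
        key = type_cls("struct<a:int>")
        if key not in cache:
            # A miss creates (and keeps) a brand-new class.
            cache[key] = type("Row", (tuple,), {})
    return len(cache)


broken_size = cache_size_after(BrokenType, 1000)
fixed_size = cache_size_after(FixedType, 1000)
print(broken_size, fixed_size)
```

With the identity-only `__eq__`, every lookup misses and the cache keeps one generated class per call, all stored under the same hash code. With the value-based `__eq__`, the first generated class is reused.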
    
    This PR also improves the performance of inferSchema (by avoiding unnecessary conversion of objects).
    
    cc pwendell  JoshRosen
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #4808 from davies/leak and squashes the following commits:
    
    6a322a4 [Davies Liu] tests refactor
    3da44fc [Davies Liu] fix __eq__ of Singleton
    534ac90 [Davies Liu] add more checks
    46999dc [Davies Liu] fix tests
    d9ae973 [Davies Liu] fix memory leak in sql