    9b200272
    [SPARK-8202] [PYSPARK] fix infinite loop during external sort in PySpark
    Davies Liu authored
    During external sort, the batch size grows up to a maximum of 10000 and then shrinks down to zero, causing an infinite loop.
    Assuming that items usually have similar sizes, we don't need to adjust the batch size after the first spill.
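    A simplified sketch of the idea, assuming a read loop that pulls items in batches with `itertools.islice` and stops once a read comes back shorter than the batch size. This is NOT the actual `shuffle.py` code. The names `batch_sizes`, `memory_exceeded`, `initial_batch` and `MAX_BATCH`, the starting batch of 100 and the 1.5 growth factor are all hypothetical; only the 10000 cap comes from the message above. The comments point out why a batch size of zero never terminates, and the loop grows the batch only before the first spill, leaving it fixed afterwards.

    ```python
    import itertools

    MAX_BATCH = 10000  # upper bound on the batch size, as stated in the commit message


    def batch_sizes(items, memory_exceeded, initial_batch=100):
        """Hypothetical, simplified model of an external-sort read loop.

        Yields the batch size used for every read. `memory_exceeded(n)` is a
        stand-in for a real memory check: it receives the number of buffered
        items and returns True when they should be sorted and spilled.
        """
        iterator = iter(items)
        batch = initial_batch
        spills = 0
        buffered = []
        while True:
            chunk = list(itertools.islice(iterator, batch))
            buffered.extend(chunk)
            yield batch
            # Exit once the input is exhausted. If batch ever reached 0,
            # islice would return [] and `0 < 0` is False, so this exit would
            # never be taken: that is the infinite loop described above.
            if len(chunk) < batch:
                break
            if memory_exceeded(len(buffered)):
                buffered = []  # stand-in for sorting and spilling to disk
                spills += 1
                # Fixed behaviour: the batch size is NOT shrunk after a spill.
            elif spills == 0:
                # Grow only until the first spill, capped at MAX_BATCH.
                batch = min(int(batch * 1.5), MAX_BATCH)
    ```

    For example, `list(batch_sizes(range(100000), lambda n: n > 5000))` grows the batch from 100 to 2553, spills once about 7464 items are buffered, and then keeps reading 2553 items at a time until the input runs out. Because the batch size never decreases, it can never reach zero.
    
    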
    
    cc JoshRosen rxin angelini
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #6714 from davies/batch_size and squashes the following commits:
    
    b170dfb [Davies Liu] update test
    b9be832 [Davies Liu] Merge branch 'batch_size' of github.com:davies/spark into batch_size
    6ade745 [Davies Liu] update test
    5c21777 [Davies Liu] Update shuffle.py
    e746aec [Davies Liu] fix batch size during sort