    [SPARK-9531] [SQL] UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter · 2e981b7b
    Reynold Xin authored
    This pull request adds a destructAndCreateExternalSorter method to UnsafeFixedWidthAggregationMap. The new method does the following:
    
    1. Creates a new external sorter (UnsafeKVExternalSorter)
    2. Adds all of the map's data to an in-memory sorter and sorts it
    3. Spills the sorted in-memory data to disk
    
    This method can be used to fall back to sort-based aggregation when under memory pressure.
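
The fallback described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Spark's implementation: `ToyAggregationMap`, `ToyKVExternalSorter`, `max_keys`, and `aggregate` are hypothetical names, the "memory budget" is just a cap on the number of distinct keys, and the aggregate is a plain sum. It shows the three steps (create a sorter, sort the map's contents in memory, spill them to disk) followed by a merge of adjacent equal keys.

```python
import heapq
import os
import pickle
import tempfile


class ToyKVExternalSorter:
    """Hypothetical stand-in for an external key/value sorter."""

    def __init__(self):
        self.spill_files = []  # paths of sorted runs on disk
        self.buffer = []       # in-memory (key, value) records

    def insert(self, key, value):
        self.buffer.append((key, value))

    def spill(self):
        """Sort the in-memory records by key and write them out as one run."""
        if not self.buffer:
            return
        self.buffer.sort(key=lambda kv: kv[0])
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.buffer, f)
        self.spill_files.append(path)
        self.buffer = []

    def sorted_iterator(self):
        """Merge all sorted runs by key, deleting the spill files.

        A real external sorter would stream the runs; this toy loads them.
        """
        self.spill()
        runs = []
        for path in self.spill_files:
            with open(path, "rb") as f:
                runs.append(pickle.load(f))
            os.remove(path)
        self.spill_files = []
        return heapq.merge(*runs, key=lambda kv: kv[0])


class ToyAggregationMap:
    """Hypothetical hash map of key -> running sum with a key budget."""

    def __init__(self, max_keys):
        self.max_keys = max_keys
        self.sums = {}

    def try_add(self, key, value):
        """Return False when a new key would exceed the budget."""
        if key not in self.sums and len(self.sums) >= self.max_keys:
            return False
        self.sums[key] = self.sums.get(key, 0) + value
        return True

    def destruct_and_create_external_sorter(self):
        sorter = ToyKVExternalSorter()          # step 1: new external sorter
        for key, partial in self.sums.items():  # step 2: add all map data
            sorter.insert(key, partial)
        sorter.spill()                          # steps 2-3: sort, then spill
        self.sums = {}                          # the map is destructed
        return sorter


def aggregate(records, max_keys):
    """Hash-aggregate, falling back to sort-based aggregation when full."""
    agg_map = ToyAggregationMap(max_keys)
    sorter = None
    for key, value in records:
        if sorter is None and not agg_map.try_add(key, value):
            sorter = agg_map.destruct_and_create_external_sorter()
        if sorter is not None:
            sorter.insert(key, value)
    if sorter is None:
        return sorted(agg_map.sums.items())
    out = []
    for key, value in sorter.sorted_iterator():
        if out and out[-1][0] == key:
            out[-1] = (key, out[-1][1] + value)  # combine adjacent equal keys
        else:
            out.append((key, value))
    return out
```

Calling `aggregate(records, max_keys=2)` on input with three distinct keys forces the fallback path, and the result should match what the hash-only path returns with a larger budget.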
    
    The pull request also includes memory accounting fixes from Josh Rosen.
    
    TODOs (that can be done in follow-up PRs):
    - [x] Address Josh's feedback from #7849
    - [x] More documentation and test cases
    - [x] Make sure we are doing memory accounting correctly with test cases (e.g. did we release the memory in BytesToBytesMap twice?)
    - [ ] Look harder at possible memory leaks and exception handling
    - [ ] Randomized tester for the KV sorter as well as the aggregation map
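
The accounting checks mentioned in the list above (a double release, or memory never released) can be sketched with a toy tracker. This is a hypothetical illustration, not the ShuffleMemoryManager API: `ToyMemoryManager` and its methods are invented names, and consumers are identified by plain strings.

```python
class ToyMemoryManager:
    """Hypothetical tracker of bytes held per consumer."""

    def __init__(self):
        self.in_use = {}  # consumer name -> bytes currently held

    def acquire(self, consumer, num_bytes):
        self.in_use[consumer] = self.in_use.get(consumer, 0) + num_bytes

    def release(self, consumer, num_bytes):
        held = self.in_use.get(consumer, 0)
        if num_bytes > held:
            # Releasing more than is held usually means a double release.
            raise AssertionError(
                f"{consumer} released {num_bytes} bytes but holds {held}")
        if held == num_bytes:
            del self.in_use[consumer]
        else:
            self.in_use[consumer] = held - num_bytes

    def assert_no_leaks(self):
        # Anything still held at the end of a test counts as a leak.
        if self.in_use:
            raise AssertionError(f"leaked memory: {self.in_use}")
```

A test would call `assert_no_leaks()` after the code under test finishes, so that both missing and duplicated releases fail loudly.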
    
    Author: Reynold Xin <rxin@databricks.com>
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #7860 from rxin/kvsorter and squashes the following commits:
    
    986a58c [Reynold Xin] Bug fix.
    599317c [Reynold Xin] Style fix and slightly more compact code.
    fe7bd4e [Reynold Xin] Bug fixes.
    fd71bef [Reynold Xin] Merge remote-tracking branch 'josh/large-records-in-sql-sorter' into kvsorter-with-josh-fix
    3efae38 [Reynold Xin] More fixes and documentation.
    45f1b09 [Josh Rosen] Ensure that spill files are cleaned up
    f6a9bd3 [Reynold Xin] Josh feedback.
    9be8139 [Reynold Xin] Remove testSpillFrequency.
    7cbe759 [Reynold Xin] [SPARK-9531][SQL] UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter.
    ae4a8af [Josh Rosen] Detect leaked unsafe memory in UnsafeExternalSorterSuite.
    52f9b06 [Josh Rosen] Detect ShuffleMemoryManager leaks in UnsafeExternalSorter.