    commit 83b7a1c6

    [SPARK-4019] [SPARK-3740] Fix MapStatus compression bug that could lead to empty results or Snappy errors
    
    This commit fixes a bug in MapStatus that could cause jobs to wrongly return
    empty results if those jobs contained stages with more than 2000 partitions
    where most of those partitions were empty.
    
    For jobs with > 2000 partitions, MapStatus uses HighlyCompressedMapStatus,
    which stores only the average size of blocks.  If the average block size
    rounds to zero, every block is reported as empty and BlockFetcherIterator
    mistakenly skips them all.
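
    A minimal sketch of the problematic scheme (illustrative Scala, not the
    actual Spark source):

        // Illustrative sketch of the pre-fix behavior, not the real class.
        // The average is taken over ALL blocks, so integer division rounds
        // it to zero whenever most blocks are empty, and every block is
        // then reported as empty.
        class NaiveHighlyCompressedMapStatus(uncompressedSizes: Array[Long]) {
          private val avgSize: Long =
            if (uncompressedSizes.nonEmpty) uncompressedSizes.sum / uncompressedSizes.length
            else 0L
          def getSizeForBlock(reduceId: Int): Long = avgSize
        }

    In the repro below, at most 10 of the 2001 shuffle blocks are non-empty,
    so the integer average over all 2001 blocks rounds to zero.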
    
    For example, this would return an empty result instead of the ten
    original elements:
    
        sc.makeRDD(0 until 10, 1000).repartition(2001).collect()
    
    This can also lead to deserialization errors (e.g. Snappy decoding errors)
    for jobs with > 2000 partitions where the average block size is non-zero but
    there is at least one empty block.  In this case, the BlockFetcher attempts to
    fetch empty blocks and fails when trying to deserialize them.
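
    Both failure modes come down to how the fetcher interprets the reported
    sizes (a simplified sketch of the skip logic, not Spark's actual
    BlockFetcherIterator):

        // Simplified sketch: blocks whose reported size is 0 are skipped
        // outright; everything else is fetched and decompressed. A truly
        // empty block with a non-zero reported size therefore reaches the
        // codec, which rejects the zero-byte payload (the Snappy errors).
        def planFetches(reportedSizes: Array[Long]): Seq[Int] =
          reportedSizes.indices.filter(i => reportedSizes(i) > 0L)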
    
    The root problem here is that MapStatus has a (previously undocumented)
    correctness property that was violated by HighlyCompressedMapStatus:
    
        If a block is non-empty, then getSizeForBlock must be non-zero.
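
    The same property, expressed as an executable check (a hedged sketch
    with illustrative names):

        // The invariant: a lossy MapStatus may over- or under-estimate
        // sizes, but it must never report a non-empty block as size 0.
        def invariantHolds(trueSizes: Array[Long], getSizeForBlock: Int => Long): Boolean =
          trueSizes.indices.forall(i => trueSizes(i) == 0L || getSizeForBlock(i) > 0L)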
    
    I fixed this by modifying HighlyCompressedMapStatus to store the average size
    of _non-empty_ blocks and to use a compressed bitmap to track which blocks are
    empty.
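
    A minimal sketch of the fixed scheme, with a plain BitSet standing in
    for the compressed Roaring bitmap named in the commit log below:

        // Sketch of the fix: average only the non-empty blocks and record
        // emptiness exactly in a bitmap. Illustrative, not the real class.
        import scala.collection.mutable
        class FixedHighlyCompressedMapStatus(uncompressedSizes: Array[Long]) {
          private val emptyBlocks = mutable.BitSet.empty
          private var nonEmptyCount = 0L
          private var totalSize = 0L
          for (i <- uncompressedSizes.indices) {
            if (uncompressedSizes(i) == 0L) emptyBlocks += i
            else { nonEmptyCount += 1; totalSize += uncompressedSizes(i) }
          }
          // Each non-empty block contributes at least 1 byte, so this
          // average can never round down to zero.
          private val avgSize: Long =
            if (nonEmptyCount > 0L) totalSize / nonEmptyCount else 0L
          def getSizeForBlock(reduceId: Int): Long =
            if (emptyBlocks.contains(reduceId)) 0L else avgSize
        }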
    
    I also removed a test that was broken as originally written: it attempted
    to check that HighlyCompressedMapStatus's size estimation error was < 10%,
    but HighlyCompressedMapStatus is only used for map statuses with > 2000
    partitions, while the test created only 50.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #2866 from JoshRosen/spark-4019 and squashes the following commits:
    
    fc8b490 [Josh Rosen] Roll back hashset change, which didn't improve performance.
    5faa0a4 [Josh Rosen] Incorporate review feedback
    c8b8cae [Josh Rosen] Two performance fixes:
    3b892dd [Josh Rosen] Address Reynold's review comments
    ba2e71c [Josh Rosen] Add missing newline
    609407d [Josh Rosen] Use Roaring Bitmap to track non-empty blocks.
    c23897a [Josh Rosen] Use sets when comparing collect() results
    91276a3 [Josh Rosen] [SPARK-4019] Fix MapStatus compression bug that could lead to empty results.