Skip to content
Snippets Groups Projects
  1. Nov 10, 2013
  2. Nov 09, 2013
  3. Nov 08, 2013
  4. Nov 07, 2013
  5. Nov 06, 2013
  6. Nov 05, 2013
  7. Nov 04, 2013
    • Reynold Xin's avatar
      Merge pull request #139 from aarondav/shuffle-next · 81065321
      Reynold Xin authored
      Never store shuffle blocks in BlockManager
      
      After the BlockId refactor (PR #114), it became very clear that ShuffleBlocks are of no use
      within BlockManager (they had a no-arg constructor!). This patch completely eliminates
      them, saving us around 100-150 bytes per shuffle block.
      The total, system-wide overhead per shuffle block is now a flat 8 bytes, excluding
      state saved by the MapOutputTracker.
      
      Note: This should *not* be merged directly into 0.8.0 -- see #138
      81065321
    • Aaron Davidson's avatar
      Never store shuffle blocks in BlockManager · 93c90844
      Aaron Davidson authored
      After the BlockId refactor (PR #114), it became very clear that ShuffleBlocks are of no use
      within BlockManager (they had a no-arg constructor!). This patch completely eliminates
      them, saving us around 100-150 bytes per shuffle block.
      The total, system-wide overhead per shuffle block is now a flat 8 bytes, excluding
      state saved by the MapOutputTracker.
      93c90844
    • Reynold Xin's avatar
      Merge pull request #128 from shimingfei/joblogger-doc · 0b26a392
      Reynold Xin authored
      
      add javadoc to JobLogger, and some small fix
      
      against Spark-941
      
      add javadoc to JobLogger, output more info for RDD, modify recordStageDepGraph to avoid output duplicate stage dependency information
      
      (cherry picked from commit 518cf22e)
      Signed-off-by: default avatarReynold Xin <rxin@apache.org>
      0b26a392
    • Reynold Xin's avatar
      Merge pull request #130 from aarondav/shuffle · 7a26104a
      Reynold Xin authored
      Memory-optimized shuffle file consolidation
      
      Reduces overhead of each shuffle block for consolidation from >300 bytes to 8 bytes (1 primitive Long). Verified via profiler testing with 1 mil shuffle blocks, net overhead was ~8,400,000 bytes.
      
      Despite the memory-optimized implementation incurring extra CPU overhead, the runtime of the shuffle phase in this test was only around 2% slower, while the reduce phase was 40% faster, when compared to not using any shuffle file consolidation.
      
      This is accomplished by replacing the map from ShuffleBlockId to FileSegment (i.e., block id to where it's located), which had high overhead due to being a gigantic, timestamped, concurrent map with a more space-efficient structure. Namely, the following are introduced (I have omitted the word "Shuffle" from some names for clarity):
      **ShuffleFile** - there is one ShuffleFile per consolidated shuffle file on disk. We store an array of offsets into the physical shuffle file for each ShuffleMapTask that wrote into the file. This is sufficient to reconstruct FileSegments for mappers that are in the file.
      **FileGroup** - contains a set of ShuffleFiles, one per reducer, that a MapTask can use to write its output. There is one FileGroup created per _concurrent_ MapTask. The FileGroup contains an array of the mapIds that have been written to all files in the group. The positions of elements in this array map directly onto the positions in each ShuffleFile's offsets array.
      
      In order to locate the FileSegment associated with a BlockId, we have another structure which maps each reducer to the set of ShuffleFiles that were created for it. (There will be as many ShuffleFiles per reducer as there are FileGroups.) To lookup a given ShuffleBlockId (shuffleId, reducerId, mapId), we thus search through all ShuffleFiles associated with that reducer.
      
      As a time optimization, we ensure that FileGroups are only reused for MapTasks with monotonically increasing mapIds. This allows us to perform a binary search to locate a mapId inside a group, and also enables potential future optimization (based on the usual monotonic access order).
      7a26104a
    • Aaron Davidson's avatar
      Minor cleanup in ShuffleBlockManager · 1ba11b1c
      Aaron Davidson authored
      1ba11b1c
    • Aaron Davidson's avatar
      Refactor ShuffleBlockManager to reduce public interface · 6201e5e2
      Aaron Davidson authored
      - ShuffleBlocks has been removed and replaced by ShuffleWriterGroup.
      - ShuffleWriterGroup no longer contains a reference to a ShuffleFileGroup.
      - ShuffleFile has been removed and its contents are now within ShuffleFileGroup.
      - ShuffleBlockManager.forShuffle has been replaced by a more stateful forMapTask.
      6201e5e2
    • Aaron Davidson's avatar
      Add javadoc and remove unused code · b0cf19fe
      Aaron Davidson authored
      b0cf19fe
  8. Nov 03, 2013
Loading