    [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Added documentation explaining shuffle · 4bdfb7ba
    Ilya Ganelin authored
    I've updated the Spark Programming Guide to add a section on the shuffle operation providing some background on what it does. I've also addressed some of its performance impacts.
    
    I've included documentation to address the following issues:
    https://issues.apache.org/jira/browse/SPARK-5836
    https://issues.apache.org/jira/browse/SPARK-3441
    https://issues.apache.org/jira/browse/SPARK-5750
    
    https://issues.apache.org/jira/browse/SPARK-4227 is related but can be addressed in a separate PR since it involves updates to the Spark Configuration Guide.
    
    Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
    Author: Ilya Ganelin <ilganeli@gmail.com>
    
    Closes #5074 from ilganeli/SPARK-5750 and squashes the following commits:
    
    6178e24 [Ilya Ganelin] Update programming-guide.md
    7a0b96f [Ilya Ganelin] Update programming-guide.md
    2c5df08 [Ilya Ganelin] Merge branch 'SPARK-5750' of github.com:ilganeli/spark into SPARK-5750
    dffbd2d [Ilya Ganelin] [SPARK-5750] Slight wording update
    1ff4eb4 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-5750
    85f9c6e [Ilya Ganelin] Update programming-guide.md
    349d1fa [Ilya Ganelin] Added cross link for configuration page
    eeb5a7a [Ilya Ganelin] [SPARK-5750] Added some minor fixes
    dd5cc9d [Ilya Ganelin] [SPARK-5750] Fixed some factual inaccuracies with regards to shuffle internals.
    a8adb57 [Ilya Ganelin] [SPARK-5750] Incorporated feedback from Sean Owen
    9954bbe [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-5750
    159dd1c [Ilya Ganelin] [SPARK-5750] Style fixes from rxin.
    75ef67b [Ilya Ganelin] [SPARK-5750][SPARK-3441][SPARK-5836] Added documentation explaining the shuffle operation and included errata from a number of other JIRAs