    [SPARK-3797] Run external shuffle service in Yarn NM · 61a5cced
    Andrew Or authored
    This PR creates a new module, `network/yarn`, that depends on `network/shuffle` (recently created in #3001), and introduces a custom Yarn auxiliary service that runs the external shuffle service. As of this change, the shuffle service is required for using dynamic allocation with Spark.
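
The requirement described above can be illustrated with a minimal sketch. It is not the code from this PR: the class `DynamicAllocationCheck` and its methods are hypothetical names chosen for illustration, and a plain `Map` stands in for Spark's configuration object. Only the two property keys, `spark.dynamicAllocation.enabled` and `spark.shuffle.service.enabled`, are real Spark settings; both are assumed here to default to `false` when unset.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Spark: rejects a configuration that
// turns on dynamic allocation without the external shuffle service.
public class DynamicAllocationCheck {
    // Real Spark property keys.
    public static final String DYN_ALLOC = "spark.dynamicAllocation.enabled";
    public static final String SHUFFLE_SERVICE = "spark.shuffle.service.enabled";

    // Reads a boolean flag, treating a missing key as "false" (an assumption of this sketch).
    public static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    // Throws if dynamic allocation is requested but the shuffle service is off.
    public static void validate(Map<String, String> conf) {
        if (flag(conf, DYN_ALLOC) && !flag(conf, SHUFFLE_SERVICE)) {
            throw new IllegalArgumentException(
                "Dynamic allocation requires " + SHUFFLE_SERVICE + "=true");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DYN_ALLOC, "true");
        conf.put(SHUFFLE_SERVICE, "true");
        validate(conf); // passes: both flags are enabled
        System.out.println("configuration accepted");
    }
}
```

The rationale is that executors removed by dynamic allocation can no longer serve their shuffle files, so another long-running process on the node (here, the service inside the Yarn NodeManager) has to serve them instead.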
    
    This is still WIP, mainly because it doesn't handle security yet. I have tested this on a stable Yarn cluster.
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes #3082 from andrewor14/yarn-shuffle-service and squashes the following commits:
    
    ef3ddae [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
    0ee67a2 [Andrew Or] Minor wording suggestions
    1c66046 [Andrew Or] Remove unused provided dependencies
    0eb6233 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
    6489db5 [Andrew Or] Try catch at the right places
    7b71d8f [Andrew Or] Add detailed java docs + reword a few comments
    d1124e4 [Andrew Or] Add security to shuffle service (INCOMPLETE)
    5f8a96f [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
    9b6e058 [Andrew Or] Address various feedback
    f48b20c [Andrew Or] Fix tests again
    f39daa6 [Andrew Or] Do not make network-yarn an assembly module
    761f58a [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service
    15a5b37 [Andrew Or] Fix build for Hadoop 1.x
    baff916 [Andrew Or] Fix tests
    5bf9b7e [Andrew Or] Address a few minor comments
    5b419b8 [Andrew Or] Add missing license header
    804e7ff [Andrew Or] Include the Yarn shuffle service jar in the distribution
    cd076a4 [Andrew Or] Require external shuffle service for dynamic allocation
    ea764e0 [Andrew Or] Connect to Yarn shuffle service only if it's enabled
    1bf5109 [Andrew Or] Use the shuffle service port specified through hadoop config
    b4b1f0c [Andrew Or] 4 tabs -> 2 tabs
    43dcb96 [Andrew Or] First cut integration of shuffle service with Yarn aux service
    b54a0c4 [Andrew Or] Initial skeleton for Yarn shuffle service