    [SPARK-2849] Handle driver configs separately in client mode · b3ec51bf
    Andrew Or authored
    In client deploy mode, the driver is launched from within `SparkSubmit`'s JVM. This means that by the time we parse Spark configs from `spark-defaults.conf`, it is already too late to control certain properties of the driver's JVM. We currently ignore the following configs in client mode altogether:
    ```
    spark.driver.memory
    spark.driver.extraJavaOptions
    spark.driver.extraClassPath
    spark.driver.extraLibraryPath
    ```
    This PR handles these properties before launching the driver JVM. It does so by spawning a separate JVM that runs a new class called `SparkSubmitDriverBootstrapper`, which in turn spawns `SparkSubmit` as a sub-process with the appropriate classpath, library paths, Java options, and memory.
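
The bootstrapping idea can be illustrated with a small Python sketch. This is not the Scala code in this PR: `parse_properties`, `build_driver_command`, the whitespace-only `key value` parsing, and the mapping of each property onto a JVM flag (`-Xms`/`-Xmx`, `-cp`, `-Djava.library.path`) are assumptions made for illustration. Precedence between command-line flags, environment variables, and the properties file is not modeled. Empty values are skipped, in the spirit of commit d0f20db.

```python
import shlex

# The four driver properties named in this commit message.
SPECIAL_KEYS = (
    "spark.driver.memory",
    "spark.driver.extraJavaOptions",
    "spark.driver.extraClassPath",
    "spark.driver.extraLibraryPath",
)


def parse_properties(text):
    """Collect the special driver properties from single-line `key value` entries.

    Assumption: keys and values are separated by whitespace, and multi-line
    values are not supported. Comments, blank lines, and other keys are ignored.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if key in SPECIAL_KEYS:
            props[key] = value
    return props


def build_driver_command(props, main_class="org.apache.spark.deploy.SparkSubmit", args=()):
    """Build the argv for a child JVM, skipping any property that is empty."""
    cmd = ["java"]
    memory = props.get("spark.driver.memory", "")
    if memory:
        # Assumption: the same value is used for initial and maximum heap size.
        cmd += ["-Xms" + memory, "-Xmx" + memory]
    class_path = props.get("spark.driver.extraClassPath", "")
    if class_path:
        cmd += ["-cp", class_path]
    library_path = props.get("spark.driver.extraLibraryPath", "")
    if library_path:
        cmd.append("-Djava.library.path=" + library_path)
    # Split the options string like a shell would, so quoted values stay intact.
    cmd += shlex.split(props.get("spark.driver.extraJavaOptions", ""))
    cmd.append(main_class)
    cmd += list(args)
    return cmd


# Hypothetical file contents and application arguments, for illustration only.
example_conf = """
# spark-defaults.conf
spark.driver.memory            2g
spark.driver.extraJavaOptions  -XX:+UseG1GC -Dfoo="a b"
spark.master                   local[2]
"""
example_cmd = build_driver_command(
    parse_properties(example_conf),
    args=["--class", "example.Main", "app.jar"],
)
print(example_cmd)
```

The resulting list could be handed to `subprocess.Popen` to start the child JVM. Building an argument list rather than a single shell string avoids a second round of quoting for values such as `-Dfoo="a b"`.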
    
    Author: Andrew Or <andrewor14@gmail.com>
    
    Closes #1845 from andrewor14/handle-configs-bash and squashes the following commits:
    
    bed4bdf [Andrew Or] Change a few comments / messages (minor)
    24dba60 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
    08fd788 [Andrew Or] Warn against external usages of SparkSubmitDriverBootstrapper
    ff34728 [Andrew Or] Minor comments
    51aeb01 [Andrew Or] Filter out JVM memory in Scala rather than Bash (minor)
    9a778f6 [Andrew Or] Fix PySpark: actually kill driver on termination
    d0f20db [Andrew Or] Don't pass empty library paths, classpath, java opts etc.
    a78cb26 [Andrew Or] Revert a few changes in utils.sh (minor)
    9ba37e2 [Andrew Or] Don't barf when the properties file does not exist
    8867a09 [Andrew Or] A few more naming things (minor)
    19464ad [Andrew Or] SPARK_SUBMIT_JAVA_OPTS -> SPARK_SUBMIT_OPTS
    d6488f9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
    1ea6bbe [Andrew Or] SparkClassLauncher -> SparkSubmitDriverBootstrapper
    a91ea19 [Andrew Or] Fix precedence of library paths, classpath, java opts and memory
    158f813 [Andrew Or] Remove "client mode" boolean argument
    c84f5c8 [Andrew Or] Remove debug print statement (minor)
    b71f52b [Andrew Or] Revert a few more changes (minor)
    7d94a8d [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
    3a8235d [Andrew Or] Only parse the properties file if special configs exist
    c37e08d [Andrew Or] Revert a few more changes
    a396eda [Andrew Or] Nullify my own hard work to simplify bash
    0effa1e [Andrew Or] Add code in Scala that handles special configs
    c886568 [Andrew Or] Fix lines too long + a few comments / style (minor)
    7a4190a [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
    7396be2 [Andrew Or] Explicitly comment that multi-line properties are not supported
    fa11ef8 [Andrew Or] Parse the properties file only if the special configs exist
    371cac4 [Andrew Or] Add function prefix (minor)
    be99eb3 [Andrew Or] Fix tests to not include multi-line configs
    bd0d468 [Andrew Or] Simplify parsing config file by ignoring multi-line arguments
    56ac247 [Andrew Or] Use eval and set to simplify splitting
    8d4614c [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
    aeb79c7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
    2732ac0 [Andrew Or] Integrate BASH tests into dev/run-tests + log error properly
    8d26a5c [Andrew Or] Add tests for bash/utils.sh
    4ae24c3 [Andrew Or] Fix bug: escape properly in quote_java_property
    b3c4cd5 [Andrew Or] Fix bug: count the number of quotes instead of detecting presence
    c2273fc [Andrew Or] Fix typo (minor)
    e793e5f [Andrew Or] Handle multi-line arguments
    5d8f8c4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
    c7b9926 [Andrew Or] Minor changes to spark-defaults.conf.template
    a992ae2 [Andrew Or] Escape spark.*.extraJavaOptions correctly
    aabfc7e [Andrew Or] escape -> split (minor)
    45a1eb9 [Andrew Or] Fix bug: escape escaped backslashes and quotes properly...
    1cdc6b1 [Andrew Or] Fix bug: escape escaped double quotes properly
    c854859 [Andrew Or] Add small comment
    c13a2cb [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
    8e552b7 [Andrew Or] Include an example of spark.*.extraJavaOptions
    de765c9 [Andrew Or] Print spark-class command properly
    a4df3c4 [Andrew Or] Move parsing and escaping logic to utils.sh
    dec2343 [Andrew Or] Only export variables if they exist
    fa2136e [Andrew Or] Escape Java options + parse java properties files properly
    ef12f74 [Andrew Or] Minor formatting
    4ec22a1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
    e5cfb46 [Andrew Or] Collapse duplicate code + fix potential whitespace issues
    4edcaa8 [Andrew Or] Redirect stdout to stderr for python
    130f295 [Andrew Or] Handle spark.driver.memory too
    98dd8e3 [Andrew Or] Add warning if properties file does not exist
    8843562 [Andrew Or] Fix compilation issues...
    75ee6b4 [Andrew Or] Remove accidentally added file
    63ed2e9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
    0025474 [Andrew Or] Revert SparkSubmit handling of --driver-* options for only cluster mode
    a2ab1b0 [Andrew Or] Parse spark.driver.extra* in bash
    250cb95 [Andrew Or] Do not ignore spark.driver.extra* for client mode