Skip to content
Snippets Groups Projects
  1. Nov 16, 2016
    • Holden Karau's avatar
      [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed · a36a76ac
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI but that is the natural follow up (SPARK-18129).
      
      Done:
      - pip installable on conda [manual tested]
      - setup.py installed on a non-pip managed system (RHEL) with YARN [manual tested]
      - Automated testing of this (virtualenv)
      - packaging and signing with release-build*
      
      Possible follow up work:
      - release-build update to publish to PyPI (SPARK-18128)
      - figure out who owns the pyspark package name on prod PyPI (is it someone with in the project or should we ask PyPI or should we choose a different name to publish with like ApachePySpark?)
      - Windows support and or testing ( SPARK-18136 )
      - investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test
      - consider how we want to number our dev/snapshot versions
      
      Explicitly out of scope:
      - Using pip installed PySpark to start a standalone cluster
      - Using pip installed PySpark for non-Python Spark programs
      
      *I've done some work to test release-build locally but as a non-committer I've just done local testing.
      ## How was this patch tested?
      
      Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.
      
      release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites)
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: Juliet Hougland <juliet@cloudera.com>
      Author: Juliet Hougland <not@myemail.com>
      
      Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
      a36a76ac
  2. Nov 04, 2015
    • jerryshao's avatar
      [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen) · 8aff36e9
      jerryshao authored
      This PR is based on the work of roji to support running Spark scripts from symlinks. Thanks for the great work roji . Would you mind taking a look at this PR, thanks a lot.
      
      For releases like HDP and others, normally it will expose the Spark executables as symlinks and put in `PATH`, but current Spark's scripts do not support finding real path from symlink recursively, this will make spark fail to execute from symlink. This PR try to solve this issue by finding the absolute path from symlink.
      
      Instead of using `readlink -f` like what this PR (https://github.com/apache/spark/pull/2386) implemented is that `-f` is not support for Mac, so here manually seeking the path through loop.
      
      I've tested with Mac and Linux (Cent OS), looks fine.
      
      This PR did not fix the scripts under `sbin` folder, not sure if it needs to be fixed also?
      
      Please help to review, any comment is greatly appreciated.
      
      Author: jerryshao <sshao@hortonworks.com>
      Author: Shay Rojansky <roji@roji.org>
      
      Closes #8669 from jerryshao/SPARK-2960.
      8aff36e9
  3. Jun 05, 2015
    • Marcelo Vanzin's avatar
      [SPARK-6324] [CORE] Centralize handling of script usage messages. · 700312e1
      Marcelo Vanzin authored
      Reorganize code so that the launcher library handles most of the work
      of printing usage messages, instead of having an awkward protocol between
      the library and the scripts for that.
      
      This mostly applies to SparkSubmit, since the launcher lib does not do
      command line parsing for classes invoked in other ways, and thus cannot
      handle failures for those. Most scripts end up going through SparkSubmit,
      though, so it all works.
      
      The change adds a new, internal command line switch, "--usage-error",
      which prints the usage message and exits with a non-zero status. Scripts
      can override the command printed in the usage message by setting an
      environment variable - this avoids having to grep the output of
      SparkSubmit to remove references to the "spark-submit" script.
      
      The only sub-optimal part of the change is the special handling for the
      spark-sql usage, which is now done in SparkSubmitArguments.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5841 from vanzin/SPARK-6324 and squashes the following commits:
      
      2821481 [Marcelo Vanzin] Merge branch 'master' into SPARK-6324
      bf139b5 [Marcelo Vanzin] Filter output of Spark SQL CLI help.
      c6609bf [Marcelo Vanzin] Fix exit code never being used when printing usage messages.
      6bc1b41 [Marcelo Vanzin] [SPARK-6324] [core] Centralize handling of script usage messages.
      700312e1
  4. Apr 16, 2015
    • Davies Liu's avatar
      [SPARK-4897] [PySpark] Python 3 support · 04e44b37
      Davies Liu authored
      This PR update PySpark to support Python 3 (tested with 3.4).
      
      Known issue: unpickle array from Pyrolite is broken in Python 3, those tests are skipped.
      
      TODO: ec2/spark-ec2.py is not fully tested with python3.
      
      Author: Davies Liu <davies@databricks.com>
      Author: twneale <twneale@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5173 from davies/python3 and squashes the following commits:
      
      d7d6323 [Davies Liu] fix tests
      6c52a98 [Davies Liu] fix mllib test
      99e334f [Davies Liu] update timeout
      b716610 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      cafd5ec [Davies Liu] adddress comments from @mengxr
      bf225d7 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      179fc8d [Davies Liu] tuning flaky tests
      8c8b957 [Davies Liu] fix ResourceWarning in Python 3
      5c57c95 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      4006829 [Davies Liu] fix test
      2fc0066 [Davies Liu] add python3 path
      71535e9 [Davies Liu] fix xrange and divide
      5a55ab4 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      125f12c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ed498c8 [Davies Liu] fix compatibility with python 3
      820e649 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      e8ce8c9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ad7c374 [Davies Liu] fix mllib test and warning
      ef1fc2f [Davies Liu] fix tests
      4eee14a [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      20112ff [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      59bb492 [Davies Liu] fix tests
      1da268c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ca0fdd3 [Davies Liu] fix code style
      9563a15 [Davies Liu] add imap back for python 2
      0b1ec04 [Davies Liu] make python examples work with Python 3
      d2fd566 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      a716d34 [Davies Liu] test with python 3.4
      f1700e8 [Davies Liu] fix test in python3
      671b1db [Davies Liu] fix test in python3
      692ff47 [Davies Liu] fix flaky test
      7b9699f [Davies Liu] invalidate import cache for Python 3.3+
      9c58497 [Davies Liu] fix kill worker
      309bfbf [Davies Liu] keep compatibility
      5707476 [Davies Liu] cleanup, fix hash of string in 3.3+
      8662d5b [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      f53e1f0 [Davies Liu] fix tests
      70b6b73 [Davies Liu] compile ec2/spark_ec2.py in python 3
      a39167e [Davies Liu] support customize class in __main__
      814c77b [Davies Liu] run unittests with python 3
      7f4476e [Davies Liu] mllib tests passed
      d737924 [Davies Liu] pass ml tests
      375ea17 [Davies Liu] SQL tests pass
      6cc42a9 [Davies Liu] rename
      431a8de [Davies Liu] streaming tests pass
      78901a7 [Davies Liu] fix hash of serializer in Python 3
      24b2f2e [Davies Liu] pass all RDD tests
      35f48fe [Davies Liu] run future again
      1eebac2 [Davies Liu] fix conflict in ec2/spark_ec2.py
      6e3c21d [Davies Liu] make cloudpickle work with Python3
      2fb2db3 [Josh Rosen] Guard more changes behind sys.version; still doesn't run
      1aa5e8f [twneale] Turned out `pickle.DictionaryType is dict` == True, so swapped it out
      7354371 [twneale] buffer --> memoryview  I'm not super sure if this a valid change, but the 2.7 docs recommend using memoryview over buffer where possible, so hoping it'll work.
      b69ccdf [twneale] Uses the pure python pickle._Pickler instead of c-extension _pickle.Pickler. It appears pyspark 2.7 uses the pure python pickler as well, so this shouldn't degrade pickling performance (?).
      f40d925 [twneale] xrange --> range
      e104215 [twneale] Replaces 2.7 types.InstsanceType with 3.4 `object`....could be horribly wrong depending on how types.InstanceType is used elsewhere in the package--see http://bugs.python.org/issue8206
      79de9d0 [twneale] Replaces python2.7 `file` with 3.4 _io.TextIOWrapper
      2adb42d [Josh Rosen] Fix up some import differences between Python 2 and 3
      854be27 [Josh Rosen] Run `futurize` on Python code:
      7c5b4ce [Josh Rosen] Remove Python 3 check in shell.py.
      04e44b37
  5. Mar 11, 2015
    • Marcelo Vanzin's avatar
      [SPARK-4924] Add a library for launching Spark jobs programmatically. · 517975d8
      Marcelo Vanzin authored
      This change encapsulates all the logic involved in launching a Spark job
      into a small Java library that can be easily embedded into other applications.
      
      The overall goal of this change is twofold, as described in the bug:
      
      - Provide a public API for launching Spark processes. This is a common request
        from users and currently there's no good answer for it.
      
      - Remove a lot of the duplicated code and other coupling that exists in the
        different parts of Spark that deal with launching processes.
      
      A lot of the duplication was due to different code needed to build an
      application's classpath (and the bootstrapper needed to run the driver in
      certain situations), and also different code needed to parse spark-submit
      command line options in different contexts. The change centralizes those
      as much as possible so that all code paths can rely on the library for
      handling those appropriately.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3916 from vanzin/SPARK-4924 and squashes the following commits:
      
      18c7e4d [Marcelo Vanzin] Fix make-distribution.sh.
      2ce741f [Marcelo Vanzin] Add lots of quotes.
      3b28a75 [Marcelo Vanzin] Update new pom.
      a1b8af1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      897141f [Marcelo Vanzin] Review feedback.
      e2367d2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      28cd35e [Marcelo Vanzin] Remove stale comment.
      b1d86b0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      00505f9 [Marcelo Vanzin] Add blurb about new API in the programming guide.
      5f4ddcc [Marcelo Vanzin] Better usage messages.
      92a9cfb [Marcelo Vanzin] Fix Win32 launcher, usage.
      6184c07 [Marcelo Vanzin] Rename field.
      4c19196 [Marcelo Vanzin] Update comment.
      7e66c18 [Marcelo Vanzin] Fix pyspark tests.
      0031a8e [Marcelo Vanzin] Review feedback.
      c12d84b [Marcelo Vanzin] Review feedback. And fix spark-submit on Windows.
      e2d4d71 [Marcelo Vanzin] Simplify some code used to launch pyspark.
      43008a7 [Marcelo Vanzin] Don't make builder extend SparkLauncher.
      b4d6912 [Marcelo Vanzin] Use spark-submit script in SparkLauncher.
      28b1434 [Marcelo Vanzin] Add a comment.
      304333a [Marcelo Vanzin] Fix propagation of properties file arg.
      bb67b93 [Marcelo Vanzin] Remove unrelated Yarn change (that is also wrong).
      8ec0243 [Marcelo Vanzin] Add missing newline.
      95ddfa8 [Marcelo Vanzin] Fix handling of --help for spark-class command builder.
      72da7ec [Marcelo Vanzin] Rename SparkClassLauncher.
      62978e4 [Marcelo Vanzin] Minor cleanup of Windows code path.
      9cd5b44 [Marcelo Vanzin] Make all non-public APIs package-private.
      e4c80b6 [Marcelo Vanzin] Reorganize the code so that only SparkLauncher is public.
      e50dc5e [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      de81da2 [Marcelo Vanzin] Fix CommandUtils.
      86a87bf [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      2061967 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      46d46da [Marcelo Vanzin] Clean up a test and make it more future-proof.
      b93692a [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      ad03c48 [Marcelo Vanzin] Revert "Fix a thread-safety issue in "local" mode."
      0b509d0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      23aa2a9 [Marcelo Vanzin] Read java-opts from conf dir, not spark home.
      7cff919 [Marcelo Vanzin] Javadoc updates.
      eae4d8e [Marcelo Vanzin] Fix new unit tests on Windows.
      e570fb5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      44cd5f7 [Marcelo Vanzin] Add package-info.java, clean up javadocs.
      f7cacff [Marcelo Vanzin] Remove "launch Spark in new thread" feature.
      7ed8859 [Marcelo Vanzin] Some more feedback.
      54cd4fd [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      61919df [Marcelo Vanzin] Clean leftover debug statement.
      aae5897 [Marcelo Vanzin] Use launcher classes instead of jars in non-release mode.
      e584fc3 [Marcelo Vanzin] Rework command building a little bit.
      525ef5b [Marcelo Vanzin] Rework Unix spark-class to handle argument with newlines.
      8ac4e92 [Marcelo Vanzin] Minor test cleanup.
      e946a99 [Marcelo Vanzin] Merge PySparkLauncher into SparkSubmitCliLauncher.
      c617539 [Marcelo Vanzin] Review feedback round 1.
      fc6a3e2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      f26556b [Marcelo Vanzin] Fix a thread-safety issue in "local" mode.
      2f4e8b4 [Marcelo Vanzin] Changes needed to make this work with SPARK-4048.
      799fc20 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      bb5d324 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      53faef1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      a7936ef [Marcelo Vanzin] Fix pyspark tests.
      656374e [Marcelo Vanzin] Mima fixes.
      4d511e7 [Marcelo Vanzin] Fix tools search code.
      7a01e4a [Marcelo Vanzin] Fix pyspark on Yarn.
      1b3f6e9 [Marcelo Vanzin] Call SparkSubmit from spark-class launcher for unknown classes.
      25c5ae6 [Marcelo Vanzin] Centralize SparkSubmit command line parsing.
      27be98a [Marcelo Vanzin] Modify Spark to use launcher lib.
      6f70eea [Marcelo Vanzin] [SPARK-4924] Add a library for launching Spark jobs programatically.
      517975d8
  6. Jan 09, 2015
    • WangTaoTheTonic's avatar
      [SPARK-4990][Deploy]to find default properties file, search SPARK_CONF_DIR first · 8782eb99
      WangTaoTheTonic authored
      https://issues.apache.org/jira/browse/SPARK-4990
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      Author: WangTao <barneystinson@aliyun.com>
      
      Closes #3823 from WangTaoTheTonic/SPARK-4990 and squashes the following commits:
      
      133c43e [WangTao] Update spark-submit2.cmd
      b1ab402 [WangTao] Update spark-submit
      4cc7f34 [WangTaoTheTonic] rebase
      55300bc [WangTaoTheTonic] use export to make it global
      d8d3cb7 [WangTaoTheTonic] remove blank line
      07b9ebf [WangTaoTheTonic] check SPARK_CONF_DIR instead of checking properties file
      c5a85eb [WangTaoTheTonic] to find default properties file, search SPARK_CONF_DIR first
      8782eb99
  7. Jan 08, 2015
  8. Nov 18, 2014
    • Davies Liu's avatar
      [SPARK-4017] show progress bar in console · e34f38ff
      Davies Liu authored
      The progress bar will look like this:
      
      ![1___spark_job__85_250_finished__4_are_running___java_](https://cloud.githubusercontent.com/assets/40902/4854813/a02f44ac-6099-11e4-9060-7c73a73151d6.png)
      
      In the right corner, the numbers are: finished tasks, running tasks, total tasks.
      
      After the stage has finished, it will disappear.
      
      The progress bar is only showed if logging level is WARN or higher (but progress in title is still showed), it can be turned off by spark.driver.showConsoleProgress.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #3029 from davies/progress and squashes the following commits:
      
      95336d5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress
      fc49ac8 [Davies Liu] address commentse
      2e90f75 [Davies Liu] show multiple stages in same time
      0081bcc [Davies Liu] address comments
      38c42f1 [Davies Liu] fix tests
      ab87958 [Davies Liu] disable progress bar during tests
      30ac852 [Davies Liu] re-implement progress bar
      b3f34e5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress
      6fd30ff [Davies Liu] show progress bar if no task finished in 500ms
      e4e7344 [Davies Liu] refactor
      e1f524d [Davies Liu] revert unnecessary change
      a60477c [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress
      5cae3f2 [Davies Liu] fix style
      ea49fe0 [Davies Liu] address comments
      bc53d99 [Davies Liu] refactor
      e6bb189 [Davies Liu] fix logging in sparkshell
      7e7d4e7 [Davies Liu] address commments
      5df26bb [Davies Liu] fix style
      9e42208 [Davies Liu] show progress bar in console and title
      e34f38ff
  9. Sep 08, 2014
    • Prashant Sharma's avatar
      SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within. · e16a8e7d
      Prashant Sharma authored
      ...
      
      Tested ! TBH, it isn't a great idea to have directory with spaces within. Because emacs doesn't like it then hadoop doesn't like it. and so on...
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2229 from ScrapCodes/SPARK-3337/quoting-shell-scripts and squashes the following commits:
      
      d4ad660 [Prashant Sharma] SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within.
      e16a8e7d
  10. Aug 27, 2014
    • Andrew Or's avatar
      [SPARK-3167] Handle special driver configs in Windows · 7557c4cf
      Andrew Or authored
      This is an effort to bring the Windows scripts up to speed after recent splashing changes in #1845.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2129 from andrewor14/windows-config and squashes the following commits:
      
      881a8f0 [Andrew Or] Add reference to Windows taskkill
      92e6047 [Andrew Or] Update a few comments (minor)
      22b1acd [Andrew Or] Fix style again (minor)
      afcffea [Andrew Or] Fix style (minor)
      72004c2 [Andrew Or] Actually respect --driver-java-options
      803218b [Andrew Or] Actually respect SPARK_*_CLASSPATH
      eeb34a0 [Andrew Or] Update outdated comment (minor)
      35caecc [Andrew Or] In Windows, actually kill Java processes on exit
      f97daa2 [Andrew Or] Fix Windows spark shell stdin issue
      83ebe60 [Andrew Or] Parse special driver configs in Windows (broken)
      7557c4cf
  11. Aug 20, 2014
    • Andrew Or's avatar
      [SPARK-2849] Handle driver configs separately in client mode · b3ec51bf
      Andrew Or authored
      In client deploy mode, the driver is launched from within `SparkSubmit`'s JVM. This means by the time we parse Spark configs from `spark-defaults.conf`, it is already too late to control certain properties of the driver's JVM. We currently ignore these configs in client mode altogether.
      ```
      spark.driver.memory
      spark.driver.extraJavaOptions
      spark.driver.extraClassPath
      spark.driver.extraLibraryPath
      ```
      This PR handles these properties before launching the driver JVM. It achieves this by spawning a separate JVM that runs a new class called `SparkSubmitDriverBootstrapper`, which spawns `SparkSubmit` as a sub-process with the appropriate classpath, library paths, java opts and memory.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1845 from andrewor14/handle-configs-bash and squashes the following commits:
      
      bed4bdf [Andrew Or] Change a few comments / messages (minor)
      24dba60 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      08fd788 [Andrew Or] Warn against external usages of SparkSubmitDriverBootstrapper
      ff34728 [Andrew Or] Minor comments
      51aeb01 [Andrew Or] Filter out JVM memory in Scala rather than Bash (minor)
      9a778f6 [Andrew Or] Fix PySpark: actually kill driver on termination
      d0f20db [Andrew Or] Don't pass empty library paths, classpath, java opts etc.
      a78cb26 [Andrew Or] Revert a few changes in utils.sh (minor)
      9ba37e2 [Andrew Or] Don't barf when the properties file does not exist
      8867a09 [Andrew Or] A few more naming things (minor)
      19464ad [Andrew Or] SPARK_SUBMIT_JAVA_OPTS -> SPARK_SUBMIT_OPTS
      d6488f9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      1ea6bbe [Andrew Or] SparkClassLauncher -> SparkSubmitDriverBootstrapper
      a91ea19 [Andrew Or] Fix precedence of library paths, classpath, java opts and memory
      158f813 [Andrew Or] Remove "client mode" boolean argument
      c84f5c8 [Andrew Or] Remove debug print statement (minor)
      b71f52b [Andrew Or] Revert a few more changes (minor)
      7d94a8d [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      3a8235d [Andrew Or] Only parse the properties file if special configs exist
      c37e08d [Andrew Or] Revert a few more changes
      a396eda [Andrew Or] Nullify my own hard work to simplify bash
      0effa1e [Andrew Or] Add code in Scala that handles special configs
      c886568 [Andrew Or] Fix lines too long + a few comments / style (minor)
      7a4190a [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      7396be2 [Andrew Or] Explicitly comment that multi-line properties are not supported
      fa11ef8 [Andrew Or] Parse the properties file only if the special configs exist
      371cac4 [Andrew Or] Add function prefix (minor)
      be99eb3 [Andrew Or] Fix tests to not include multi-line configs
      bd0d468 [Andrew Or] Simplify parsing config file by ignoring multi-line arguments
      56ac247 [Andrew Or] Use eval and set to simplify splitting
      8d4614c [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      aeb79c7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      2732ac0 [Andrew Or] Integrate BASH tests into dev/run-tests + log error properly
      8d26a5c [Andrew Or] Add tests for bash/utils.sh
      4ae24c3 [Andrew Or] Fix bug: escape properly in quote_java_property
      b3c4cd5 [Andrew Or] Fix bug: count the number of quotes instead of detecting presence
      c2273fc [Andrew Or] Fix typo (minor)
      e793e5f [Andrew Or] Handle multi-line arguments
      5d8f8c4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
      c7b9926 [Andrew Or] Minor changes to spark-defaults.conf.template
      a992ae2 [Andrew Or] Escape spark.*.extraJavaOptions correctly
      aabfc7e [Andrew Or] escape -> split (minor)
      45a1eb9 [Andrew Or] Fix bug: escape escaped backslashes and quotes properly...
      1cdc6b1 [Andrew Or] Fix bug: escape escaped double quotes properly
      c854859 [Andrew Or] Add small comment
      c13a2cb [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
      8e552b7 [Andrew Or] Include an example of spark.*.extraJavaOptions
      de765c9 [Andrew Or] Print spark-class command properly
      a4df3c4 [Andrew Or] Move parsing and escaping logic to utils.sh
      dec2343 [Andrew Or] Only export variables if they exist
      fa2136e [Andrew Or] Escape Java options + parse java properties files properly
      ef12f74 [Andrew Or] Minor formatting
      4ec22a1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
      e5cfb46 [Andrew Or] Collapse duplicate code + fix potential whitespace issues
      4edcaa8 [Andrew Or] Redirect stdout to stderr for python
      130f295 [Andrew Or] Handle spark.driver.memory too
      98dd8e3 [Andrew Or] Add warning if properties file does not exist
      8843562 [Andrew Or] Fix compilation issues...
      75ee6b4 [Andrew Or] Remove accidentally added file
      63ed2e9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
      0025474 [Andrew Or] Revert SparkSubmit handling of --driver-* options for only cluster mode
      a2ab1b0 [Andrew Or] Parse spark.driver.extra* in bash
      250cb95 [Andrew Or] Do not ignore spark.driver.extra* for client mode
      b3ec51bf
  12. May 25, 2014
    • Colin Patrick Mccabe's avatar
      spark-submit: add exec at the end of the script · 6e9fb632
      Colin Patrick Mccabe authored
      Add an 'exec' at the end of the spark-submit script, to avoid keeping a
      bash process hanging around while it runs.  This makes ps look a little
      bit nicer.
      
      Author: Colin Patrick Mccabe <cmccabe@cloudera.com>
      
      Closes #858 from cmccabe/SPARK-1907 and squashes the following commits:
      
      7023b64 [Colin Patrick Mccabe] spark-submit: add exec at the end of the script
      6e9fb632
  13. May 11, 2014
    • Patrick Wendell's avatar
      SPARK-1652: Set driver memory correctly in spark-submit. · 05c9aa9e
      Patrick Wendell authored
      The previous check didn't account for the fact that the default
      deploy mode is "client" unless otherwise specified. Also, this
      sets the more narrowly defined SPARK_DRIVER_MEMORY instead of setting
      SPARK_MEM.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #730 from pwendell/spark-submit and squashes the following commits:
      
      430b98f [Patrick Wendell] Feedback from Aaron
      e788edf [Patrick Wendell] Changes based on Aaron's feedback
      f508146 [Patrick Wendell] SPARK-1652: Set driver memory correctly in spark-submit.
      05c9aa9e
  14. May 01, 2014
    • Patrick Wendell's avatar
      SPARK-1691: Support quoted arguments inside of spark-submit. · 98b65593
      Patrick Wendell authored
      This is a fairly straightforward fix. The bug was reported by @vanzin and the fix was proposed by @deanwampler and myself. Please take a look!
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #609 from pwendell/quotes and squashes the following commits:
      
      8bed767 [Patrick Wendell] SPARK-1691: Support quoted arguments inside of spark-submit.
      98b65593
  15. Apr 28, 2014
    • Patrick Wendell's avatar
      SPARK-1654 and SPARK-1653: Fixes in spark-submit. · 949e3931
      Patrick Wendell authored
      Deals with two issues:
      1. Spark shell didn't correctly pass quoted arguments to spark-submit.
      ```./bin/spark-shell --driver-java-options "-Dfoo=f -Dbar=b"```
      2. Spark submit used deprecated environment variables (SPARK_CLASSPATH)
         which triggered warnings. Now we use new, more narrowly scoped,
         variables.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #576 from pwendell/spark-submit and squashes the following commits:
      
      67004c9 [Patrick Wendell] SPARK-1654 and SPARK-1653: Fixes in spark-submit.
      949e3931
  16. Apr 25, 2014
    • Patrick Wendell's avatar
      SPARK-1619 Launch spark-shell with spark-submit · dc3b640a
      Patrick Wendell authored
      This simplifies the shell a bunch and passes all arguments through to spark-submit.
      
      There is a tiny incompatibility from 0.9.1 which is that you can't put `-c` _or_ `--cores`, only `--cores`. However, spark-submit will give a good error message in this case, I don't think many people used this, and it's a trivial change for users.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #542 from pwendell/spark-shell and squashes the following commits:
      
      9eb3e6f [Patrick Wendell] Updating Spark docs
      b552459 [Patrick Wendell] Andrew's feedback
      97720fa [Patrick Wendell] Review feedback
      aa2900b [Patrick Wendell] SPARK-1619 Launch spark-shell with spark-submit
      dc3b640a
  17. Apr 21, 2014
    • Patrick Wendell's avatar
      Clean up and simplify Spark configuration · fb98488f
      Patrick Wendell authored
      Over time as we've added more deployment modes, this have gotten a bit unwieldy with user-facing configuration options in Spark. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch but it makes the following improvements:
      
      1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file.
      2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
      3. Adds ability to set these same variables for the driver using `spark-submit`.
      4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`.
      5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #299 from pwendell/config-cleanup and squashes the following commits:
      
      127f301 [Patrick Wendell] Improvements to testing
      a006464 [Patrick Wendell] Moving properties file template.
      b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf
      0086939 [Patrick Wendell] Minor style fixes
      af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs
      b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide
      af0adf7 [Patrick Wendell] Automatically add user jar
      a56b125 [Patrick Wendell] Responses to Tom's review
      d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
      a762901 [Patrick Wendell] Fixing test failures
      ffa00fe [Patrick Wendell] Review feedback
      fda0301 [Patrick Wendell] Note
      308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN
      e83cd8f [Patrick Wendell] Changes to allow re-use of test applications
      be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set
      c2a2909 [Patrick Wendell] Test compile fixes
      4ee6f9d [Patrick Wendell] Making YARN doc changes consistent
      afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors.
      b08893b [Patrick Wendell] Additional improvements.
      ace4ead [Patrick Wendell] Responses to review feedback.
      b72d183 [Patrick Wendell] Review feedback for spark env file
      46555c1 [Patrick Wendell] Review feedback and import clean-ups
      437aed1 [Patrick Wendell] Small fix
      761ebcd [Patrick Wendell] Library path and classpath for drivers
      7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script
      5b0ba8e [Patrick Wendell] Don't ship executor envs
      84cc5e5 [Patrick Wendell] Small clean-up
      1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings
      4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH
      6eaf7d0 [Patrick Wendell] executorJavaOpts
      0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN
      ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
      fb98488f
  18. Mar 29, 2014
    • Sandy Ryza's avatar
      SPARK-1126. spark-app preliminary · 16178160
      Sandy Ryza authored
      This is a starting version of the spark-app script for running compiled binaries against Spark.  It still needs tests and some polish.  The only testing I've done so far has been using it to launch jobs in yarn-standalone mode against a pseudo-distributed cluster.
      
      This leaves out the changes required for launching python scripts.  I think it might be best to save those for another JIRA/PR (while keeping to the design so that they won't require backwards-incompatible changes).
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #86 from sryza/sandy-spark-1126 and squashes the following commits:
      
      d428d85 [Sandy Ryza] Commenting, doc, and import fixes from Patrick's comments
      e7315c6 [Sandy Ryza] Fix failing tests
      34de899 [Sandy Ryza] Change --more-jars to --jars and fix docs
      299ddca [Sandy Ryza] Fix scalastyle
      a94c627 [Sandy Ryza] Add newline at end of SparkSubmit
      04bc4e2 [Sandy Ryza] SPARK-1126. spark-submit script
      16178160
  19. Sep 22, 2013
  20. Jul 16, 2013
  21. Mar 26, 2013
  22. Aug 01, 2012
Loading