  1. Nov 16, 2016
    • [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed · a36a76ac
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      This PR aims to provide a pip-installable PySpark package. It does a bunch of work to copy the jars over and package them with the Python code (to avoid the problems that come from mixing one version of the Python code with a different version of the jars). It does not currently publish to PyPI, but that is the natural follow-up (SPARK-18129).
      
      Done:
      - pip installable on conda [manually tested]
      - setup.py installed on a non-pip-managed system (RHEL) with YARN [manually tested]
      - Automated testing of this (virtualenv)
      - packaging and signing with release-build*
      
      Possible follow up work:
      - release-build update to publish to PyPI (SPARK-18128)
      - figure out who owns the pyspark package name on prod PyPI (is it someone within the project, or should we ask PyPI, or should we choose a different name to publish with, like ApachePySpark?)
      - Windows support and/or testing (SPARK-18136)
      - investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our tests
      - consider how we want to number our dev/snapshot versions
      
      Explicitly out of scope:
      - Using pip installed PySpark to start a standalone cluster
      - Using pip installed PySpark for non-Python Spark programs
      
      *I've done some work to test release-build locally, but as a non-committer I've only been able to test locally.
      ## How was this patch tested?
      
      Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.
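
      As a rough sketch, the kind of end-to-end check the virtualenv automation performs looks like this (paths and the smoke-test command are illustrative):

      ```bash
      # Build an sdist from the python/ directory of a Spark checkout,
      # install it into a fresh virtualenv, and verify the packaged jars
      # are found by starting a local SparkContext.
      cd python && python setup.py sdist
      virtualenv /tmp/pyspark-env && source /tmp/pyspark-env/bin/activate
      pip install dist/pyspark-*.tar.gz
      python -c "import pyspark; print(pyspark.SparkContext('local[1]').version)"
      ```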
      
      release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites)
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: Juliet Hougland <juliet@cloudera.com>
      Author: Juliet Hougland <not@myemail.com>
      
      Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
  2. Nov 04, 2015
    • [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen) · 8aff36e9
      jerryshao authored
      This PR is based on the work of roji to support running Spark scripts from symlinks. Thanks for the great work, roji. Would you mind taking a look at this PR? Thanks a lot.
      
      For releases like HDP and others, the Spark executables are normally exposed as symlinks on the `PATH`, but Spark's current scripts do not recursively resolve the real path behind a symlink, so Spark fails to execute when invoked through one. This PR tries to solve the issue by finding the absolute path behind the symlink.
      
      Unlike the earlier attempt (https://github.com/apache/spark/pull/2386), this change does not use `readlink -f`, because the `-f` flag is not supported on Mac, so the path is resolved manually in a loop, as sketched below.
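
      A portable sketch of that loop (illustrative, not the exact code in the patch):

      ```bash
      # Resolve the real location of this script without `readlink -f`,
      # which GNU readlink supports but the BSD readlink on Mac does not.
      SOURCE="$0"
      while [ -h "$SOURCE" ]; do
        DIR="$(cd -P "$(dirname "$SOURCE")" && pwd)"
        SOURCE="$(readlink "$SOURCE")"
        # A relative link is resolved against the directory containing it.
        case "$SOURCE" in
          /*) ;;
          *) SOURCE="$DIR/$SOURCE" ;;
        esac
      done
      SPARK_HOME="$(cd -P "$(dirname "$SOURCE")/.." && pwd)"
      ```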
      
      I've tested on Mac and Linux (CentOS); it looks fine.

      This PR does not touch the scripts under the `sbin` folder; I'm not sure whether those also need fixing.
      
      Please help to review, any comment is greatly appreciated.
      
      Author: jerryshao <sshao@hortonworks.com>
      Author: Shay Rojansky <roji@roji.org>
      
      Closes #8669 from jerryshao/SPARK-2960.
  3. Jun 05, 2015
    • [SPARK-6324] [CORE] Centralize handling of script usage messages. · 700312e1
      Marcelo Vanzin authored
      Reorganize code so that the launcher library handles most of the work
      of printing usage messages, instead of having an awkward protocol between
      the library and the scripts for that.
      
      This mostly applies to SparkSubmit, since the launcher lib does not do
      command line parsing for classes invoked in other ways, and thus cannot
      handle failures for those. Most scripts end up going through SparkSubmit,
      though, so it all works.
      
      The change adds a new, internal command line switch, "--usage-error",
      which prints the usage message and exits with a non-zero status. Scripts
      can override the command printed in the usage message by setting an
      environment variable - this avoids having to grep the output of
      SparkSubmit to remove references to the "spark-submit" script.
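
      For illustration, the script-side pattern looks roughly like this (the variable name follows the current scripts; treat the details as internal):

      ```bash
      # The wrapper overrides the command shown in usage output, then
      # delegates to spark-submit; on a usage error, SparkSubmit prints
      # this command in the usage text and exits with a non-zero status.
      export _SPARK_CMD_USAGE="Usage: ./bin/spark-shell [options]"
      exec "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main "$@"
      ```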
      
      The only sub-optimal part of the change is the special handling for the
      spark-sql usage, which is now done in SparkSubmitArguments.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5841 from vanzin/SPARK-6324 and squashes the following commits:
      
      2821481 [Marcelo Vanzin] Merge branch 'master' into SPARK-6324
      bf139b5 [Marcelo Vanzin] Filter output of Spark SQL CLI help.
      c6609bf [Marcelo Vanzin] Fix exit code never being used when printing usage messages.
      6bc1b41 [Marcelo Vanzin] [SPARK-6324] [core] Centralize handling of script usage messages.
  4. Mar 11, 2015
    • [SPARK-4924] Add a library for launching Spark jobs programmatically. · 517975d8
      Marcelo Vanzin authored
      This change encapsulates all the logic involved in launching a Spark job
      into a small Java library that can be easily embedded into other applications.
      
      The overall goal of this change is twofold, as described in the bug:
      
      - Provide a public API for launching Spark processes. This is a common request
        from users and currently there's no good answer for it.
      
      - Remove a lot of the duplicated code and other coupling that exists in the
        different parts of Spark that deal with launching processes.
      
      A lot of the duplication was due to different code needed to build an
      application's classpath (and the bootstrapper needed to run the driver in
      certain situations), and also different code needed to parse spark-submit
      command line options in different contexts. The change centralizes those
      as much as possible so that all code paths can rely on the library for
      handling those appropriately.
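
      As a sketch of the resulting shape (class and variable names illustrative), a thin shell wrapper can delegate all command construction to the launcher library:

      ```bash
      # Ask the launcher library to build the full java command, reading it
      # back as a NUL-delimited argument list so whitespace in any value
      # survives intact, then exec the result.
      build_command() {
        "$JAVA" -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
      }

      CMD=()
      while IFS= read -d '' -r ARG; do
        CMD+=("$ARG")
      done < <(build_command org.apache.spark.deploy.SparkSubmit "$@")
      exec "${CMD[@]}"
      ```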
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3916 from vanzin/SPARK-4924 and squashes the following commits:
      
      18c7e4d [Marcelo Vanzin] Fix make-distribution.sh.
      2ce741f [Marcelo Vanzin] Add lots of quotes.
      3b28a75 [Marcelo Vanzin] Update new pom.
      a1b8af1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      897141f [Marcelo Vanzin] Review feedback.
      e2367d2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      28cd35e [Marcelo Vanzin] Remove stale comment.
      b1d86b0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      00505f9 [Marcelo Vanzin] Add blurb about new API in the programming guide.
      5f4ddcc [Marcelo Vanzin] Better usage messages.
      92a9cfb [Marcelo Vanzin] Fix Win32 launcher, usage.
      6184c07 [Marcelo Vanzin] Rename field.
      4c19196 [Marcelo Vanzin] Update comment.
      7e66c18 [Marcelo Vanzin] Fix pyspark tests.
      0031a8e [Marcelo Vanzin] Review feedback.
      c12d84b [Marcelo Vanzin] Review feedback. And fix spark-submit on Windows.
      e2d4d71 [Marcelo Vanzin] Simplify some code used to launch pyspark.
      43008a7 [Marcelo Vanzin] Don't make builder extend SparkLauncher.
      b4d6912 [Marcelo Vanzin] Use spark-submit script in SparkLauncher.
      28b1434 [Marcelo Vanzin] Add a comment.
      304333a [Marcelo Vanzin] Fix propagation of properties file arg.
      bb67b93 [Marcelo Vanzin] Remove unrelated Yarn change (that is also wrong).
      8ec0243 [Marcelo Vanzin] Add missing newline.
      95ddfa8 [Marcelo Vanzin] Fix handling of --help for spark-class command builder.
      72da7ec [Marcelo Vanzin] Rename SparkClassLauncher.
      62978e4 [Marcelo Vanzin] Minor cleanup of Windows code path.
      9cd5b44 [Marcelo Vanzin] Make all non-public APIs package-private.
      e4c80b6 [Marcelo Vanzin] Reorganize the code so that only SparkLauncher is public.
      e50dc5e [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      de81da2 [Marcelo Vanzin] Fix CommandUtils.
      86a87bf [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      2061967 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      46d46da [Marcelo Vanzin] Clean up a test and make it more future-proof.
      b93692a [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      ad03c48 [Marcelo Vanzin] Revert "Fix a thread-safety issue in "local" mode."
      0b509d0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      23aa2a9 [Marcelo Vanzin] Read java-opts from conf dir, not spark home.
      7cff919 [Marcelo Vanzin] Javadoc updates.
      eae4d8e [Marcelo Vanzin] Fix new unit tests on Windows.
      e570fb5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      44cd5f7 [Marcelo Vanzin] Add package-info.java, clean up javadocs.
      f7cacff [Marcelo Vanzin] Remove "launch Spark in new thread" feature.
      7ed8859 [Marcelo Vanzin] Some more feedback.
      54cd4fd [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      61919df [Marcelo Vanzin] Clean leftover debug statement.
      aae5897 [Marcelo Vanzin] Use launcher classes instead of jars in non-release mode.
      e584fc3 [Marcelo Vanzin] Rework command building a little bit.
      525ef5b [Marcelo Vanzin] Rework Unix spark-class to handle argument with newlines.
      8ac4e92 [Marcelo Vanzin] Minor test cleanup.
      e946a99 [Marcelo Vanzin] Merge PySparkLauncher into SparkSubmitCliLauncher.
      c617539 [Marcelo Vanzin] Review feedback round 1.
      fc6a3e2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      f26556b [Marcelo Vanzin] Fix a thread-safety issue in "local" mode.
      2f4e8b4 [Marcelo Vanzin] Changes needed to make this work with SPARK-4048.
      799fc20 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      bb5d324 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      53faef1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      a7936ef [Marcelo Vanzin] Fix pyspark tests.
      656374e [Marcelo Vanzin] Mima fixes.
      4d511e7 [Marcelo Vanzin] Fix tools search code.
      7a01e4a [Marcelo Vanzin] Fix pyspark on Yarn.
      1b3f6e9 [Marcelo Vanzin] Call SparkSubmit from spark-class launcher for unknown classes.
      25c5ae6 [Marcelo Vanzin] Centralize SparkSubmit command line parsing.
      27be98a [Marcelo Vanzin] Modify Spark to use launcher lib.
      6f70eea [Marcelo Vanzin] [SPARK-4924] Add a library for launching Spark jobs programatically.
  5. Dec 10, 2014
    • [SPARK-4161]Spark shell class path is not correctly set if... · 742e7093
      GuoQiang Li authored
      [SPARK-4161]Spark shell class path is not correctly set if "spark.driver.extraClassPath" is set in defaults.conf
      
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #3050 from witgo/SPARK-4161 and squashes the following commits:
      
      abb6fa4 [GuoQiang Li] move usejavacp opt to spark-shell
      89e39e7 [GuoQiang Li] review commit
      c2a6f04 [GuoQiang Li] Spark shell class path is not correctly set if "spark.driver.extraClassPath" is set in defaults.conf
  6. Sep 08, 2014
    • SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within. · e16a8e7d
      Prashant Sharma authored
      ...
      
      Tested! TBH, it isn't a great idea to have a directory with spaces in its name, because Emacs doesn't like it, then Hadoop doesn't like it, and so on...
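
      The rule applied throughout is sketched below: quote every expansion so that a path containing spaces survives word splitting (variable names illustrative):

      ```bash
      # Every expansion is quoted so an install dir like "/opt/my spark" works.
      FWDIR="$(cd "$(dirname "$0")"/..; pwd)"     # quoted command substitutions
      CLASSPATH="$FWDIR/conf"
      for jar in "$FWDIR"/lib/*.jar; do           # the glob itself stays unquoted
        CLASSPATH="$CLASSPATH:$jar"
      done
      exec java -cp "$CLASSPATH" "$@"             # "$@" preserves each argument
      ```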
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2229 from ScrapCodes/SPARK-3337/quoting-shell-scripts and squashes the following commits:
      
      d4ad660 [Prashant Sharma] SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within.
  7. Aug 09, 2014
    • [SPARK-2894] spark-shell doesn't accept flags · 4f4a9884
      Kousuke Saruta authored
      As sryza reported, spark-shell doesn't accept any flags.
      The root cause is incorrect usage of spark-submit in spark-shell, and it came to the surface with #1801
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1715, Closes #1864, and Closes #1861
      
      Closes #1825 from sarutak/SPARK-2894 and squashes the following commits:
      
      47f3510 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2894
      2c899ed [Kousuke Saruta] Removed useless code from java_gateway.py
      98287ed [Kousuke Saruta] Removed useless code from java_gateway.py
      513ad2e [Kousuke Saruta] Modified util.sh to enable to use option including white spaces
      28a374e [Kousuke Saruta] Modified java_gateway.py to recognize arguments
      5afc584 [Cheng Lian] Filter out spark-submit options when starting Python gateway
      e630d19 [Cheng Lian] Fixing pyspark and spark-shell CLI options
  8. Jul 28, 2014
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server (with Maven profile fix) · a7a9d144
      Cheng Lian authored
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Another try for #1399 & #1600. Those two PRs broke Jenkins builds because we made a separate profile `hive-thriftserver` in sub-project `assembly`, but the `hive-thriftserver` module was defined outside the `hive-thriftserver` profile. Thus every pull request, even one that didn't touch SQL code, also executed the test suites defined in `hive-thriftserver`, and the tests failed because the related .class files were not included in the assembly jar.
      
      In the most recent commit, module `hive-thriftserver` is moved into its own profile to fix this problem. All previous commits are squashed for clarity.
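
      With the module gated behind its own profile, it only participates in builds that enable the profile explicitly, e.g. (flags illustrative):

      ```bash
      mvn -Phive -Phive-thriftserver -DskipTests package   # builds the module
      mvn -DskipTests package                              # skips it entirely
      ```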
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1620 from liancheng/jdbc-with-maven-fix and squashes the following commits:
      
      629988e [Cheng Lian] Moved hive-thriftserver module definition into its own profile
      ec3c7a7 [Cheng Lian] Cherry picked the Hive Thrift server
  9. Jul 27, 2014
    • Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · e5bbce9a
      Patrick Wendell authored
      This reverts commit f6ff2a61.
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · f6ff2a61
      Cheng Lian authored
      (This is a replacement of #1399, trying to fix potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.)
      
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1600 from liancheng/jdbc and squashes the following commits:
      
      ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
  10. Jul 25, 2014
    • Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · afd757a2
      Michael Armbrust authored
      This reverts commit 06dc0d2c.
      
      #1399 is making Jenkins fail.  We should investigate and put this back after it's passing tests.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1594 from marmbrus/revertJDBC and squashes the following commits:
      
      59748da [Michael Armbrust] Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · 06dc0d2c
      Cheng Lian authored
      JIRA issue:
      
      - Main: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      - Related: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
      
      Cherry picked the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      (Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.)
      
      TODO
      
      - [x] Use `spark-submit` to launch the server, the CLI and beeline
      - [x] Migration guideline draft for Shark users
      
      ----
      
      Hit by a bug in `SparkSubmitArguments` while working on this PR: all application options that are recognized by `SparkSubmitArguments` are stolen as `SparkSubmit` options. For example:
      
      ```bash
      $ spark-submit --class org.apache.hive.beeline.BeeLine spark-internal --help
      ```
      
      This actually shows usage information of `SparkSubmit` rather than `BeeLine`.
      
      ~~Fixed this bug here since the `spark-internal` related stuff also touches `SparkSubmitArguments` and I'd like to avoid conflict.~~
      
      **UPDATE** The bug mentioned above is now tracked by [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678). Decided to revert the changes for this bug since it involves more subtle considerations and is worth a separate PR.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1399 from liancheng/thriftserver and squashes the following commits:
      
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
  11. May 18, 2014
    • Fix spark-submit path in spark-shell & pyspark · ebcd2d68
      Neville Li authored
      Author: Neville Li <neville@spotify.com>
      
      Closes #812 from nevillelyh/neville/v1.0 and squashes the following commits:
      
      0dc33ed [Neville Li] Fix spark-submit path in pyspark
      becec64 [Neville Li] Fix spark-submit path in spark-shell
  12. May 17, 2014
    • [SPARK-1808] Route bin/pyspark through Spark submit · 4b8ec6fc
      Andrew Or authored
      **Problem.** For `bin/pyspark`, there is currently no way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, that mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`.
      
      **Solution.** Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent.
      
      **Details.** `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply pass the file directly to Spark submit and let it handle the rest.
      
      For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change is to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case.
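
      A simplified sketch of the dispatch (the real script also handles IPython, Windows, and quoting details):

      ```bash
      # Case (1): a python application file goes straight to spark-submit.
      # Case (2): start the python shell; java_gateway.py later launches the
      # JVM via spark-submit with the args stashed in PYSPARK_SUBMIT_ARGS.
      if [[ "$1" == *.py ]]; then
        exec "$SPARK_HOME"/bin/spark-submit "$@"
      else
        export PYSPARK_SUBMIT_ARGS="$*"
        exec "${PYSPARK_PYTHON:-python}"
      fi
      ```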
      
      This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #799 from andrewor14/pyspark-submit and squashes the following commits:
      
      bf37e36 [Andrew Or] Minor changes
      01066fa [Andrew Or] bin/pyspark for Windows
      c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes)
      1866f85 [Andrew Or] Windows is not cooperating
      456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set
      7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
      b7ba0d8 [Andrew Or] Address a few comments (minor)
      06eb138 [Andrew Or] Use shlex instead of writing our own parser
      05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
      a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly
      6fba412 [Andrew Or] Deal with quotes + address various comments
      fe4c8a7 [Andrew Or] Update --help for bin/pyspark
      afe47bf [Andrew Or] Fix spark shell
      f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
      a371d26 [Andrew Or] Route bin/pyspark through Spark submit
  13. Apr 28, 2014
    • SPARK-1654 and SPARK-1653: Fixes in spark-submit. · 949e3931
      Patrick Wendell authored
      Deals with two issues:
      1. Spark shell didn't correctly pass quoted arguments to spark-submit.
      ```./bin/spark-shell --driver-java-options "-Dfoo=f -Dbar=b"```
      2. Spark submit used deprecated environment variables (SPARK_CLASSPATH)
         which triggered warnings. Now we use new, more narrowly scoped,
         variables.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #576 from pwendell/spark-submit and squashes the following commits:
      
      67004c9 [Patrick Wendell] SPARK-1654 and SPARK-1653: Fixes in spark-submit.
  14. Apr 25, 2014
    • SPARK-1619 Launch spark-shell with spark-submit · dc3b640a
      Patrick Wendell authored
      This simplifies the shell a bunch and passes all arguments through to spark-submit.
      
      There is a tiny incompatibility with 0.9.1: you can no longer pass `-c`, only `--cores`. However, spark-submit will give a good error message in this case; I don't think many people used `-c`, and it's a trivial change for users.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #542 from pwendell/spark-shell and squashes the following commits:
      
      9eb3e6f [Patrick Wendell] Updating Spark docs
      b552459 [Patrick Wendell] Andrew's feedback
      97720fa [Patrick Wendell] Review feedback
      aa2900b [Patrick Wendell] SPARK-1619 Launch spark-shell with spark-submit
  15. Apr 07, 2014
    • SPARK-1099: Introduce local[*] mode to infer number of cores · 0307db0f
      Aaron Davidson authored
      This is the default mode for running spark-shell and pyspark, intended to allow users running spark for the first time to see the performance benefits of using multiple cores, while not breaking backwards compatibility for users who use "local" mode and expect exactly 1 core.
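
      Usage is unchanged apart from the new master string (shown via the MASTER variable the shell scripts read):

      ```bash
      MASTER='local[*]' ./bin/spark-shell   # one worker thread per available core
      MASTER='local'    ./bin/spark-shell   # old behaviour: exactly one core
      MASTER='local[4]' ./bin/spark-shell   # an explicit thread count still works
      ```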
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #182 from aarondav/110 and squashes the following commits:
      
      a88294c [Aaron Davidson] Rebased changes for new spark-shell
      a9f393e [Aaron Davidson] SPARK-1099: Introduce local[*] mode to infer number of cores
  16. Apr 04, 2014
    • SPARK-1404: Always upgrade spark-env.sh vars to environment vars · 01cf4c40
      Aaron Davidson authored
      This was broken when spark-env.sh was made idempotent: the idempotence check is an environment variable, but the spark-env.sh variables may not have been exported as environment variables themselves.
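
      A sketch of the fix (the actual script may differ in details):

      ```bash
      # `set -a` turns on allexport, so plain NAME=value assignments inside
      # spark-env.sh become exported environment variables in every shell.
      set -a
      . "${SPARK_CONF_DIR:-"$SPARK_HOME/conf"}/spark-env.sh"
      set +a
      ```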
      
      Tested in zsh, bash, and sh.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #310 from aarondav/SPARK-1404 and squashes the following commits:
      
      c3406a5 [Aaron Davidson] Add extra export in spark-shell
      6a0e340 [Aaron Davidson] SPARK-1404: Always upgrade spark-env.sh vars to environment vars
  17. Mar 29, 2014
    • [SPARK-1186] : Enrich the Spark Shell to support additional arguments. · fda86d8b
      Bernardo Gomez Palacio authored
      Enrich the Spark Shell functionality to support the following options.
      
      ```
      Usage: spark-shell [OPTIONS]
      
      OPTIONS:
          -h  --help              : Print this help information.
          -c  --cores             : The maximum number of cores to be used by the Spark Shell.
          -em --executor-memory   : The memory used by each executor of the Spark Shell, the number
                                    is followed by m for megabytes or g for gigabytes, e.g. "1g".
          -dm --driver-memory     : The memory used by the Spark Shell, the number is followed
                                    by m for megabytes or g for gigabytes, e.g. "1g".
          -m  --master            : A full string that describes the Spark Master, defaults to "local"
                                    e.g. "spark://localhost:7077".
          --log-conf              : Enables logging of the supplied SparkConf as INFO at start of the
                                    Spark Context.
      
      e.g.
          spark-shell -m spark://localhost:7077 -c 4 -dm 512m -em 2g
      ```
      
      **Note**: this commit reflects the changes applied to _master_ based on [5d98cfc1].
      
      [ticket: SPARK-1186] : Enrich the Spark Shell to support additional arguments.
                              https://spark-project.atlassian.net/browse/SPARK-1186
      
      Author: Bernardo Gomez Palacio <bernardo.gomezpalacio@gmail.com>
      
      Closes #116 from berngp/feature/enrich-spark-shell and squashes the following commits:
      
      c5f455f [Bernardo Gomez Palacio] [SPARK-1186] : Enrich the Spark Shell to support additional arguments.
  18. Mar 25, 2014
    • SPARK-1286: Make usage of spark-env.sh idempotent · 007a7334
      Aaron Davidson authored
      Various spark scripts load spark-env.sh. This can cause growth of any variables that may be appended to (SPARK_CLASSPATH, SPARK_REPL_OPTS) and it makes the precedence order for options specified in spark-env.sh less clear.
      
      One use-case for the latter is that we want to set options from the command-line of spark-shell, but these options will be overridden by subsequent loading of spark-env.sh. If we were to load the spark-env.sh first and then set our command-line options, we could guarantee correct precedence order.
      
      Note that we use SPARK_CONF_DIR if available to support the sbin/ scripts, which always set this variable from sbin/spark-config.sh. Otherwise, we default to the ../conf/ as usual.
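
      The guard itself is sketched below (variable names follow the current scripts; treat the details as illustrative):

      ```bash
      # Source spark-env.sh at most once, preferring SPARK_CONF_DIR when the
      # sbin/ scripts have already set it.
      if [ -z "$SPARK_ENV_LOADED" ]; then
        export SPARK_ENV_LOADED=1
        CONF_DIR="${SPARK_CONF_DIR:-"$(dirname "$0")/../conf"}"
        if [ -f "$CONF_DIR/spark-env.sh" ]; then
          . "$CONF_DIR/spark-env.sh"
        fi
      fi
      ```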
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #184 from aarondav/idem and squashes the following commits:
      
      e291f91 [Aaron Davidson] Use "private" variables in load-spark-env.sh
      8da8360 [Aaron Davidson] Add .sh extension to load-spark-env.sh
      93a2471 [Aaron Davidson] SPARK-1286: Make usage of spark-env.sh idempotent
  19. Mar 09, 2014
    • SPARK-929: Fully deprecate usage of SPARK_MEM · 52834d76
      Aaron Davidson authored
      (Continued from old repo, prior discussion at https://github.com/apache/incubator-spark/pull/615)
      
      This patch cements our deprecation of the SPARK_MEM environment variable by replacing it with three more specialized variables:
      SPARK_DAEMON_MEMORY, SPARK_EXECUTOR_MEMORY, and SPARK_DRIVER_MEMORY
      
      The creation of the latter two variables means that we can safely set driver/job memory without accidentally setting the executor memory. Neither is public.
      
      SPARK_EXECUTOR_MEMORY is only used by the Mesos scheduler (and set within SparkContext). The proper way of configuring executor memory is through the "spark.executor.memory" property.
      
      SPARK_DRIVER_MEMORY is the new way of specifying the amount of memory run by jobs launched by spark-class, without possibly affecting executor memory.
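
      Summarizing the new variables as a sketch (values illustrative):

      ```bash
      # The specialized replacements for SPARK_MEM:
      SPARK_DAEMON_MEMORY=1g    # memory for standalone master/worker daemons
      SPARK_DRIVER_MEMORY=2g    # memory for driver JVMs launched via spark-class
      # SPARK_EXECUTOR_MEMORY is internal (Mesos only); the proper way to set
      # executor memory is the "spark.executor.memory" property.
      ```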
      
      Other memory considerations:
      - The repl's memory can be set through the "--drivermem" command-line option, which really just sets SPARK_DRIVER_MEMORY.
      - run-example doesn't use spark-class, so the only way to modify examples' memory is actually an unusual use of SPARK_JAVA_OPTS (which is normally overridden in all cases by spark-class).
      
      This patch also fixes a lurking bug where spark-shell misused spark-class (the first argument is supposed to be the main class name, not java options), as well as a bug in the Windows spark-class2.cmd. I have not yet tested this patch on either Windows or Mesos, however.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #99 from aarondav/sparkmem and squashes the following commits:
      
      9df4c68 [Aaron Davidson] SPARK-929: Fully deprecate usage of SPARK_MEM
  20. Feb 17, 2014
    • [SPARK-1090] improvement on spark_shell (help information, configure memory) · e0d49ad2
      CodingCat authored
      https://spark-project.atlassian.net/browse/SPARK-1090
      
      spark-shell should print help information about its parameters and should allow the user to configure executor memory; there is no documentation about how to set `--cores`/`-c` in spark-shell.

      Users should also be able to set executor memory through command-line options.

      In this PR I also check the format of the options passed by the user
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #599 from CodingCat/spark_shell_improve and squashes the following commits:
      
      de5aa38 [CodingCat] add parameter to set driver memory
      915cbf8 [CodingCat] improvement on spark_shell (help information, configure memory)
  21. Jan 15, 2014
    • fix some format problem. · 84005364
      CrazyJvm authored
    • fix "set MASTER automatically fails" bug. · 7a0c5b5a
      CrazyJvm authored
      spark-shell intends to set MASTER automatically if we do not provide the option when we start the shell, but there's a problem.
      The condition is `if [[ "x" != "x$SPARK_MASTER_IP" && "y" != "y$SPARK_MASTER_PORT" ]]`. We will surely set SPARK_MASTER_IP explicitly, but we probably will not set SPARK_MASTER_PORT, relying instead on Spark's default port 7077. So if we do not set SPARK_MASTER_PORT, the condition will never be true. I think we should just use the default port when users do not set one explicitly, as sketched below.
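
      A sketch of that fix, defaulting the port instead of requiring both variables:

      ```bash
      # Fall back to Spark's standard port 7077 when SPARK_MASTER_PORT is unset.
      if [ -z "$MASTER" ] && [ -n "$SPARK_MASTER_IP" ]; then
        export MASTER="spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT:-7077}"
      fi
      ```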
  22. Aug 29, 2013
    • Change build and run instructions to use assemblies · 53cd50c0
      Matei Zaharia authored
      This commit makes Spark invocation saner by using an assembly JAR to
      find all of Spark's dependencies instead of adding all the JARs in
      lib_managed. It also packages the examples into an assembly and uses
      that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
      with two better-named scripts: "run-examples" for examples, and
      "spark-class" for Spark internal classes (e.g. REPL, master, etc). This
      is also designed to minimize the confusion people have in trying to use
      "run" to run their own classes; it's not meant to do that, but now at
      least if they look at it, they can modify run-examples to do a decent
      job for them.
      
      As part of this, Bagel's examples are also now properly moved to the
      examples package instead of bagel.