Skip to content
Snippets Groups Projects
  1. Nov 16, 2016
    • Holden Karau's avatar
      [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed · a36a76ac
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI but that is the natural follow up (SPARK-18129).
      
      Done:
      - pip installable on conda [manual tested]
      - setup.py installed on a non-pip managed system (RHEL) with YARN [manual tested]
      - Automated testing of this (virtualenv)
      - packaging and signing with release-build*
      
      Possible follow up work:
      - release-build update to publish to PyPI (SPARK-18128)
      - figure out who owns the pyspark package name on prod PyPI (is it someone with in the project or should we ask PyPI or should we choose a different name to publish with like ApachePySpark?)
      - Windows support and or testing ( SPARK-18136 )
      - investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test
      - consider how we want to number our dev/snapshot versions
      
      Explicitly out of scope:
      - Using pip installed PySpark to start a standalone cluster
      - Using pip installed PySpark for non-Python Spark programs
      
      *I've done some work to test release-build locally but as a non-committer I've just done local testing.
      ## How was this patch tested?
      
      Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.
      
      release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites)
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: Juliet Hougland <juliet@cloudera.com>
      Author: Juliet Hougland <not@myemail.com>
      
      Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
      a36a76ac
  2. Mar 15, 2016
    • Marcelo Vanzin's avatar
      [SPARK-13576][BUILD] Don't create assembly for examples. · 48978abf
      Marcelo Vanzin authored
      As part of the goal to stop creating assemblies in Spark, this change
      modifies the mvn and sbt builds to not create an assembly for examples.
      
      Instead, dependencies are copied to the build directory (under
      target/scala-xx/jars), and in the final archive, into the "examples/jars"
      directory.
      
      To avoid having to deal too much with Windows batch files, I made examples
      run through the launcher library; the spark-submit launcher now has a
      special mode to run examples, which adds all the necessary jars to the
      spark-submit command line, and replaces the bash and batch scripts that
      were used to run examples. The scripts are now just a thin wrapper around
      spark-submit; another advantage is that now all spark-submit options are
      supported.
      
      There are a few glitches; in the mvn build, a lot of duplicated dependencies
      get copied, because they are promoted to "compile" scope due to extra
      dependencies in the examples module (such as HBase). In the sbt build,
      all dependencies are copied, because there doesn't seem to be an easy
      way to filter things.
      
      I plan to clean some of this up when the rest of the tasks are finished.
      When the main assembly is replaced with jars, we can remove duplicate jars
      from the examples directory during packaging.
      
      Tested by running SparkPi in: maven build, sbt build, dist created by
      make-distribution.sh.
      
      Finally: note that running the "assembly" target in sbt doesn't build
      the examples anymore. You need to run "package" for that.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11452 from vanzin/SPARK-13576.
      48978abf
  3. Nov 04, 2015
    • jerryshao's avatar
      [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen) · 8aff36e9
      jerryshao authored
      This PR is based on the work of roji to support running Spark scripts from symlinks. Thanks for the great work roji . Would you mind taking a look at this PR, thanks a lot.
      
      For releases like HDP and others, normally it will expose the Spark executables as symlinks and put in `PATH`, but current Spark's scripts do not support finding real path from symlink recursively, this will make spark fail to execute from symlink. This PR try to solve this issue by finding the absolute path from symlink.
      
      Instead of using `readlink -f` like what this PR (https://github.com/apache/spark/pull/2386) implemented is that `-f` is not support for Mac, so here manually seeking the path through loop.
      
      I've tested with Mac and Linux (Cent OS), looks fine.
      
      This PR did not fix the scripts under `sbin` folder, not sure if it needs to be fixed also?
      
      Please help to review, any comment is greatly appreciated.
      
      Author: jerryshao <sshao@hortonworks.com>
      Author: Shay Rojansky <roji@roji.org>
      
      Closes #8669 from jerryshao/SPARK-2960.
      8aff36e9
  4. Mar 11, 2015
    • Marcelo Vanzin's avatar
      [SPARK-4924] Add a library for launching Spark jobs programmatically. · 517975d8
      Marcelo Vanzin authored
      This change encapsulates all the logic involved in launching a Spark job
      into a small Java library that can be easily embedded into other applications.
      
      The overall goal of this change is twofold, as described in the bug:
      
      - Provide a public API for launching Spark processes. This is a common request
        from users and currently there's no good answer for it.
      
      - Remove a lot of the duplicated code and other coupling that exists in the
        different parts of Spark that deal with launching processes.
      
      A lot of the duplication was due to different code needed to build an
      application's classpath (and the bootstrapper needed to run the driver in
      certain situations), and also different code needed to parse spark-submit
      command line options in different contexts. The change centralizes those
      as much as possible so that all code paths can rely on the library for
      handling those appropriately.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3916 from vanzin/SPARK-4924 and squashes the following commits:
      
      18c7e4d [Marcelo Vanzin] Fix make-distribution.sh.
      2ce741f [Marcelo Vanzin] Add lots of quotes.
      3b28a75 [Marcelo Vanzin] Update new pom.
      a1b8af1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      897141f [Marcelo Vanzin] Review feedback.
      e2367d2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      28cd35e [Marcelo Vanzin] Remove stale comment.
      b1d86b0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      00505f9 [Marcelo Vanzin] Add blurb about new API in the programming guide.
      5f4ddcc [Marcelo Vanzin] Better usage messages.
      92a9cfb [Marcelo Vanzin] Fix Win32 launcher, usage.
      6184c07 [Marcelo Vanzin] Rename field.
      4c19196 [Marcelo Vanzin] Update comment.
      7e66c18 [Marcelo Vanzin] Fix pyspark tests.
      0031a8e [Marcelo Vanzin] Review feedback.
      c12d84b [Marcelo Vanzin] Review feedback. And fix spark-submit on Windows.
      e2d4d71 [Marcelo Vanzin] Simplify some code used to launch pyspark.
      43008a7 [Marcelo Vanzin] Don't make builder extend SparkLauncher.
      b4d6912 [Marcelo Vanzin] Use spark-submit script in SparkLauncher.
      28b1434 [Marcelo Vanzin] Add a comment.
      304333a [Marcelo Vanzin] Fix propagation of properties file arg.
      bb67b93 [Marcelo Vanzin] Remove unrelated Yarn change (that is also wrong).
      8ec0243 [Marcelo Vanzin] Add missing newline.
      95ddfa8 [Marcelo Vanzin] Fix handling of --help for spark-class command builder.
      72da7ec [Marcelo Vanzin] Rename SparkClassLauncher.
      62978e4 [Marcelo Vanzin] Minor cleanup of Windows code path.
      9cd5b44 [Marcelo Vanzin] Make all non-public APIs package-private.
      e4c80b6 [Marcelo Vanzin] Reorganize the code so that only SparkLauncher is public.
      e50dc5e [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      de81da2 [Marcelo Vanzin] Fix CommandUtils.
      86a87bf [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      2061967 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      46d46da [Marcelo Vanzin] Clean up a test and make it more future-proof.
      b93692a [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      ad03c48 [Marcelo Vanzin] Revert "Fix a thread-safety issue in "local" mode."
      0b509d0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      23aa2a9 [Marcelo Vanzin] Read java-opts from conf dir, not spark home.
      7cff919 [Marcelo Vanzin] Javadoc updates.
      eae4d8e [Marcelo Vanzin] Fix new unit tests on Windows.
      e570fb5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      44cd5f7 [Marcelo Vanzin] Add package-info.java, clean up javadocs.
      f7cacff [Marcelo Vanzin] Remove "launch Spark in new thread" feature.
      7ed8859 [Marcelo Vanzin] Some more feedback.
      54cd4fd [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      61919df [Marcelo Vanzin] Clean leftover debug statement.
      aae5897 [Marcelo Vanzin] Use launcher classes instead of jars in non-release mode.
      e584fc3 [Marcelo Vanzin] Rework command building a little bit.
      525ef5b [Marcelo Vanzin] Rework Unix spark-class to handle argument with newlines.
      8ac4e92 [Marcelo Vanzin] Minor test cleanup.
      e946a99 [Marcelo Vanzin] Merge PySparkLauncher into SparkSubmitCliLauncher.
      c617539 [Marcelo Vanzin] Review feedback round 1.
      fc6a3e2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      f26556b [Marcelo Vanzin] Fix a thread-safety issue in "local" mode.
      2f4e8b4 [Marcelo Vanzin] Changes needed to make this work with SPARK-4048.
      799fc20 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      bb5d324 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      53faef1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      a7936ef [Marcelo Vanzin] Fix pyspark tests.
      656374e [Marcelo Vanzin] Mima fixes.
      4d511e7 [Marcelo Vanzin] Fix tools search code.
      7a01e4a [Marcelo Vanzin] Fix pyspark on Yarn.
      1b3f6e9 [Marcelo Vanzin] Call SparkSubmit from spark-class launcher for unknown classes.
      25c5ae6 [Marcelo Vanzin] Centralize SparkSubmit command line parsing.
      27be98a [Marcelo Vanzin] Modify Spark to use launcher lib.
      6f70eea [Marcelo Vanzin] [SPARK-4924] Add a library for launching Spark jobs programatically.
      517975d8
  5. Feb 12, 2015
  6. Jan 19, 2015
    • Venkata Ramana Gollamudi's avatar
      [SPARK-4504][Examples] fix run-example failure if multiple assembly jars exist · 74de94ea
      Venkata Ramana Gollamudi authored
      Fix run-example script to fail fast with useful error message if multiple
      example assembly JARs are present.
      
      Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com>
      
      Closes #3377 from gvramana/run-example_fails and squashes the following commits:
      
      fa7f481 [Venkata Ramana Gollamudi] Fixed review comments, avoiding ls output scanning.
      6aa1ab7 [Venkata Ramana Gollamudi] Fix run-examples script error during multiple jars
      74de94ea
  7. Nov 11, 2014
    • Prashant Sharma's avatar
      Support cross building for Scala 2.11 · daaca14c
      Prashant Sharma authored
      Let's give this another go using a version of Hive that shades its JLine dependency.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #3159 from pwendell/scala-2.11-prashant and squashes the following commits:
      
      e93aa3e [Patrick Wendell] Restoring -Phive-thriftserver profile and cleaning up build script.
      f65d17d [Patrick Wendell] Fixing build issue due to merge conflict
      a8c41eb [Patrick Wendell] Reverting dev/run-tests back to master state.
      7a6eb18 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into scala-2.11-prashant
      583aa07 [Prashant Sharma] REVERT ME: removed hive thirftserver
      3680e58 [Prashant Sharma] Revert "REVERT ME: Temporarily removing some Cli tests."
      935fb47 [Prashant Sharma] Revert "Fixed by disabling a few tests temporarily."
      925e90f [Prashant Sharma] Fixed by disabling a few tests temporarily.
      2fffed3 [Prashant Sharma] Exclude groovy from sbt build, and also provide a way for such instances in future.
      8bd4e40 [Prashant Sharma] Switched to gmaven plus, it fixes random failures observer with its predecessor gmaven.
      5272ce5 [Prashant Sharma] SPARK_SCALA_VERSION related bugs.
      2121071 [Patrick Wendell] Migrating version detection to PySpark
      b1ed44d [Patrick Wendell] REVERT ME: Temporarily removing some Cli tests.
      1743a73 [Patrick Wendell] Removing decimal test that doesn't work with Scala 2.11
      f5cad4e [Patrick Wendell] Add Scala 2.11 docs
      210d7e1 [Patrick Wendell] Revert "Testing new Hive version with shaded jline"
      48518ce [Patrick Wendell] Remove association of Hive and Thriftserver profiles.
      e9d0a06 [Patrick Wendell] Revert "Enable thritfserver for Scala 2.10 only"
      67ec364 [Patrick Wendell] Guard building of thriftserver around Scala 2.10 check
      8502c23 [Patrick Wendell] Enable thritfserver for Scala 2.10 only
      e22b104 [Patrick Wendell] Small fix in pom file
      ec402ab [Patrick Wendell] Various fixes
      0be5a9d [Patrick Wendell] Testing new Hive version with shaded jline
      4eaec65 [Prashant Sharma] Changed scripts to ignore target.
      5167bea [Prashant Sharma] small correction
      a4fcac6 [Prashant Sharma] Run against scala 2.11 on jenkins.
      80285f4 [Prashant Sharma] MAven equivalent of setting spark.executor.extraClasspath during tests.
      034b369 [Prashant Sharma] Setting test jars on executor classpath during tests from sbt.
      d4874cb [Prashant Sharma] Fixed Python Runner suite. null check should be first case in scala 2.11.
      6f50f13 [Prashant Sharma] Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10
      e56ca9d [Prashant Sharma] Print an error if build for 2.10 and 2.11 is spotted.
      937c0b8 [Prashant Sharma] SCALA_VERSION -> SPARK_SCALA_VERSION
      cb059b0 [Prashant Sharma] Code review
      0476e5e [Prashant Sharma] Scala 2.11 support with repl and all build changes.
      daaca14c
  8. Sep 08, 2014
    • Prashant Sharma's avatar
      SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within. · e16a8e7d
      Prashant Sharma authored
      ...
      
      Tested ! TBH, it isn't a great idea to have directory with spaces within. Because emacs doesn't like it then hadoop doesn't like it. and so on...
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2229 from ScrapCodes/SPARK-3337/quoting-shell-scripts and squashes the following commits:
      
      d4ad660 [Prashant Sharma] SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within.
      e16a8e7d
  9. Aug 02, 2014
    • Chris Fregly's avatar
      [SPARK-1981] Add AWS Kinesis streaming support · 91f9504e
      Chris Fregly authored
      Author: Chris Fregly <chris@fregly.com>
      
      Closes #1434 from cfregly/master and squashes the following commits:
      
      4774581 [Chris Fregly] updated docs, renamed retry to retryRandom to be more clear, removed retries around store() method
      0393795 [Chris Fregly] moved Kinesis examples out of examples/ and back into extras/kinesis-asl
      691a6be [Chris Fregly] fixed tests and formatting, fixed a bug with JavaKinesisWordCount during union of streams
      0e1c67b [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      74e5c7c [Chris Fregly] updated per TD's feedback.  simplified examples, updated docs
      e33cbeb [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      bf614e9 [Chris Fregly] per matei's feedback:  moved the kinesis examples into the examples/ dir
      d17ca6d [Chris Fregly] per TD's feedback:  updated docs, simplified the KinesisUtils api
      912640c [Chris Fregly] changed the foundKinesis class to be a publically-avail class
      db3eefd [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      21de67f [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      6c39561 [Chris Fregly] parameterized the versions of the aws java sdk and kinesis client
      338997e [Chris Fregly] improve build docs for kinesis
      828f8ae [Chris Fregly] more cleanup
      e7c8978 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      cd68c0d [Chris Fregly] fixed typos and backward compatibility
      d18e680 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      b3b0ff1 [Chris Fregly] [SPARK-1981] Add AWS Kinesis streaming support
      91f9504e
  10. Jul 03, 2014
    • Prashant Sharma's avatar
      [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work. · 731f683b
      Prashant Sharma authored
      Trivial fix.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #1050 from ScrapCodes/SPARK-2109/pyspark-script-bug and squashes the following commits:
      
      77072b9 [Prashant Sharma] Changed echos to redirect to STDERR.
      13f48a0 [Prashant Sharma] [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work.
      731f683b
  11. Jun 08, 2014
    • maji2014's avatar
      Update run-example · e9261d08
      maji2014 authored
      Old code can only be ran under spark_home and use "bin/run-example".
       Error "./run-example: line 55: ./bin/spark-submit: No such file or directory" appears when running in other place. So change this
      
      Author: maji2014 <maji3@asiainfo-linkage.com>
      
      Closes #1011 from maji2014/master and squashes the following commits:
      
      2cc1af6 [maji2014] Update run-example
      
      Closes #988.
      e9261d08
  12. May 19, 2014
    • Matei Zaharia's avatar
      [SPARK-1876] Windows fixes to deal with latest distribution layout changes · 7b70a707
      Matei Zaharia authored
      - Look for JARs in the right place
      - Launch examples the same way as on Unix
      - Load datanucleus JARs if they exist
      - Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs
      - Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was)
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #819 from mateiz/win-fixes and squashes the following commits:
      
      d558f96 [Matei Zaharia] Fix comment
      228577b [Matei Zaharia] Review comments
      d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly
      144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout
      7b70a707
  13. May 09, 2014
    • Patrick Wendell's avatar
      SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`. · 06b15baa
      Patrick Wendell authored
      Gives a nicely formatted message to the user when `run-example` is run to
      tell them to use `spark-submit`.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #704 from pwendell/examples and squashes the following commits:
      
      1996ee8 [Patrick Wendell] Feedback form Andrew
      3eb7803 [Patrick Wendell] Suggestions from TD
      2474668 [Patrick Wendell] SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`.
      06b15baa
  14. Apr 23, 2014
    • Patrick Wendell's avatar
      SPARK-1119 and other build improvements · cd4ed293
      Patrick Wendell authored
      1. Makes assembly and examples jar naming consistent in maven/sbt.
      2. Updates make-distribution.sh to use Maven and fixes some bugs.
      3. Updates the create-release script to call make-distribution script.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #502 from pwendell/make-distribution and squashes the following commits:
      
      1a97f0d [Patrick Wendell] SPARK-1119 and other build improvements
      cd4ed293
  15. Apr 21, 2014
    • Patrick Wendell's avatar
      Clean up and simplify Spark configuration · fb98488f
      Patrick Wendell authored
      Over time as we've added more deployment modes, this have gotten a bit unwieldy with user-facing configuration options in Spark. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch but it makes the following improvements:
      
      1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file.
      2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
      3. Adds ability to set these same variables for the driver using `spark-submit`.
      4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`.
      5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #299 from pwendell/config-cleanup and squashes the following commits:
      
      127f301 [Patrick Wendell] Improvements to testing
      a006464 [Patrick Wendell] Moving properties file template.
      b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf
      0086939 [Patrick Wendell] Minor style fixes
      af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs
      b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide
      af0adf7 [Patrick Wendell] Automatically add user jar
      a56b125 [Patrick Wendell] Responses to Tom's review
      d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
      a762901 [Patrick Wendell] Fixing test failures
      ffa00fe [Patrick Wendell] Review feedback
      fda0301 [Patrick Wendell] Note
      308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN
      e83cd8f [Patrick Wendell] Changes to allow re-use of test applications
      be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set
      c2a2909 [Patrick Wendell] Test compile fixes
      4ee6f9d [Patrick Wendell] Making YARN doc changes consistent
      afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors.
      b08893b [Patrick Wendell] Additional improvements.
      ace4ead [Patrick Wendell] Responses to review feedback.
      b72d183 [Patrick Wendell] Review feedback for spark env file
      46555c1 [Patrick Wendell] Review feedback and import clean-ups
      437aed1 [Patrick Wendell] Small fix
      761ebcd [Patrick Wendell] Library path and classpath for drivers
      7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script
      5b0ba8e [Patrick Wendell] Don't ship executor envs
      84cc5e5 [Patrick Wendell] Small clean-up
      1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings
      4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH
      6eaf7d0 [Patrick Wendell] executorJavaOpts
      0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN
      ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
      fb98488f
  16. Mar 25, 2014
    • Aaron Davidson's avatar
      SPARK-1286: Make usage of spark-env.sh idempotent · 007a7334
      Aaron Davidson authored
      Various spark scripts load spark-env.sh. This can cause growth of any variables that may be appended to (SPARK_CLASSPATH, SPARK_REPL_OPTS) and it makes the precedence order for options specified in spark-env.sh less clear.
      
      One use-case for the latter is that we want to set options from the command-line of spark-shell, but these options will be overridden by subsequent loading of spark-env.sh. If we were to load the spark-env.sh first and then set our command-line options, we could guarantee correct precedence order.
      
      Note that we use SPARK_CONF_DIR if available to support the sbin/ scripts, which always set this variable from sbin/spark-config.sh. Otherwise, we default to the ../conf/ as usual.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #184 from aarondav/idem and squashes the following commits:
      
      e291f91 [Aaron Davidson] Use "private" variables in load-spark-env.sh
      8da8360 [Aaron Davidson] Add .sh extension to load-spark-env.sh
      93a2471 [Aaron Davidson] SPARK-1286: Make usage of spark-env.sh idempotent
      007a7334
  17. Jan 21, 2014
  18. Jan 20, 2014
  19. Jan 07, 2014
  20. Jan 06, 2014
  21. Jan 03, 2014
  22. Jan 02, 2014
  23. Sep 29, 2013
  24. Sep 23, 2013
  25. Sep 22, 2013
  26. Sep 15, 2013
  27. Sep 11, 2013
  28. Aug 30, 2013
  29. Aug 29, 2013
    • Matei Zaharia's avatar
      Update Maven build to create assemblies expected by new scripts · 666d93c2
      Matei Zaharia authored
      This includes the following changes:
      - The "assembly" package now builds in Maven by default, and creates an
        assembly containing both hadoop-client and Spark, unlike the old
        BigTop distribution assembly that skipped hadoop-client
      - There is now a bigtop-dist package to build the old BigTop assembly
      - The repl-bin package is no longer built by default since the scripts
        don't reply on it; instead it can be enabled with -Prepl-bin
      - Py4J is now included in the assembly/lib folder as a local Maven repo,
        so that the Maven package can link to it
      - run-example now adds the original Spark classpath as well because the
        Maven examples assembly lists spark-core and such as provided
      - The various Maven projects add a spark-yarn dependency correctly
      666d93c2
    • Matei Zaharia's avatar
      Change build and run instructions to use assemblies · 53cd50c0
      Matei Zaharia authored
      This commit makes Spark invocation saner by using an assembly JAR to
      find all of Spark's dependencies instead of adding all the JARs in
      lib_managed. It also packages the examples into an assembly and uses
      that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
      with two better-named scripts: "run-examples" for examples, and
      "spark-class" for Spark internal classes (e.g. REPL, master, etc). This
      is also designed to minimize the confusion people have in trying to use
      "run" to run their own classes; it's not meant to do that, but now at
      least if they look at it, they can modify run-examples to do a decent
      job for them.
      
      As part of this, Bagel's examples are also now properly moved to the
      examples package instead of bagel.
      53cd50c0
Loading