  1. May 05, 2017
      [SPARK-20546][DEPLOY] spark-class gets syntax error in posix mode · 5773ab12
      jyu00 authored
      ## What changes were proposed in this pull request?
      
      Updated spark-class to turn off posix mode so the process substitution doesn't cause a syntax error.
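      A minimal bash illustration of the failure mode (a sketch, not the actual diff; in the bash versions concerned, posix mode disables process substitution):
      
      ```bash
      set -o posix    # e.g. inherited when bash is invoked as sh
      # while IFS= read -d '' -r ARG; do :; done < <(printf 'a\0')   # -> syntax error near unexpected token `<'
      set +o posix    # the fix: ensure posix mode is off before using < <(...)
      while IFS= read -d '' -r ARG; do printf '%s\n' "$ARG"; done < <(printf 'a\0b\0')
      ```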
      
      ## How was this patch tested?
      
      Existing unit tests; manual spark-shell testing with posix mode on.
      
      Author: jyu00 <jessieyu@us.ibm.com>
      
      Closes #17852 from jyu00/master.
  2. Nov 16, 2016
      [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed · a36a76ac
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI but that is the natural follow up (SPARK-18129).
      
      Done:
      - pip installable on conda [manual tested]
      - setup.py installed on a non-pip managed system (RHEL) with YARN [manual tested]
      - Automated testing of this (virtualenv)
      - packaging and signing with release-build*
      
      Possible follow up work:
      - release-build update to publish to PyPI (SPARK-18128)
      - figure out who owns the pyspark package name on prod PyPI (is it someone within the project, should we ask PyPI, or should we choose a different name to publish with, like ApachePySpark?)
      - Windows support and/or testing (SPARK-18136)
      - investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test
      - consider how we want to number our dev/snapshot versions
      
      Explicitly out of scope:
      - Using pip installed PySpark to start a standalone cluster
      - Using pip installed PySpark for non-Python Spark programs
      
      *I've done some work to test release-build locally but as a non-committer I've just done local testing.
      ## How was this patch tested?
      
      Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.
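      A hypothetical virtualenv smoke test along those lines (commands illustrative only, not the project's release procedure):
      
      ```bash
      virtualenv venv && source venv/bin/activate
      cd python && python setup.py sdist          # build a source distribution of the package
      pip install dist/pyspark-*.tar.gz           # install it, jars included
      pyspark --version                           # the entry point should now be on the PATH
      ```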
      
      release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites)
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: Juliet Hougland <juliet@cloudera.com>
      Author: Juliet Hougland <not@myemail.com>
      
      Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
  3. Aug 08, 2016
      [SPARK-16586][CORE] Handle JVM errors printed to stdout. · 1739e75f
      Marcelo Vanzin authored
      Some very rare JVM errors are printed to stdout, and that confuses
      the code in spark-class. So add a check so that those cases are
      detected and the proper error message is shown to the user.
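      A sketch of the kind of guard this adds (simplified; `CMD` is the array of NUL-delimited tokens spark-class reads back from the launcher, whose last element should be the numeric exit code — see the SPARK-13670 entry below):
      
      ```bash
      LAST=$((${#CMD[@]} - 1))
      if ! [[ ${CMD[$LAST]} =~ ^[0-9]+$ ]]; then
        # stdout carried a JVM error rather than a command: show it and bail out
        echo "${CMD[@]}" | head -n -1 1>&2
        exit 1
      fi
      ```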
      
      Tested by running spark-submit after setting "ulimit -v 32000".
      
      Closes #14231
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #14508 from vanzin/SPARK-16586.
  4. May 27, 2016
  5. May 10, 2016
      [SPARK-13670][LAUNCHER] Propagate error from launcher to shell. · 36c5892b
      Marcelo Vanzin authored
      bash doesn't really propagate errors from subshells when using redirection
      the way spark-class does; so, instead, this change captures the exit code
      of the launcher process in the command array, and checks it before executing
      the actual command.
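      A condensed sketch of the approach (`RUNNER` and `LAUNCH_CLASSPATH` are assumed set earlier in the script): the launcher's exit code is appended to its NUL-delimited output, read into the array with everything else, and checked before the exec.
      
      ```bash
      build_command() {
        "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
        printf "%d\0" $?   # append the launcher's own exit code as the final token
      }
      
      CMD=()
      while IFS= read -d '' -r ARG; do
        CMD+=("$ARG")
      done < <(build_command "$@")
      
      LAST=$((${#CMD[@]} - 1))
      LAUNCHER_EXIT_CODE=${CMD[$LAST]}
      if [ "$LAUNCHER_EXIT_CODE" != 0 ]; then
        exit "$LAUNCHER_EXIT_CODE"       # propagate the launcher's failure to the shell
      fi
      exec "${CMD[@]:0:$LAST}"           # drop the exit-code token, run the real command
      ```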
      
      Tested by injecting an error in Main.java (the launcher entry point) and
      verifying the shell gets the right exit code from spark-class.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #12910 from vanzin/SPARK-13670.
  6. Apr 06, 2016
      [SPARK-14424][BUILD][DOCS] Update the build docs to switch from assembly to package and add a no… · 457e58be
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      Change our build docs & shell scripts so that developers are aware of the change from "assembly" to "package".
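      A sketch of the kind of guard the scripts gained (paths and message wording approximate):
      
      ```bash
      if [ ! -d "$SPARK_HOME/assembly/target" ]; then
        echo "Failed to find Spark assembly in $SPARK_HOME/assembly/target." 1>&2
        echo "You need to build Spark before running this program (e.g. build/sbt package)." 1>&2
        exit 1
      fi
      ```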
      
      ## How was this patch tested?
      
      Manually ran ./bin/spark-shell after ./build/sbt assembly and verified the error message was printed; then ran the newly suggested build target and verified that ./bin/spark-shell runs afterwards.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12197 from holdenk/SPARK-1424-spark-class-broken-fix-build-docs.
  7. Apr 04, 2016
      [SPARK-13579][BUILD] Stop building the main Spark assembly. · 24d7d2e4
      Marcelo Vanzin authored
      This change modifies the "assembly/" module to just copy needed
      dependencies to its build directory, and modifies the packaging
      script to pick those up (and removes duplicate jars packaged in the
      examples module).
      
      I also made some minor adjustments to dependencies to remove some
      test jars from the final packaging, and remove jars that conflict with each
      other when packaged separately (e.g. servlet api).
      
      Also note that this change restores guava in applications' classpaths, even
      though it's still shaded inside Spark. This is now needed for the Hadoop
      libraries that are packaged with Spark, which are no longer processed by
      the shade plugin.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11796 from vanzin/SPARK-13579.
  8. Mar 14, 2016
      [SPARK-13578][CORE] Modify launch scripts to not use assemblies. · 45f8053b
      Marcelo Vanzin authored
      Instead of looking for a specially-named assembly, the scripts now will
      blindly add all jars under the libs directory to the classpath. This
      libs directory is still currently the old assembly dir, so things should
      keep working the same way as before until we make more packaging changes.
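      A sketch of the new classpath logic: no more grepping for a specially-named assembly jar, since a classpath wildcard (supported by java since Java 6) picks up every jar in the directory.
      
      ```bash
      SPARK_JARS_DIR="${SPARK_HOME}/lib"     # still the old assembly dir at this point
      LAUNCH_CLASSPATH="$SPARK_JARS_DIR/*"   # java expands the wildcard to all jars in the dir
      ```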
      
      The only lost feature is the detection of multiple assemblies; I consider
      that a minor nicety that only really affects a few developers, so it's probably
      ok.
      
      Tested locally by running spark-shell; also did some minor Win32 testing
      (just made sure spark-shell started).
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11591 from vanzin/SPARK-13578.
  9. Dec 08, 2015
  10. Nov 04, 2015
      [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen) · 8aff36e9
      jerryshao authored
      This PR is based on the work of roji to support running Spark scripts from symlinks. Thanks for the great work, roji. Would you mind taking a look at this PR? Thanks a lot.
      
      For releases like HDP and others, the Spark executables are normally exposed as symlinks placed on the `PATH`, but Spark's current scripts do not resolve the real path behind a symlink recursively, so Spark fails to execute when invoked through a symlink. This PR tries to solve the issue by finding the absolute path from the symlink.
      
      Unlike PR https://github.com/apache/spark/pull/2386, which used `readlink -f`, the `-f` flag is not supported on Mac, so here the path is resolved manually in a loop, as sketched below.
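      The portable resolution loop (a sketch of the common shell idiom the change follows):
      
      ```bash
      SOURCE="${BASH_SOURCE[0]}"
      while [ -h "$SOURCE" ]; do                          # while $SOURCE is a symlink...
        DIR="$(cd -P "$(dirname "$SOURCE")" && pwd)"
        SOURCE="$(readlink "$SOURCE")"                    # plain readlink (no -f) works on Mac
        [[ $SOURCE != /* ]] && SOURCE="$DIR/$SOURCE"      # relative links resolve against the link's dir
      done
      SPARK_HOME="$(cd -P "$(dirname "$SOURCE")/.." && pwd)"
      ```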
      
      I've tested with Mac and Linux (CentOS); looks fine.
      
      This PR does not fix the scripts under the `sbin` folder; I'm not sure whether they need to be fixed as well.
      
      Please help to review, any comment is greatly appreciated.
      
      Author: jerryshao <sshao@hortonworks.com>
      Author: Shay Rojansky <roji@roji.org>
      
      Closes #8669 from jerryshao/SPARK-2960.
  11. Oct 24, 2015
      [SPARK-11264] bin/spark-class can't find assembly jars with certain GREP_OPTIONS set · 28132ceb
      Jeffrey Naisbitt authored
      Temporarily remove GREP_OPTIONS if set in bin/spark-class.
      
      Some GREP_OPTIONS will modify the output of the grep commands that are looking for the assembly jars.
      For example, if the -n option is specified, the grep output will look like:
      5:spark-assembly-1.5.1-hadoop2.4.0.jar
      
      This will not match the regular expressions, and so the jar files will not be found.  We could improve the regular expression to handle this case and trim off extra characters, but it is difficult to know which options may or may not be set.  Unsetting GREP_OPTIONS within the script handles all the cases and gives the desired output.
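      The fix, in essence (a one-line sketch): neutralize the variable before any grep runs.
      
      ```bash
      # With GREP_OPTIONS=-n, grep emits "5:spark-assembly-...jar" instead of
      # "spark-assembly-...jar", which the assembly-matching regexes don't expect.
      unset GREP_OPTIONS
      ```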
      
      Author: Jeffrey Naisbitt <jnaisbitt@familysearch.org>
      
      Closes #9231 from naisbitt/unset-GREP_OPTIONS.
  12. Aug 28, 2015
      [SPARK-9284] [TESTS] Allow all tests to run without an assembly. · c53c902f
      Marcelo Vanzin authored
      This change aims at speeding up the dev cycle a little bit, by making
      sure that all tests behave the same w.r.t. where the code to be tested
      is loaded from. Namely, that means that tests don't rely on the assembly
      anymore, rather loading all needed classes from the build directories.
      
      The main change is to make sure all build directories (classes and test-classes)
      are added to the classpath of child processes when running tests.
      
      YarnClusterSuite required some custom code since the executors are run
      differently (i.e. not through the launcher library, like standalone and
      Mesos do).
      
      I also found a couple of tests that could leak a SparkContext on failure,
      and added code to handle those.
      
      With this patch, it's possible to run the following command from a clean
      source directory and have all tests pass:
      
        mvn -Pyarn -Phadoop-2.4 -Phive-thriftserver install
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7629 from vanzin/SPARK-9284.
  13. Jun 07, 2015
      [SPARK-7733] [CORE] [BUILD] Update build, code to use Java 7 for 1.5.0+ · e84815dc
      Sean Owen authored
      Update build to use Java 7, and remove some comments and special-case support for Java 6.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6265 from srowen/SPARK-7733 and squashes the following commits:
      
      59bda4e [Sean Owen] Update build to use Java 7, and remove some comments and special-case support for Java 6
  14. Jun 05, 2015
      [SPARK-6324] [CORE] Centralize handling of script usage messages. · 700312e1
      Marcelo Vanzin authored
      Reorganize code so that the launcher library handles most of the work
      of printing usage messages, instead of having an awkward protocol between
      the library and the scripts for that.
      
      This mostly applies to SparkSubmit, since the launcher lib does not do
      command line parsing for classes invoked in other ways, and thus cannot
      handle failures for those. Most scripts end up going through SparkSubmit,
      though, so it all works.
      
      The change adds a new, internal command line switch, "--usage-error",
      which prints the usage message and exits with a non-zero status. Scripts
      can override the command printed in the usage message by setting an
      environment variable - this avoids having to grep the output of
      SparkSubmit to remove references to the "spark-submit" script.
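      How a wrapper script can override the usage string (a sketch; the variable name follows the `_SPARK_CMD_USAGE` convention used by the shell wrappers, but treat the details as approximate):
      
      ```bash
      export _SPARK_CMD_USAGE="Usage: ./bin/spark-shell [options]"
      exec "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main "$@"
      ```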
      
      The only sub-optimal part of the change is the special handling for the
      spark-sql usage, which is now done in SparkSubmitArguments.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5841 from vanzin/SPARK-6324 and squashes the following commits:
      
      2821481 [Marcelo Vanzin] Merge branch 'master' into SPARK-6324
      bf139b5 [Marcelo Vanzin] Filter output of Spark SQL CLI help.
      c6609bf [Marcelo Vanzin] Fix exit code never being used when printing usage messages.
      6bc1b41 [Marcelo Vanzin] [SPARK-6324] [core] Centralize handling of script usage messages.
  15. Apr 14, 2015
      [SPARK-6890] [core] Fix launcher lib work with SPARK_PREPEND_CLASSES. · 97173893
      Marcelo Vanzin authored
      The fix for SPARK-6406 broke the case where sub-processes are launched
      when SPARK_PREPEND_CLASSES is set, because the code now would only add
      the launcher's build directory to the sub-process's classpath instead
      of the complete assembly.
      
      This patch fixes the problem by having the launch scripts stash the
      assembly's location in an environment variable. This is not the prettiest
      solution, but it avoids having to plumb that location all the way through
      the Worker code that launches executors. The env variable is always
      set by the launch scripts, so users cannot override it.
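      The stash amounts to a single export (a sketch; variable names approximate): the launch script records where it found the assembly so JVMs it spawns, e.g. executors started by the Worker, can reuse it.
      
      ```bash
      # $SPARK_ASSEMBLY_JAR is located earlier by the script; the env var carries
      # it to sub-processes without plumbing the path through the Scala code.
      export _SPARK_ASSEMBLY="$SPARK_ASSEMBLY_JAR"
      ```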
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5504 from vanzin/SPARK-6890 and squashes the following commits:
      
      7aec921 [Marcelo Vanzin] Fix tests.
      ff87a60 [Marcelo Vanzin] Merge branch 'master' into SPARK-6890
      31d3ce8 [Marcelo Vanzin] [SPARK-6890] [core] Fix launcher lib work with SPARK_PREPEND_CLASSES.
  16. Mar 29, 2015
      [SPARK-6406] Launch Spark using assembly jar instead of a separate launcher jar · e3eb3939
      Nishkam Ravi authored
      Author: Nishkam Ravi <nravi@cloudera.com>
      Author: nishkamravi2 <nishkamravi@gmail.com>
      Author: nravi <nravi@c1704.halxg.cloudera.com>
      
      Closes #5085 from nishkamravi2/master_nravi and squashes the following commits:
      
      bad4349 [nishkamravi2] Update Main.java
      36a6f87 [Nishkam Ravi] Minor changes and bug fixes
      b7f4ae7 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      4a45d6a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      458af39 [Nishkam Ravi] Locate the jar using getLocation, obviates the need to pass assembly path as an argument
      d9658d6 [Nishkam Ravi] Changes for SPARK-6406
      ccdc334 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      3faa7a4 [Nishkam Ravi] Launcher library changes (SPARK-6406)
      345206a [Nishkam Ravi] spark-class merge Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
      ac58975 [Nishkam Ravi] spark-class changes
      06bfeb0 [nishkamravi2] Update spark-class
      35af990 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      32c3ab3 [nishkamravi2] Update AbstractCommandBuilder.java
      4bd4489 [nishkamravi2] Update AbstractCommandBuilder.java
      746f35b [Nishkam Ravi] "hadoop" string in the assembly name should not be mandatory (everywhere else in spark we mandate spark-assembly*hadoop*.jar)
      bfe96e0 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      ee902fa [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      d453197 [nishkamravi2] Update NewHadoopRDD.scala
      6f41a1d [nishkamravi2] Update NewHadoopRDD.scala
      0ce2c32 [nishkamravi2] Update HadoopRDD.scala
      f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
      ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of removeShutDownHook. Deletion of semi-redundant occurrences of expensive operation inShutDown.
      71d0e17 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      494d8c0 [nishkamravi2] Update DiskBlockManager.scala
      3c5ddba [nishkamravi2] Update DiskBlockManager.scala
      f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by recent changes to BlockManager.stop
      79ea8b4 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      b446edc [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala
      535295a [nishkamravi2] Update TaskSetManager.scala
      3e1b616 [Nishkam Ravi] Modify test for maxResultSize
      9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message and add condition to check if maxResultSize > 0)
      5f8f9ed [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      636a9ff [nishkamravi2] Update YarnAllocator.scala
      8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
      35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
      5ac2ec1 [Nishkam Ravi] Remove out
      dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue
      42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
      362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
      c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
      1cf2d1e [nishkamravi2] Update YarnAllocator.scala
      ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts)
      2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark
      2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark
      3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
      5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
      eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
      df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
      6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
      5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
      681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
  17. Mar 11, 2015
      [SPARK-4924] Add a library for launching Spark jobs programmatically. · 517975d8
      Marcelo Vanzin authored
      This change encapsulates all the logic involved in launching a Spark job
      into a small Java library that can be easily embedded into other applications.
      
      The overall goal of this change is twofold, as described in the bug:
      
      - Provide a public API for launching Spark processes. This is a common request
        from users and currently there's no good answer for it.
      
      - Remove a lot of the duplicated code and other coupling that exists in the
        different parts of Spark that deal with launching processes.
      
      A lot of the duplication was due to different code needed to build an
      application's classpath (and the bootstrapper needed to run the driver in
      certain situations), and also different code needed to parse spark-submit
      command line options in different contexts. The change centralizes those
      as much as possible so that all code paths can rely on the library for
      handling those appropriately.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3916 from vanzin/SPARK-4924 and squashes the following commits:
      
      18c7e4d [Marcelo Vanzin] Fix make-distribution.sh.
      2ce741f [Marcelo Vanzin] Add lots of quotes.
      3b28a75 [Marcelo Vanzin] Update new pom.
      a1b8af1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      897141f [Marcelo Vanzin] Review feedback.
      e2367d2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      28cd35e [Marcelo Vanzin] Remove stale comment.
      b1d86b0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      00505f9 [Marcelo Vanzin] Add blurb about new API in the programming guide.
      5f4ddcc [Marcelo Vanzin] Better usage messages.
      92a9cfb [Marcelo Vanzin] Fix Win32 launcher, usage.
      6184c07 [Marcelo Vanzin] Rename field.
      4c19196 [Marcelo Vanzin] Update comment.
      7e66c18 [Marcelo Vanzin] Fix pyspark tests.
      0031a8e [Marcelo Vanzin] Review feedback.
      c12d84b [Marcelo Vanzin] Review feedback. And fix spark-submit on Windows.
      e2d4d71 [Marcelo Vanzin] Simplify some code used to launch pyspark.
      43008a7 [Marcelo Vanzin] Don't make builder extend SparkLauncher.
      b4d6912 [Marcelo Vanzin] Use spark-submit script in SparkLauncher.
      28b1434 [Marcelo Vanzin] Add a comment.
      304333a [Marcelo Vanzin] Fix propagation of properties file arg.
      bb67b93 [Marcelo Vanzin] Remove unrelated Yarn change (that is also wrong).
      8ec0243 [Marcelo Vanzin] Add missing newline.
      95ddfa8 [Marcelo Vanzin] Fix handling of --help for spark-class command builder.
      72da7ec [Marcelo Vanzin] Rename SparkClassLauncher.
      62978e4 [Marcelo Vanzin] Minor cleanup of Windows code path.
      9cd5b44 [Marcelo Vanzin] Make all non-public APIs package-private.
      e4c80b6 [Marcelo Vanzin] Reorganize the code so that only SparkLauncher is public.
      e50dc5e [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      de81da2 [Marcelo Vanzin] Fix CommandUtils.
      86a87bf [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      2061967 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      46d46da [Marcelo Vanzin] Clean up a test and make it more future-proof.
      b93692a [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      ad03c48 [Marcelo Vanzin] Revert "Fix a thread-safety issue in "local" mode."
      0b509d0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      23aa2a9 [Marcelo Vanzin] Read java-opts from conf dir, not spark home.
      7cff919 [Marcelo Vanzin] Javadoc updates.
      eae4d8e [Marcelo Vanzin] Fix new unit tests on Windows.
      e570fb5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      44cd5f7 [Marcelo Vanzin] Add package-info.java, clean up javadocs.
      f7cacff [Marcelo Vanzin] Remove "launch Spark in new thread" feature.
      7ed8859 [Marcelo Vanzin] Some more feedback.
      54cd4fd [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      61919df [Marcelo Vanzin] Clean leftover debug statement.
      aae5897 [Marcelo Vanzin] Use launcher classes instead of jars in non-release mode.
      e584fc3 [Marcelo Vanzin] Rework command building a little bit.
      525ef5b [Marcelo Vanzin] Rework Unix spark-class to handle argument with newlines.
      8ac4e92 [Marcelo Vanzin] Minor test cleanup.
      e946a99 [Marcelo Vanzin] Merge PySparkLauncher into SparkSubmitCliLauncher.
      c617539 [Marcelo Vanzin] Review feedback round 1.
      fc6a3e2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      f26556b [Marcelo Vanzin] Fix a thread-safety issue in "local" mode.
      2f4e8b4 [Marcelo Vanzin] Changes needed to make this work with SPARK-4048.
      799fc20 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      bb5d324 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      53faef1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      a7936ef [Marcelo Vanzin] Fix pyspark tests.
      656374e [Marcelo Vanzin] Mima fixes.
      4d511e7 [Marcelo Vanzin] Fix tools search code.
      7a01e4a [Marcelo Vanzin] Fix pyspark on Yarn.
      1b3f6e9 [Marcelo Vanzin] Call SparkSubmit from spark-class launcher for unknown classes.
      25c5ae6 [Marcelo Vanzin] Centralize SparkSubmit command line parsing.
      27be98a [Marcelo Vanzin] Modify Spark to use launcher lib.
      6f70eea [Marcelo Vanzin] [SPARK-4924] Add a library for launching Spark jobs programatically.
  18. Jan 25, 2015
  19. Jan 19, 2015
      [SPARK-5088] Use spark-class for running executors directly · 4a4f9ccb
      Jongyoul Lee authored
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #3897 from jongyoul/SPARK-5088 and squashes the following commits:
      
      8232aa8 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Added a listenerBus for fixing test cases
      932289f [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Rebased from master
      613cb47 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Fixed code if spark.executor.uri doesn't have any value - Added test cases
      ff57bda [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Adjusted orders of import
      97e4bd4 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Changed command for using spark-class directly - Delete sbin/spark-executor and moved some codes into spark-class' case statement
  20. Jan 16, 2015
  21. Nov 11, 2014
      Support cross building for Scala 2.11 · daaca14c
      Prashant Sharma authored
      Let's give this another go using a version of Hive that shades its JLine dependency.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #3159 from pwendell/scala-2.11-prashant and squashes the following commits:
      
      e93aa3e [Patrick Wendell] Restoring -Phive-thriftserver profile and cleaning up build script.
      f65d17d [Patrick Wendell] Fixing build issue due to merge conflict
      a8c41eb [Patrick Wendell] Reverting dev/run-tests back to master state.
      7a6eb18 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into scala-2.11-prashant
      583aa07 [Prashant Sharma] REVERT ME: removed hive thirftserver
      3680e58 [Prashant Sharma] Revert "REVERT ME: Temporarily removing some Cli tests."
      935fb47 [Prashant Sharma] Revert "Fixed by disabling a few tests temporarily."
      925e90f [Prashant Sharma] Fixed by disabling a few tests temporarily.
      2fffed3 [Prashant Sharma] Exclude groovy from sbt build, and also provide a way for such instances in future.
      8bd4e40 [Prashant Sharma] Switched to gmaven plus, it fixes random failures observer with its predecessor gmaven.
      5272ce5 [Prashant Sharma] SPARK_SCALA_VERSION related bugs.
      2121071 [Patrick Wendell] Migrating version detection to PySpark
      b1ed44d [Patrick Wendell] REVERT ME: Temporarily removing some Cli tests.
      1743a73 [Patrick Wendell] Removing decimal test that doesn't work with Scala 2.11
      f5cad4e [Patrick Wendell] Add Scala 2.11 docs
      210d7e1 [Patrick Wendell] Revert "Testing new Hive version with shaded jline"
      48518ce [Patrick Wendell] Remove association of Hive and Thriftserver profiles.
      e9d0a06 [Patrick Wendell] Revert "Enable thritfserver for Scala 2.10 only"
      67ec364 [Patrick Wendell] Guard building of thriftserver around Scala 2.10 check
      8502c23 [Patrick Wendell] Enable thritfserver for Scala 2.10 only
      e22b104 [Patrick Wendell] Small fix in pom file
      ec402ab [Patrick Wendell] Various fixes
      0be5a9d [Patrick Wendell] Testing new Hive version with shaded jline
      4eaec65 [Prashant Sharma] Changed scripts to ignore target.
      5167bea [Prashant Sharma] small correction
      a4fcac6 [Prashant Sharma] Run against scala 2.11 on jenkins.
      80285f4 [Prashant Sharma] MAven equivalent of setting spark.executor.extraClasspath during tests.
      034b369 [Prashant Sharma] Setting test jars on executor classpath during tests from sbt.
      d4874cb [Prashant Sharma] Fixed Python Runner suite. null check should be first case in scala 2.11.
      6f50f13 [Prashant Sharma] Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10
      e56ca9d [Prashant Sharma] Print an error if build for 2.10 and 2.11 is spotted.
      937c0b8 [Prashant Sharma] SCALA_VERSION -> SPARK_SCALA_VERSION
      cb059b0 [Prashant Sharma] Code review
      0476e5e [Prashant Sharma] Scala 2.11 support with repl and all build changes.
  22. Oct 30, 2014
      [SPARK-1720][SPARK-1719] use LD_LIBRARY_PATH instead of -Djava.library.path · cd739bd7
      GuoQiang Li authored
      - [X] Standalone
      - [X] YARN
      - [X] Mesos
      - [X]  Mac OS X
      - [X] Linux
      - [ ]  Windows
      
      This is an alternative implementation of #1031.
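      The gist of the change, as a sketch: export the library path into the child JVM's environment instead of passing `-Djava.library.path` on its command line.
      
      ```bash
      # before: exec "$RUNNER" -Djava.library.path="$SPARK_LIBRARY_PATH" ...
      # after (DYLD_LIBRARY_PATH plays the same role on Mac OS X):
      export LD_LIBRARY_PATH="$SPARK_LIBRARY_PATH${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
      exec "$RUNNER" -cp "$CLASSPATH" "$MAIN_CLASS" "$@"
      ```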
      
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #2711 from witgo/SPARK-1719 and squashes the following commits:
      
      c7b26f6 [GuoQiang Li] review commits
      4488e41 [GuoQiang Li] Refactoring CommandUtils
      a444094 [GuoQiang Li] review commits
      40c0b4a [GuoQiang Li] Add buildLocalCommand method
      c1a0ddd [GuoQiang Li] fix comments
      156ce88 [GuoQiang Li] review commit
      38aa377 [GuoQiang Li] Refactor CommandUtils.scala
      4269e00 [GuoQiang Li] Refactor SparkSubmitDriverBootstrapper.scala
      7a1d634 [GuoQiang Li] use LD_LIBRARY_PATH instead of -Djava.library.path
  23. Oct 14, 2014
      [SPARK-3869] ./bin/spark-class miss Java version with _JAVA_OPTIONS set · 7b4f39f6
      cocoatomo authored
      When the _JAVA_OPTIONS environment variable is set, the command "java -version" outputs a message like "Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8".
      ./bin/spark-class determines the Java version from the first line of the "java -version" output, so it misreads the Java version when _JAVA_OPTIONS is set.
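      A sketch of the fix: select the line that actually carries the version instead of assuming it is the first line of the output.
      
      ```bash
      # "Picked up _JAVA_OPTIONS: ..." does not contain "version", so the grep skips it.
      JAVA_VERSION=$("$RUNNER" -version 2>&1 | grep -m1 'version' \
        | sed 's/.*version "\([0-9]*\)\.\([0-9]*\).*/\1\2/')
      ```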
      
      Author: cocoatomo <cocoatomo77@gmail.com>
      
      Closes #2725 from cocoatomo/issues/3869-mistake-java-version and squashes the following commits:
      
      f894ebd [cocoatomo] [SPARK-3869] ./bin/spark-class miss Java version with _JAVA_OPTIONS set
  24. Oct 03, 2014
      [SPARK-3775] Not suitable error message in spark-shell.cmd · 358d7ffd
      Masayoshi TSUZUKI authored
      Modified some sentences of the error messages in bin\*.cmd.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #2640 from tsudukim/feature/SPARK-3775 and squashes the following commits:
      
      3458afb [Masayoshi TSUZUKI] [SPARK-3775] Not suitable error message in spark-shell.cmd
  25. Sep 15, 2014
  26. Sep 08, 2014
      SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within. · e16a8e7d
      Prashant Sharma authored
      ...
      
      Tested! TBH, it isn't a great idea to have a directory with spaces in it, because emacs doesn't like it, then hadoop doesn't like it, and so on...
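      The flavor of the change, illustratively: every expansion gets quoted so an install under, say, "/opt/my spark/" survives word splitting.
      
      ```bash
      # fragile:   CLASSPATH=$SPARK_HOME/conf:$CLASSPATH
      # paranoid:
      CLASSPATH="$SPARK_HOME/conf:$CLASSPATH"
      exec "$RUNNER" -cp "$CLASSPATH" "$@"
      ```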
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2229 from ScrapCodes/SPARK-3337/quoting-shell-scripts and squashes the following commits:
      
      d4ad660 [Prashant Sharma] SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within.
  27. Aug 23, 2014
      [SPARK-3068]remove MaxPermSize option for jvm 1.8 · f3d65cd0
      Daoyuan Wang authored
      In JVM 1.8.0, MaxPermSize is no longer supported.
      In Spark's `stderr` output, there would be a line like
      
          Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
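      A sketch of the runtime check: only add MaxPermSize on Java 7 and earlier (version parsing approximate, matching the "1.x" version scheme of that era).
      
      ```bash
      JAVA_VERSION=$("$RUNNER" -version 2>&1 | grep -m1 'version' \
        | sed 's/.*version "1\.\([0-9]*\).*/\1/')
      if [ "$JAVA_VERSION" -lt 8 ]; then
        OUR_JAVA_OPTS="$OUR_JAVA_OPTS -XX:MaxPermSize=128m"
      fi
      ```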
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #2011 from adrian-wang/maxpermsize and squashes the following commits:
      
      ef1d660 [Daoyuan Wang] direct get java version in runtime
      37db9c1 [Daoyuan Wang] code refine
      3c1d554 [Daoyuan Wang] remove MaxPermSize option for jvm 1.8
  28. Aug 20, 2014
      [SPARK-2849] Handle driver configs separately in client mode · b3ec51bf
      Andrew Or authored
      In client deploy mode, the driver is launched from within `SparkSubmit`'s JVM. This means by the time we parse Spark configs from `spark-defaults.conf`, it is already too late to control certain properties of the driver's JVM. We currently ignore these configs in client mode altogether.
      ```
      spark.driver.memory
      spark.driver.extraJavaOptions
      spark.driver.extraClassPath
      spark.driver.extraLibraryPath
      ```
      This PR handles these properties before launching the driver JVM. It achieves this by spawning a separate JVM that runs a new class called `SparkSubmitDriverBootstrapper`, which spawns `SparkSubmit` as a sub-process with the appropriate classpath, library paths, java opts and memory.
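      A much-simplified sketch of the bootstrapping idea (the real change does this in the new Java class, `SparkSubmitDriverBootstrapper`): read the driver properties before committing to a JVM, then spawn SparkSubmit with them.
      
      ```bash
      PROPS="${SPARK_HOME}/conf/spark-defaults.conf"
      DRIVER_MEM=$(grep -E '^spark\.driver\.memory' "$PROPS" 2>/dev/null | awk '{print $2}')
      exec "$RUNNER" -Xmx"${DRIVER_MEM:-512m}" -cp "$CLASSPATH" org.apache.spark.deploy.SparkSubmit "$@"
      ```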
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1845 from andrewor14/handle-configs-bash and squashes the following commits:
      
      bed4bdf [Andrew Or] Change a few comments / messages (minor)
      24dba60 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      08fd788 [Andrew Or] Warn against external usages of SparkSubmitDriverBootstrapper
      ff34728 [Andrew Or] Minor comments
      51aeb01 [Andrew Or] Filter out JVM memory in Scala rather than Bash (minor)
      9a778f6 [Andrew Or] Fix PySpark: actually kill driver on termination
      d0f20db [Andrew Or] Don't pass empty library paths, classpath, java opts etc.
      a78cb26 [Andrew Or] Revert a few changes in utils.sh (minor)
      9ba37e2 [Andrew Or] Don't barf when the properties file does not exist
      8867a09 [Andrew Or] A few more naming things (minor)
      19464ad [Andrew Or] SPARK_SUBMIT_JAVA_OPTS -> SPARK_SUBMIT_OPTS
      d6488f9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      1ea6bbe [Andrew Or] SparkClassLauncher -> SparkSubmitDriverBootstrapper
      a91ea19 [Andrew Or] Fix precedence of library paths, classpath, java opts and memory
      158f813 [Andrew Or] Remove "client mode" boolean argument
      c84f5c8 [Andrew Or] Remove debug print statement (minor)
      b71f52b [Andrew Or] Revert a few more changes (minor)
      7d94a8d [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      3a8235d [Andrew Or] Only parse the properties file if special configs exist
      c37e08d [Andrew Or] Revert a few more changes
      a396eda [Andrew Or] Nullify my own hard work to simplify bash
      0effa1e [Andrew Or] Add code in Scala that handles special configs
      c886568 [Andrew Or] Fix lines too long + a few comments / style (minor)
      7a4190a [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      7396be2 [Andrew Or] Explicitly comment that multi-line properties are not supported
      fa11ef8 [Andrew Or] Parse the properties file only if the special configs exist
      371cac4 [Andrew Or] Add function prefix (minor)
      be99eb3 [Andrew Or] Fix tests to not include multi-line configs
      bd0d468 [Andrew Or] Simplify parsing config file by ignoring multi-line arguments
      56ac247 [Andrew Or] Use eval and set to simplify splitting
      8d4614c [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      aeb79c7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      2732ac0 [Andrew Or] Integrate BASH tests into dev/run-tests + log error properly
      8d26a5c [Andrew Or] Add tests for bash/utils.sh
      4ae24c3 [Andrew Or] Fix bug: escape properly in quote_java_property
      b3c4cd5 [Andrew Or] Fix bug: count the number of quotes instead of detecting presence
      c2273fc [Andrew Or] Fix typo (minor)
      e793e5f [Andrew Or] Handle multi-line arguments
      5d8f8c4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
      c7b9926 [Andrew Or] Minor changes to spark-defaults.conf.template
      a992ae2 [Andrew Or] Escape spark.*.extraJavaOptions correctly
      aabfc7e [Andrew Or] escape -> split (minor)
      45a1eb9 [Andrew Or] Fix bug: escape escaped backslashes and quotes properly...
      1cdc6b1 [Andrew Or] Fix bug: escape escaped double quotes properly
      c854859 [Andrew Or] Add small comment
      c13a2cb [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
      8e552b7 [Andrew Or] Include an example of spark.*.extraJavaOptions
      de765c9 [Andrew Or] Print spark-class command properly
      a4df3c4 [Andrew Or] Move parsing and escaping logic to utils.sh
      dec2343 [Andrew Or] Only export variables if they exist
      fa2136e [Andrew Or] Escape Java options + parse java properties files properly
      ef12f74 [Andrew Or] Minor formatting
      4ec22a1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
      e5cfb46 [Andrew Or] Collapse duplicate code + fix potential whitespace issues
      4edcaa8 [Andrew Or] Redirect stdout to stderr for python
      130f295 [Andrew Or] Handle spark.driver.memory too
      98dd8e3 [Andrew Or] Add warning if properties file does not exist
      8843562 [Andrew Or] Fix compilation issues...
      75ee6b4 [Andrew Or] Remove accidentally added file
      63ed2e9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
      0025474 [Andrew Or] Revert SparkSubmit handling of --driver-* options for only cluster mode
      a2ab1b0 [Andrew Or] Parse spark.driver.extra* in bash
      250cb95 [Andrew Or] Do not ignore spark.driver.extra* for client mode
  29. Jul 10, 2014
      [SPARK-1776] Have Spark's SBT build read dependencies from Maven. · 628932b8
      Prashant Sharma authored
      This patch introduces the new way of working while retaining the existing ways of doing things.
      
      For example, the Maven build instruction for YARN is
      `mvn -Pyarn -Phadoop-2.2 clean package -DskipTests`
      in sbt it can become
      `MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly`
      Also supports
      `sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly`
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #772 from ScrapCodes/sbt-maven and squashes the following commits:
      
      a8ac951 [Prashant Sharma] Updated sbt version.
      62b09bb [Prashant Sharma] Improvements.
      fa6221d [Prashant Sharma] Excluding sql from mima
      4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default.
      72651ca [Prashant Sharma] Addresses code reivew comments.
      acab73d [Prashant Sharma] Revert "Small fix to run-examples script."
      ac4312c [Prashant Sharma] Revert "minor fix"
      6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit.
      65cf06c [Prashant Sharma] Servelet API jars mess up with the other servlet jars on the class path.
      446768e [Prashant Sharma] minor fix
      89b9777 [Prashant Sharma] Merge conflicts
      d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups.
      dccc8ac [Prashant Sharma] updated mima to check against 1.0
      a49c61b [Prashant Sharma] Fix for tools jar
      a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies.
      cf88758 [Prashant Sharma] cleanup
      9439ea3 [Prashant Sharma] Small fix to run-examples script.
      96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven.
      36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins.
      4973dbd [Patrick Wendell] Example build using pom reader.
  30. Jul 03, 2014
      [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work. · 731f683b
      Prashant Sharma authored
      Trivial fix.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #1050 from ScrapCodes/SPARK-2109/pyspark-script-bug and squashes the following commits:
      
      77072b9 [Prashant Sharma] Changed echos to redirect to STDERR.
      13f48a0 [Prashant Sharma] [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work.
  31. Jun 23, 2014
  32. Jun 12, 2014
      SPARK-1843: Replace assemble-deps with env variable. · 1c04652c
      Patrick Wendell authored
      (This change is actually small; I moved some logic into
      compute-classpath that was previously in spark-class.)
      
      Assemble deps has existed for a while to allow developers to
      run local code with new changes quickly. When I'm developing I
      typically use a simpler approach which just prepends the Spark
      classes to the classpath before the assembly jar. This is well
      defined in the JVM and the Spark classes take precedence over those
      in the assembly.
      
      This approach is portable across both builds which is the main reason I'd
      like to switch to it. It's also a bit easier to toggle on and off quickly.
      
      The way you use this is the following:
      ```
      $ ./bin/spark-shell # Use spark with the normal assembly
      $ export SPARK_PREPEND_CLASSES=true
      $ ./bin/spark-shell # Now it's using compiled classes
      $ unset SPARK_PREPEND_CLASSES
      $ ./bin/spark-shell # Back to normal
      ```
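      What the toggle does inside compute-classpath, roughly (a sketch; module list and variable names abridged):
      
      ```bash
      if [ -n "$SPARK_PREPEND_CLASSES" ]; then
        # Build dirs come first, so freshly compiled classes shadow the assembly's
        CLASSPATH="$SPARK_HOME/core/target/scala-$SCALA_VERSION/classes:$CLASSPATH"
        # ...and likewise for the other modules' build directories
      fi
      ```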
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #877 from pwendell/assemble-deps and squashes the following commits:
      
      8a11345 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into assemble-deps
      faa3168 [Patrick Wendell] Adding a warning for compatibility
      3f151a7 [Patrick Wendell] Small fix
      bbfb73c [Patrick Wendell] Review feedback
      328e9f8 [Patrick Wendell] SPARK-1843: Replace assemble-deps with env variable.
  33. May 21, 2014
      [SPARK-1250] Fixed misleading comments in bin/pyspark, bin/spark-class · 6e337380
      Sumedh Mungee authored
      Fixed a couple of misleading comments in bin/pyspark and bin/spark-class. The comments make it seem like the script is looking for the Scala installation when in fact it is looking for Spark.
      
      Author: Sumedh Mungee <smungee@gmail.com>
      
      Closes #843 from smungee/spark-1250-fix-comments and squashes the following commits:
      
      26870f3 [Sumedh Mungee] [SPARK-1250] Fixed misleading comments in bin/pyspark and bin/spark-class
  34. May 19, 2014
      SPARK-1879. Increase MaxPermSize since some of our builds have many classes · 5af99d76
      Matei Zaharia authored
      See https://issues.apache.org/jira/browse/SPARK-1879 -- builds with Hadoop2 and Hive ran out of PermGen space in spark-shell, when those things added up with the Scala compiler.
      
      Note that users can still override it by setting their own Java options with this change. Their options will come later in the command string than the -XX:MaxPermSize=128m.
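      Why user options still win, illustratively (variable names approximate): for repeated -XX flags, HotSpot honors the last occurrence on the command line.
      
      ```bash
      # A user-supplied -XX:MaxPermSize=256m in SPARK_JAVA_OPTS lands later and wins.
      JAVA_OPTS="-XX:MaxPermSize=128m $SPARK_JAVA_OPTS"
      ```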
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #823 from mateiz/spark-1879 and squashes the following commits:
      
      6bc0ee8 [Matei Zaharia] Increase MaxPermSize to 128m since some of our builds have lots of classes
  35. May 09, 2014
      SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`. · 06b15baa
      Patrick Wendell authored
      Gives a nicely formatted message to the user when `run-example` is run to
      tell them to use `spark-submit`.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #704 from pwendell/examples and squashes the following commits:
      
      1996ee8 [Patrick Wendell] Feedback form Andrew
      3eb7803 [Patrick Wendell] Suggestions from TD
      2474668 [Patrick Wendell] SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`.
  36. May 04, 2014
      SPARK-1703 Warn users if Spark is run on JRE6 but compiled with JDK7. · 0c98a8f6
      Patrick Wendell authored
      This adds some guards and good warning messages if users hit this issue. /cc @aarondav, with whom I discussed parts of the design.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #627 from pwendell/jdk6 and squashes the following commits:
      
      a38a958 [Patrick Wendell] Code review feedback
      94e9f84 [Patrick Wendell] SPARK-1703 Warn users if Spark is run on JRE6 but compiled with JDK7.
  37. Apr 28, 2014
      SPARK-1654 and SPARK-1653: Fixes in spark-submit. · 949e3931
      Patrick Wendell authored
      Deals with two issues:
      1. Spark shell didn't correctly pass quoted arguments to spark-submit.
      ```./bin/spark-shell --driver-java-options "-Dfoo=f -Dbar=b"```
      2. Spark submit used deprecated environment variables (SPARK_CLASSPATH)
         which triggered warnings. Now we use new, more narrowly scoped,
         variables.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #576 from pwendell/spark-submit and squashes the following commits:
      
      67004c9 [Patrick Wendell] SPARK-1654 and SPARK-1653: Fixes in spark-submit.
  38. Apr 21, 2014
      Clean up and simplify Spark configuration · fb98488f
      Patrick Wendell authored
      Over time, as we've added more deployment modes, things have gotten a bit unwieldy with user-facing configuration options in Spark. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch, but it makes the following improvements (a usage sketch follows the list):
      
      1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file.
      2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
      3. Adds ability to set these same variables for the driver using `spark-submit`.
      4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`.
      5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.
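      Hypothetical usage after this change (file contents and app name are illustrative): per-application settings move into `spark-defaults.conf` and `spark-submit` flags instead of spark-env.sh exports.
      
      ```bash
      cat >> "$SPARK_HOME/conf/spark-defaults.conf" <<'EOF'
      spark.executor.extraJavaOptions  -XX:+PrintGCDetails
      spark.executor.extraClassPath    /opt/extra-jars/*
      EOF
      ./bin/spark-submit --driver-memory 2g --class org.example.App app.jar
      ```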
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #299 from pwendell/config-cleanup and squashes the following commits:
      
      127f301 [Patrick Wendell] Improvements to testing
      a006464 [Patrick Wendell] Moving properties file template.
      b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf
      0086939 [Patrick Wendell] Minor style fixes
      af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs
      b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide
      af0adf7 [Patrick Wendell] Automatically add user jar
      a56b125 [Patrick Wendell] Responses to Tom's review
      d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
      a762901 [Patrick Wendell] Fixing test failures
      ffa00fe [Patrick Wendell] Review feedback
      fda0301 [Patrick Wendell] Note
      308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN
      e83cd8f [Patrick Wendell] Changes to allow re-use of test applications
      be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set
      c2a2909 [Patrick Wendell] Test compile fixes
      4ee6f9d [Patrick Wendell] Making YARN doc changes consistent
      afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors.
      b08893b [Patrick Wendell] Additional improvements.
      ace4ead [Patrick Wendell] Responses to review feedback.
      b72d183 [Patrick Wendell] Review feedback for spark env file
      46555c1 [Patrick Wendell] Review feedback and import clean-ups
      437aed1 [Patrick Wendell] Small fix
      761ebcd [Patrick Wendell] Library path and classpath for drivers
      7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script
      5b0ba8e [Patrick Wendell] Don't ship executor envs
      84cc5e5 [Patrick Wendell] Small clean-up
      1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings
      4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH
      6eaf7d0 [Patrick Wendell] executorJavaOpts
      0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN
      ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
  39. Apr 10, 2014
      [SPARK-1276] Add a HistoryServer to render persisted UI · 79820fe8
      Andrew Or authored
      The new feature of event logging, introduced in #42, allows the user to persist the details of his/her Spark application to storage, and later replay these events to reconstruct an after-the-fact SparkUI.
      Currently, however, a persisted UI can only be rendered through the standalone Master. This greatly limits the use case of this new feature as many people also run Spark on Yarn / Mesos.
      
      This PR introduces a new entity called the HistoryServer, which, given a log directory, keeps track of all completed applications independently of a Spark Master. Unlike the Master, the HistoryServer need not be running while the application is still running. It is relatively light-weight in that it only maintains static information about applications and performs no scheduling.
      
      To quickly test it out, generate event logs with ```spark.eventLog.enabled=true``` and run ```sbin/start-history-server.sh <log-dir-path>```. Your HistoryServer awaits on port 18080.
      
      Comments and feedback are most welcome.
      
      ---
      
      A few other changes introduced in this PR include refactoring the WebUI interface, which is beginning to have a lot of duplicate code now that we have added more functionality to it. Two new SparkListenerEvents have been introduced (SparkListenerApplicationStart/End) to keep track of application name and start/finish times. This PR also clarifies the semantics of the ReplayListenerBus introduced in #42.
      
      A potential TODO in the future (not part of this PR) is to render live applications in addition to just completed applications. This is useful when applications fail, a condition that our current HistoryServer does not handle unless the user manually signals application completion (by creating the APPLICATION_COMPLETION file). Handling live applications becomes significantly more challenging, however, because it is now necessary to render the same SparkUI multiple times. To avoid reading the entire log every time, which is inefficient, we must handle reading the log from where we previously left off, but this becomes fairly complicated because we must deal with the arbitrary behavior of each input stream.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #204 from andrewor14/master and squashes the following commits:
      
      7b7234c [Andrew Or] Finished -> Completed
      b158d98 [Andrew Or] Address Patrick's comments
      69d1b41 [Andrew Or] Do not block on posting SparkListenerApplicationEnd
      19d5dd0 [Andrew Or] Merge github.com:apache/spark
      f7f5bf0 [Andrew Or] Make history server's web UI port a Spark configuration
      2dfb494 [Andrew Or] Decouple checking for application completion from replaying
      d02dbaa [Andrew Or] Expose Spark version and include it in event logs
      2282300 [Andrew Or] Add documentation for the HistoryServer
      567474a [Andrew Or] Merge github.com:apache/spark
      6edf052 [Andrew Or] Merge github.com:apache/spark
      19e1fb4 [Andrew Or] Address Thomas' comments
      248cb3d [Andrew Or] Limit number of live applications + add configurability
      a3598de [Andrew Or] Do not close file system with ReplayBus + fix bind address
      bc46fc8 [Andrew Or] Merge github.com:apache/spark
      e2f4ff9 [Andrew Or] Merge github.com:apache/spark
      050419e [Andrew Or] Merge github.com:apache/spark
      81b568b [Andrew Or] Fix strange error messages...
      0670743 [Andrew Or] Decouple page rendering from loading files from disk
      1b2f391 [Andrew Or] Minor changes
      a9eae7e [Andrew Or] Merge branch 'master' of github.com:apache/spark
      d5154da [Andrew Or] Styling and comments
      5dbfbb4 [Andrew Or] Merge branch 'master' of github.com:apache/spark
      60bc6d5 [Andrew Or] First complete implementation of HistoryServer (only for finished apps)
      7584418 [Andrew Or] Report application start/end times to HistoryServer
      8aac163 [Andrew Or] Add basic application table
      c086bd5 [Andrew Or] Add HistoryServer and scripts ++ Refactor WebUI interface
  40. Apr 06, 2014
      SPARK-1314: Use SPARK_HIVE to determine if we include Hive in packaging · 41065584
      Aaron Davidson authored
      Previously, we based our decision about including the datanucleus jars on the existence of a spark-hive-assembly jar, which was incidentally built whenever "sbt assembly" is run. This means that a typical, previously supported pathway would start using hive jars.
      
      This patch has the following features/bug fixes:
      
      - Use of SPARK_HIVE (default false) to determine if we should include Hive in the assembly jar (see the sketch after this list)
      - Analogous feature in Maven with -Phive (previously, there was no support for adding Hive to any of our jars produced by Maven)
      - assemble-deps fixed since we no longer use a different ASSEMBLY_DIR
      - avoid adding log message in compute-classpath.sh to the classpath :)
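      A sketch of the SPARK_HIVE gate in compute-classpath.sh (paths approximate; lib_managed/jars is where sbt places managed dependencies):
      
      ```bash
      if [ "$SPARK_HIVE" == "true" ]; then
        for jar in "$SPARK_HOME"/lib_managed/jars/datanucleus-*.jar; do
          CLASSPATH="$CLASSPATH:$jar"
        done
      fi
      ```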
      
      Still TODO before mergeable:
      - We need to download the datanucleus jars outside of sbt. Perhaps we can have spark-class download them if SPARK_HIVE is set, similar to how sbt downloads itself.
      - Spark SQL documentation updates.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #237 from aarondav/master and squashes the following commits:
      
      5dc4329 [Aaron Davidson] Typo fixes
      dd4f298 [Aaron Davidson] Doc update
      dd1a365 [Aaron Davidson] Eliminate need for SPARK_HIVE at runtime by d/ling datanucleus from Maven
      a9269b5 [Aaron Davidson] [WIP] Use SPARK_HIVE to determine if we include Hive in packaging