  1. Jul 03, 2014
    • [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work. · 731f683b
      Prashant Sharma authored
      Trivial fix.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #1050 from ScrapCodes/SPARK-2109/pyspark-script-bug and squashes the following commits:
      
      77072b9 [Prashant Sharma] Changed echos to redirect to STDERR.
      13f48a0 [Prashant Sharma] [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work.
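
      A minimal sketch of the pattern behind the fix, with the message text
      purely illustrative: informational echos in bin/pyspark are sent to
      stderr so they cannot leak into stdout.
      ```
      # Route diagnostics to stderr; stdout must stay clean for consumers.
      echo "Warning: SPARK_MEM is deprecated in favor of more specific options." 1>&2
      ```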
  2. Jun 23, 2014
  3. Jun 12, 2014
    • SPARK-1843: Replace assemble-deps with env variable. · 1c04652c
      Patrick Wendell authored
      (This change is actually small; I moved some logic into
      compute-classpath that was previously in spark-class.)
      
      Assemble-deps has existed for a while to allow developers to
      run local code with new changes quickly. When I'm developing, I
      typically use a simpler approach that just prepends the Spark
      classes to the classpath before the assembly jar. This is well
      defined in the JVM, and the Spark classes take precedence over those
      in the assembly.

      This approach is portable across both builds, which is the main reason I'd
      like to switch to it. It's also a bit easier to toggle on and off quickly.
      
      The way you use this is the following:
      ```
      $ ./bin/spark-shell # Use spark with the normal assembly
      $ export SPARK_PREPEND_CLASSES=true
      $ ./bin/spark-shell # Now it's using compiled classes
      $ unset SPARK_PREPEND_CLASSES
      $ ./bin/spark-shell # Back to normal
      ```
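
      For context, a hypothetical sketch of the kind of logic this moves into
      compute-classpath (variable names and module paths are illustrative, not
      the actual list):
      ```
      # Put locally compiled classes ahead of the assembly jar so they
      # shadow the same classes inside it.
      if [ -n "$SPARK_PREPEND_CLASSES" ]; then
        CLASSPATH="$FWDIR/core/target/scala-2.10/classes:$CLASSPATH"
      fi
      CLASSPATH="$CLASSPATH:$ASSEMBLY_JAR"
      ```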
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #877 from pwendell/assemble-deps and squashes the following commits:
      
      8a11345 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into assemble-deps
      faa3168 [Patrick Wendell] Adding a warning for compatibility
      3f151a7 [Patrick Wendell] Small fix
      bbfb73c [Patrick Wendell] Review feedback
      328e9f8 [Patrick Wendell] SPARK-1843: Replace assemble-deps with env variable.
  4. May 21, 2014
    • [SPARK-1250] Fixed misleading comments in bin/pyspark, bin/spark-class · 6e337380
      Sumedh Mungee authored
      Fixed a couple of misleading comments in bin/pyspark and bin/spark-class. The comments make it seem like the script is looking for the Scala installation when in fact it is looking for Spark.
      
      Author: Sumedh Mungee <smungee@gmail.com>
      
      Closes #843 from smungee/spark-1250-fix-comments and squashes the following commits:
      
      26870f3 [Sumedh Mungee] [SPARK-1250] Fixed misleading comments in bin/pyspark and bin/spark-class
  5. May 19, 2014
    • SPARK-1879. Increase MaxPermSize since some of our builds have many classes · 5af99d76
      Matei Zaharia authored
      See https://issues.apache.org/jira/browse/SPARK-1879 -- builds with Hadoop 2 and Hive ran out of PermGen space in spark-shell once their classes, combined with the Scala compiler's, added up.
      
      Note that users can still override it by setting their own Java options: with this change, their options come later in the command string than the default -XX:MaxPermSize=128m and therefore take precedence.
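
      A sketch of that ordering as it might look in spark-class (variable
      names illustrative):
      ```
      # The default comes first; a user-supplied -XX:MaxPermSize in
      # SPARK_JAVA_OPTS appears later on the command line and wins.
      OUR_JAVA_OPTS="-XX:MaxPermSize=128m $SPARK_JAVA_OPTS"
      exec "$RUNNER" $OUR_JAVA_OPTS -cp "$CLASSPATH" "$@"
      ```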
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #823 from mateiz/spark-1879 and squashes the following commits:
      
      6bc0ee8 [Matei Zaharia] Increase MaxPermSize to 128m since some of our builds have lots of classes
  6. May 09, 2014
    • SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`. · 06b15baa
      Patrick Wendell authored
      Gives a nicely formatted message to the user when `run-example` is run,
      telling them to use `spark-submit` instead.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #704 from pwendell/examples and squashes the following commits:
      
      1996ee8 [Patrick Wendell] Feedback from Andrew
      3eb7803 [Patrick Wendell] Suggestions from TD
      2474668 [Patrick Wendell] SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`.
  7. May 04, 2014
    • SPARK-1703 Warn users if Spark is run on JRE6 but compiled with JDK7. · 0c98a8f6
      Patrick Wendell authored
      This adds some guards and good warning messages if users hit this issue. /cc @aarondav, with whom I discussed parts of the design.
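
      One way such a guard can work -- a hedged sketch, since the exact
      detection here is an assumption: probe the assembly jar with the
      runtime's own jar tool, which fails on JDK7-built jars under Java 6.
      ```
      # Hypothetical guard: if JRE 6 cannot even read the assembly jar,
      # print a friendly explanation instead of a cryptic stack trace.
      if ! "$JAVA_HOME/bin/jar" -tf "$ASSEMBLY_JAR" >/dev/null 2>&1; then
        echo "Error: failed to read the Spark assembly jar." 1>&2
        echo "It may have been compiled with JDK 7 while you are running JRE 6;" 1>&2
        echo "please run Spark with JRE 7 or rebuild with JDK 6." 1>&2
        exit 1
      fi
      ```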
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #627 from pwendell/jdk6 and squashes the following commits:
      
      a38a958 [Patrick Wendell] Code review feedback
      94e9f84 [Patrick Wendell] SPARK-1703 Warn users if Spark is run on JRE6 but compiled with JDK7.
  8. Apr 28, 2014
    • SPARK-1654 and SPARK-1653: Fixes in spark-submit. · 949e3931
      Patrick Wendell authored
      Deals with two issues:
      1. Spark shell didn't correctly pass quoted arguments to spark-submit (see the sketch after this list).
      ```./bin/spark-shell --driver-java-options "-Dfoo=f -Dbar=b"```
      2. Spark submit used deprecated environment variables (SPARK_CLASSPATH)
         which triggered warnings. Now we use new, more narrowly scoped,
         variables.
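
      For the first issue, a minimal sketch of quote-preserving forwarding,
      assuming spark-shell hands its arguments to spark-submit (the exact
      wiring is illustrative):
      ```
      # "$@" re-expands each original argument as a single word, so values
      # with spaces (e.g. --driver-java-options "-Dfoo=f -Dbar=b") survive.
      exec "$SPARK_HOME"/bin/spark-submit "$@"
      ```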
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #576 from pwendell/spark-submit and squashes the following commits:
      
      67004c9 [Patrick Wendell] SPARK-1654 and SPARK-1653: Fixes in spark-submit.
  9. Apr 21, 2014
    • Clean up and simplify Spark configuration · fb98488f
      Patrick Wendell authored
      Over time, as we've added more deployment modes, user-facing configuration options in Spark have gotten a bit unwieldy. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch, but it makes the following improvements:
      
      1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file.
      2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
      3. Adds ability to set these same variables for the driver using `spark-submit`.
      4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit` (see the sketch after this list).
      5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.
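
      As an illustration of point 4, a hypothetical spark-defaults.conf
      written from the shell (the values are made up; the executor option
      names are the ones introduced above):
      ```
      # Create a defaults file that spark-submit will load.
      {
        echo "spark.master                  spark://master:7077"
        echo "spark.executor.memory         2g"
        echo "spark.executor.extraJavaOpts  -XX:+UseCompressedOops"
      } > conf/spark-defaults.conf
      ```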
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #299 from pwendell/config-cleanup and squashes the following commits:
      
      127f301 [Patrick Wendell] Improvements to testing
      a006464 [Patrick Wendell] Moving properties file template.
      b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf
      0086939 [Patrick Wendell] Minor style fixes
      af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs
      b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide
      af0adf7 [Patrick Wendell] Automatically add user jar
      a56b125 [Patrick Wendell] Responses to Tom's review
      d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
      a762901 [Patrick Wendell] Fixing test failures
      ffa00fe [Patrick Wendell] Review feedback
      fda0301 [Patrick Wendell] Note
      308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN
      e83cd8f [Patrick Wendell] Changes to allow re-use of test applications
      be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set
      c2a2909 [Patrick Wendell] Test compile fixes
      4ee6f9d [Patrick Wendell] Making YARN doc changes consistent
      afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors.
      b08893b [Patrick Wendell] Additional improvements.
      ace4ead [Patrick Wendell] Responses to review feedback.
      b72d183 [Patrick Wendell] Review feedback for spark env file
      46555c1 [Patrick Wendell] Review feedback and import clean-ups
      437aed1 [Patrick Wendell] Small fix
      761ebcd [Patrick Wendell] Library path and classpath for drivers
      7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script
      5b0ba8e [Patrick Wendell] Don't ship executor envs
      84cc5e5 [Patrick Wendell] Small clean-up
      1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings
      4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH
      6eaf7d0 [Patrick Wendell] executorJavaOpts
      0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN
      ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
  10. Apr 10, 2014
    • [SPARK-1276] Add a HistoryServer to render persisted UI · 79820fe8
      Andrew Or authored
      The new feature of event logging, introduced in #42, allows the user to persist the details of his/her Spark application to storage, and later replay these events to reconstruct an after-the-fact SparkUI.
      Currently, however, a persisted UI can only be rendered through the standalone Master. This greatly limits the usefulness of this new feature, as many people also run Spark on Yarn / Mesos.
      
      This PR introduces a new entity called the HistoryServer, which, given a log directory, keeps track of all completed applications independently of a Spark Master. Unlike the Master, the HistoryServer need not be running while the application is still running. It is relatively lightweight in that it only maintains static information about applications and performs no scheduling.
      
      To quickly test it out, generate event logs with ```spark.eventLog.enabled=true``` and run ```sbin/start-history-server.sh <log-dir-path>```. Your HistoryServer awaits on port 18080.
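
      Spelled out as a shell session -- the log directory and the way the
      property is passed are assumptions, not prescriptions:
      ```
      # 1. Run an application with event logging enabled.
      export SPARK_JAVA_OPTS="-Dspark.eventLog.enabled=true -Dspark.eventLog.dir=/tmp/spark-events"
      ./bin/spark-shell                 # do some work, then exit

      # 2. Serve the persisted UI from the same directory.
      sbin/start-history-server.sh /tmp/spark-events
      # ...then browse to http://localhost:18080
      ```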
      
      Comments and feedback are most welcome.
      
      ---
      
      A few other changes introduced in this PR include refactoring the WebUI interface, which is beginning to have a lot of duplicate code now that we have added more functionality to it. Two new SparkListenerEvents have been introduced (SparkListenerApplicationStart/End) to keep track of application name and start/finish times. This PR also clarifies the semantics of the ReplayListenerBus introduced in #42.
      
      A potential TODO in the future (not part of this PR) is to render live applications in addition to just completed applications. This is useful when applications fail, a condition that our current HistoryServer does not handle unless the user manually signals application completion (by creating the APPLICATION_COMPLETION file). Handling live applications becomes significantly more challenging, however, because it is now necessary to render the same SparkUI multiple times. To avoid reading the entire log every time, which is inefficient, we must handle reading the log from where we previously left off, but this becomes fairly complicated because we must deal with the arbitrary behavior of each input stream.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #204 from andrewor14/master and squashes the following commits:
      
      7b7234c [Andrew Or] Finished -> Completed
      b158d98 [Andrew Or] Address Patrick's comments
      69d1b41 [Andrew Or] Do not block on posting SparkListenerApplicationEnd
      19d5dd0 [Andrew Or] Merge github.com:apache/spark
      f7f5bf0 [Andrew Or] Make history server's web UI port a Spark configuration
      2dfb494 [Andrew Or] Decouple checking for application completion from replaying
      d02dbaa [Andrew Or] Expose Spark version and include it in event logs
      2282300 [Andrew Or] Add documentation for the HistoryServer
      567474a [Andrew Or] Merge github.com:apache/spark
      6edf052 [Andrew Or] Merge github.com:apache/spark
      19e1fb4 [Andrew Or] Address Thomas' comments
      248cb3d [Andrew Or] Limit number of live applications + add configurability
      a3598de [Andrew Or] Do not close file system with ReplayBus + fix bind address
      bc46fc8 [Andrew Or] Merge github.com:apache/spark
      e2f4ff9 [Andrew Or] Merge github.com:apache/spark
      050419e [Andrew Or] Merge github.com:apache/spark
      81b568b [Andrew Or] Fix strange error messages...
      0670743 [Andrew Or] Decouple page rendering from loading files from disk
      1b2f391 [Andrew Or] Minor changes
      a9eae7e [Andrew Or] Merge branch 'master' of github.com:apache/spark
      d5154da [Andrew Or] Styling and comments
      5dbfbb4 [Andrew Or] Merge branch 'master' of github.com:apache/spark
      60bc6d5 [Andrew Or] First complete implementation of HistoryServer (only for finished apps)
      7584418 [Andrew Or] Report application start/end times to HistoryServer
      8aac163 [Andrew Or] Add basic application table
      c086bd5 [Andrew Or] Add HistoryServer and scripts ++ Refactor WebUI interface
  11. Apr 06, 2014
    • SPARK-1314: Use SPARK_HIVE to determine if we include Hive in packaging · 41065584
      Aaron Davidson authored
      Previously, we based our decision about including the datanucleus jars on the existence of a spark-hive-assembly jar, which was incidentally built whenever "sbt assembly" was run. This meant that a typical and previously supported pathway would start using hive jars.
      
      This patch has the following features/bug fixes:
      
      - Use of SPARK_HIVE (default false) to determine if we should include Hive in the assembly jar.
      - Analogous feature in Maven with -Phive (previously, there was no support for adding Hive to any of our jars produced by Maven) -- see the sketch after this list
      - assemble-deps fixed since we no longer use a different ASSEMBLY_DIR
      - avoid a log message printed by compute-classpath.sh ending up on the classpath :)
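
      A minimal usage sketch of the two switches above (invocations assumed
      to follow the standard Spark build commands of the time):
      ```
      # sbt build: opt in to Hive via the environment variable.
      SPARK_HIVE=true sbt/sbt assembly

      # Maven build: opt in via the new profile.
      mvn -Phive -DskipTests clean package
      ```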
      
      Still TODO before mergeable:
      - We need to download the datanucleus jars outside of sbt. Perhaps we can have spark-class download them if SPARK_HIVE is set similar to how sbt downloads itself.
      - Spark SQL documentation updates.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #237 from aarondav/master and squashes the following commits:
      
      5dc4329 [Aaron Davidson] Typo fixes
      dd4f298 [Aaron Davidson] Doc update
      dd1a365 [Aaron Davidson] Eliminate need for SPARK_HIVE at runtime by d/ling datanucleus from Maven
      a9269b5 [Aaron Davidson] [WIP] Use SPARK_HIVE to determine if we include Hive in packaging
  12. Mar 25, 2014
    • SPARK-1286: Make usage of spark-env.sh idempotent · 007a7334
      Aaron Davidson authored
      Various Spark scripts load spark-env.sh, and may do so more than once. This can cause repeated growth of variables that are appended to (SPARK_CLASSPATH, SPARK_REPL_OPTS), and it makes the precedence order for options specified in spark-env.sh less clear.

      One use-case for the latter is that we want to set options from the command line of spark-shell, but these options would be overridden by a subsequent loading of spark-env.sh. If we load spark-env.sh first and then set our command-line options, we can guarantee the correct precedence order.
      
      Note that we use SPARK_CONF_DIR if available to support the sbin/ scripts, which always set this variable from sbin/spark-config.sh. Otherwise, we default to the ../conf/ as usual.
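
      A sketch of the idempotency guard this implies, assuming a sentinel
      variable (all names illustrative):
      ```
      # Load spark-env.sh at most once, honoring SPARK_CONF_DIR if set.
      if [ -z "$SPARK_ENV_LOADED" ]; then
        export SPARK_ENV_LOADED=1
        conf_dir="${SPARK_CONF_DIR:-"$(dirname "$0")/../conf"}"
        if [ -f "$conf_dir/spark-env.sh" ]; then
          set -a                      # export everything the env file defines
          . "$conf_dir/spark-env.sh"
          set +a
        fi
      fi
      ```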
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #184 from aarondav/idem and squashes the following commits:
      
      e291f91 [Aaron Davidson] Use "private" variables in load-spark-env.sh
      8da8360 [Aaron Davidson] Add .sh extension to load-spark-env.sh
      93a2471 [Aaron Davidson] SPARK-1286: Make usage of spark-env.sh idempotent
  13. Mar 24, 2014
    • SPARK-1094 Support MiMa for reporting binary compatibility across versions. · dc126f21
      Patrick Wendell authored
      This adds some changes on top of the initial work by @scrapcodes in #20:
      
      The goal here is to do automated checking of Spark commits to determine whether they break binary compatibility.
      
      1. Special case for inner classes of package-private objects.
      2. Made tools classes accessible when running `spark-class`.
      3. Made some declared types in MLLib more general.
      4. Various other improvements to exclude-generation script.
      5. In-code documentation.
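
      A hypothetical invocation, assuming the sbt MiMa plugin's reporting
      task is wired into the build (the task name is an assumption):
      ```
      # Compare the current build's public API against the last release.
      sbt/sbt mima-report-binary-issues
      ```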
      
      Author: Patrick Wendell <pwendell@gmail.com>
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Prashant Sharma <scrapcodes@gmail.com>
      
      Closes #207 from pwendell/mima and squashes the following commits:
      
      22ae267 [Patrick Wendell] New binary changes after upmerge
      6c2030d [Patrick Wendell] Merge remote-tracking branch 'apache/master' into mima
      3666cf1 [Patrick Wendell] Minor style change
      0e0f570 [Patrick Wendell] Small fix and removing directory listings
      647c547 [Patrick Wendell] Review feedback.
      c39f3b5 [Patrick Wendell] Some enhancements to binary checking.
      4c771e0 [Prashant Sharma] Added a tool to generate mima excludes and also adapted build to pick automatically.
      b551519 [Prashant Sharma] adding a new exclude after rebasing with master
      651844c [Prashant Sharma] Support MiMa for reporting binary compatibility across versions.
  14. Mar 09, 2014
    • SPARK-929: Fully deprecate usage of SPARK_MEM · 52834d76
      Aaron Davidson authored
      (Continued from old repo, prior discussion at https://github.com/apache/incubator-spark/pull/615)
      
      This patch cements our deprecation of the SPARK_MEM environment variable by replacing it with three more specialized variables:
      SPARK_DAEMON_MEMORY, SPARK_EXECUTOR_MEMORY, and SPARK_DRIVER_MEMORY
      
      The creation of the latter two variables means that we can safely set driver/job memory without accidentally setting the executor memory. Neither is public.
      
      SPARK_EXECUTOR_MEMORY is only used by the Mesos scheduler (and set within SparkContext). The proper way of configuring executor memory is through the "spark.executor.memory" property.
      
      SPARK_DRIVER_MEMORY is the new way of specifying the amount of memory used by jobs launched by spark-class, without possibly affecting executor memory.
      
      Other memory considerations:
      - The repl's memory can be set through the "--drivermem" command-line option, which really just sets SPARK_DRIVER_MEMORY.
      - run-example doesn't use spark-class, so the only way to modify examples' memory is actually an unusual use of SPARK_JAVA_OPTS (which is normally overridden in all cases by spark-class).
      
      This patch also fixes a lurking bug where spark-shell misused spark-class (the first argument is supposed to be the main class name, not java options), as well as a bug in the Windows spark-class2.cmd. I have not yet tested this patch on either Windows or Mesos, however.
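
      A sketch of the backwards-compatible fallback this implies in
      spark-class (the default value is illustrative):
      ```
      # Prefer the new variable; fall back to the deprecated SPARK_MEM.
      if [ -z "$SPARK_DRIVER_MEMORY" ]; then
        SPARK_DRIVER_MEMORY="${SPARK_MEM:-512m}"
      fi
      JAVA_OPTS="$JAVA_OPTS -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY"
      ```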
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #99 from aarondav/sparkmem and squashes the following commits:
      
      9df4c68 [Aaron Davidson] SPARK-929: Fully deprecate usage of SPARK_MEM
  15. Jan 06, 2014
  16. Jan 03, 2014
  17. Sep 29, 2013
  18. Sep 23, 2013
  19. Sep 22, 2013
  20. Sep 20, 2013
  21. Sep 15, 2013
  22. Sep 01, 2013
  23. Aug 31, 2013
  24. Aug 30, 2013
  25. Aug 29, 2013
    • Change build and run instructions to use assemblies · 53cd50c0
      Matei Zaharia authored
      This commit makes Spark invocation saner by using an assembly JAR to
      find all of Spark's dependencies instead of adding all the JARs in
      lib_managed. It also packages the examples into an assembly and uses
      that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
      with two better-named scripts: "run-examples" for examples, and
      "spark-class" for Spark internal classes (e.g. REPL, master, etc). This
      is also designed to minimize the confusion people have in trying to use
      "run" to run their own classes; it's not meant to do that, but now at
      least if they look at it, they can modify run-examples to do a decent
      job for them.
      
      As part of this, Bagel's examples are also now properly moved to the
      examples package instead of bagel.
  26. Aug 28, 2013
  27. Aug 03, 2013
  28. Aug 02, 2013
  29. Jul 31, 2013
  30. Jul 24, 2013
  31. Jul 17, 2013
  32. Jul 16, 2013
  33. Jul 15, 2013
  34. Jun 28, 2013
  35. Jun 25, 2013
  36. Jun 24, 2013
  37. Jun 22, 2013