  1. Nov 11, 2014
    • Support cross building for Scala 2.11 · daaca14c
      Prashant Sharma authored
      Let's give this another go using a version of Hive that shades its JLine dependency.
      Author: Prashant Sharma <>
      Author: Patrick Wendell <>
      Closes #3159 from pwendell/scala-2.11-prashant and squashes the following commits:
      e93aa3e [Patrick Wendell] Restoring -Phive-thriftserver profile and cleaning up build script.
      f65d17d [Patrick Wendell] Fixing build issue due to merge conflict
      a8c41eb [Patrick Wendell] Reverting dev/run-tests back to master state.
      7a6eb18 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into scala-2.11-prashant
      583aa07 [Prashant Sharma] REVERT ME: removed hive thriftserver
      3680e58 [Prashant Sharma] Revert "REVERT ME: Temporarily removing some Cli tests."
      935fb47 [Prashant Sharma] Revert "Fixed by disabling a few tests temporarily."
      925e90f [Prashant Sharma] Fixed by disabling a few tests temporarily.
      2fffed3 [Prashant Sharma] Exclude groovy from sbt build, and also provide a way for such instances in future.
      8bd4e40 [Prashant Sharma] Switched to gmaven plus, it fixes random failures observed with its predecessor gmaven.
      5272ce5 [Prashant Sharma] SPARK_SCALA_VERSION related bugs.
      2121071 [Patrick Wendell] Migrating version detection to PySpark
      b1ed44d [Patrick Wendell] REVERT ME: Temporarily removing some Cli tests.
      1743a73 [Patrick Wendell] Removing decimal test that doesn't work with Scala 2.11
      f5cad4e [Patrick Wendell] Add Scala 2.11 docs
      210d7e1 [Patrick Wendell] Revert "Testing new Hive version with shaded jline"
      48518ce [Patrick Wendell] Remove association of Hive and Thriftserver profiles.
      e9d0a06 [Patrick Wendell] Revert "Enable thriftserver for Scala 2.10 only"
      67ec364 [Patrick Wendell] Guard building of thriftserver around Scala 2.10 check
      8502c23 [Patrick Wendell] Enable thriftserver for Scala 2.10 only
      e22b104 [Patrick Wendell] Small fix in pom file
      ec402ab [Patrick Wendell] Various fixes
      0be5a9d [Patrick Wendell] Testing new Hive version with shaded jline
      4eaec65 [Prashant Sharma] Changed scripts to ignore target.
      5167bea [Prashant Sharma] small correction
      a4fcac6 [Prashant Sharma] Run against scala 2.11 on jenkins.
      80285f4 [Prashant Sharma] Maven equivalent of setting spark.executor.extraClasspath during tests.
      034b369 [Prashant Sharma] Setting test jars on executor classpath during tests from sbt.
      d4874cb [Prashant Sharma] Fixed Python Runner suite. null check should be first case in scala 2.11.
      6f50f13 [Prashant Sharma] Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10
      e56ca9d [Prashant Sharma] Print an error if build for 2.10 and 2.11 is spotted.
      937c0b8 [Prashant Sharma] SCALA_VERSION -> SPARK_SCALA_VERSION
      cb059b0 [Prashant Sharma] Code review
      0476e5e [Prashant Sharma] Scala 2.11 support with repl and all build changes.
  2. Sep 08, 2014
    • SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within. · e16a8e7d
      Prashant Sharma authored
      Tested. TBH, it isn't a great idea to have a directory with spaces in its name, because Emacs doesn't like it, then Hadoop doesn't like it, and so on.
      Author: Prashant Sharma <>
      Closes #2229 from ScrapCodes/SPARK-3337/quoting-shell-scripts and squashes the following commits:
      d4ad660 [Prashant Sharma] SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within.
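The effect of the quoting described in this commit can be seen in a minimal sketch. This is not the code from the commit: the install path and the `count_args` helper are hypothetical and exist only to show how an unquoted expansion is split on the space while a quoted one stays intact.

```shell
# Hypothetical install location containing a space (not taken from the commit).
install_dir="/opt/spark install"

# Hypothetical helper: prints how many arguments it received.
count_args() {
  echo "$#"
}

# Unquoted expansion: the shell splits the value on whitespace into separate words.
unquoted_count="$(count_args $install_dir/bin)"

# Quoted expansion: the whole path is passed as a single word.
quoted_count="$(count_args "$install_dir/bin")"

echo "unquoted: $unquoted_count argument(s), quoted: $quoted_count argument(s)"
```

Any command that receives the unquoted form sees two unrelated paths, which is the kind of failure that quoting every expansion in the launcher scripts is meant to prevent.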
  3. Aug 02, 2014
    • [SPARK-1981] Add AWS Kinesis streaming support · 91f9504e
      Chris Fregly authored
      Author: Chris Fregly <>
      Closes #1434 from cfregly/master and squashes the following commits:
      4774581 [Chris Fregly] updated docs, renamed retry to retryRandom to be more clear, removed retries around store() method
      0393795 [Chris Fregly] moved Kinesis examples out of examples/ and back into extras/kinesis-asl
      691a6be [Chris Fregly] fixed tests and formatting, fixed a bug with JavaKinesisWordCount during union of streams
      0e1c67b [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      74e5c7c [Chris Fregly] updated per TD's feedback.  simplified examples, updated docs
      e33cbeb [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      bf614e9 [Chris Fregly] per matei's feedback:  moved the kinesis examples into the examples/ dir
      d17ca6d [Chris Fregly] per TD's feedback:  updated docs, simplified the KinesisUtils api
      912640c [Chris Fregly] changed the foundKinesis class to be a publically-avail class
      db3eefd [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      21de67f [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      6c39561 [Chris Fregly] parameterized the versions of the aws java sdk and kinesis client
      338997e [Chris Fregly] improve build docs for kinesis
      828f8ae [Chris Fregly] more cleanup
      e7c8978 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      cd68c0d [Chris Fregly] fixed typos and backward compatibility
      d18e680 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      b3b0ff1 [Chris Fregly] [SPARK-1981] Add AWS Kinesis streaming support
  4. Jul 03, 2014
    • [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work. · 731f683b
      Prashant Sharma authored
      Trivial fix.
      Author: Prashant Sharma <>
      Closes #1050 from ScrapCodes/SPARK-2109/pyspark-script-bug and squashes the following commits:
      77072b9 [Prashant Sharma] Changed echos to redirect to STDERR.
      13f48a0 [Prashant Sharma] [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work.
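Redirecting echos to STDERR, as the squashed commit above mentions, keeps diagnostics out of any value a caller captures from the script. The sketch below is illustrative only: `log_err`, `print_setting`, the warning text, and the `512m` value are hypothetical names and data, not taken from bin/pyspark.

```shell
# Diagnostics go to stderr so they never mix with data written to stdout.
log_err() {
  echo "$@" 1>&2
}

# Hypothetical function: emits a warning, then prints the actual result.
print_setting() {
  log_err "Warning: using a default value"
  echo "512m"
}

# Command substitution captures stdout only, so the warning is not part of the value.
# stderr is discarded here just to keep the demo output quiet.
setting="$(print_setting 2>/dev/null)"
echo "captured: $setting"
```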
  5. Jun 08, 2014
    • Update run-example · e9261d08
      maji2014 authored
      The old code can only be run from spark_home as "bin/run-example".
      Running it from any other directory fails with the error "./run-example: line 55: ./bin/spark-submit: No such file or directory". This change fixes that.
      Author: maji2014 <>
      Closes #1011 from maji2014/master and squashes the following commits:
      2cc1af6 [maji2014] Update run-example
      Closes #988.
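One common way to avoid this kind of failure is to derive paths from the location of the script itself rather than from the caller's working directory. The sketch below shows that general technique, not the actual patch; the variable names and the assumption that the script lives in a `bin/` directory directly under the install root are illustrative.

```shell
# Directory containing this script, resolved to an absolute path.
script_dir="$(cd "$(dirname -- "$0")" && pwd)"

# Assumed layout: the script sits in <install root>/bin, so the root is one level up.
home_dir="$(cd "$script_dir/.." && pwd)"

# Build the launcher path from the install root instead of using ./bin/spark-submit,
# so the result does not depend on where the user invoked the script from.
launcher="$home_dir/bin/spark-submit"
echo "would launch: $launcher"
```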
  6. May 19, 2014
    • [SPARK-1876] Windows fixes to deal with latest distribution layout changes · 7b70a707
      Matei Zaharia authored
      - Look for JARs in the right place
      - Launch examples the same way as on Unix
      - Load datanucleus JARs if they exist
      - Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs
      - Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was)
      Author: Matei Zaharia <>
      Closes #819 from mateiz/win-fixes and squashes the following commits:
      d558f96 [Matei Zaharia] Fix comment
      228577b [Matei Zaharia] Review comments
      d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly
      144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout
  7. May 09, 2014
    • SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`. · 06b15baa
      Patrick Wendell authored
      Gives a nicely formatted message to the user when `run-example` is run to
      tell them to use `spark-submit`.
      Author: Patrick Wendell <>
      Closes #704 from pwendell/examples and squashes the following commits:
      1996ee8 [Patrick Wendell] Feedback from Andrew
      3eb7803 [Patrick Wendell] Suggestions from TD
      2474668 [Patrick Wendell] SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`.
  8. Apr 23, 2014
    • SPARK-1119 and other build improvements · cd4ed293
      Patrick Wendell authored
      1. Makes assembly and examples jar naming consistent in maven/sbt.
      2. Updates to use Maven and fixes some bugs.
      3. Updates the create-release script to call make-distribution script.
      Author: Patrick Wendell <>
      Closes #502 from pwendell/make-distribution and squashes the following commits:
      1a97f0d [Patrick Wendell] SPARK-1119 and other build improvements
  9. Apr 21, 2014
    • Clean up and simplify Spark configuration · fb98488f
      Patrick Wendell authored
      Over time, as we've added more deployment modes, the user-facing configuration options in Spark have gotten a bit unwieldy. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch but it makes the following improvements:
      1. Improved `` which was missing a lot of things users now set in that file.
      2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
      3. Adds ability to set these same variables for the driver using `spark-submit`.
      4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`.
      5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.
      Author: Patrick Wendell <>
      Closes #299 from pwendell/config-cleanup and squashes the following commits:
      127f301 [Patrick Wendell] Improvements to testing
      a006464 [Patrick Wendell] Moving properties file template.
      b4b496c [Patrick Wendell] -> spark-defaults.conf
      0086939 [Patrick Wendell] Minor style fixes
      af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs
      b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide
      af0adf7 [Patrick Wendell] Automatically add user jar
      a56b125 [Patrick Wendell] Responses to Tom's review
      d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
      a762901 [Patrick Wendell] Fixing test failures
      ffa00fe [Patrick Wendell] Review feedback
      fda0301 [Patrick Wendell] Note
      308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN
      e83cd8f [Patrick Wendell] Changes to allow re-use of test applications
      be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set
      c2a2909 [Patrick Wendell] Test compile fixes
      4ee6f9d [Patrick Wendell] Making YARN doc changes consistent
      afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors.
      b08893b [Patrick Wendell] Additional improvements.
      ace4ead [Patrick Wendell] Responses to review feedback.
      b72d183 [Patrick Wendell] Review feedback for spark env file
      46555c1 [Patrick Wendell] Review feedback and import clean-ups
      437aed1 [Patrick Wendell] Small fix
      761ebcd [Patrick Wendell] Library path and classpath for drivers
      7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script
      5b0ba8e [Patrick Wendell] Don't ship executor envs
      84cc5e5 [Patrick Wendell] Small clean-up
      1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings
      4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH
      6eaf7d0 [Patrick Wendell] executorJavaOpts
      0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN
      ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
  10. Mar 25, 2014
    • SPARK-1286: Make usage of idempotent · 007a7334
      Aaron Davidson authored
      Various spark scripts load This can cause growth of any variables that may be appended to (SPARK_CLASSPATH, SPARK_REPL_OPTS) and it makes the precedence order for options specified in less clear.
      One use-case for the latter is that we want to set options from the command-line of spark-shell, but these options will be overridden by subsequent loading of If we were to load the first and then set our command-line options, we could guarantee correct precedence order.
      Note that we use SPARK_CONF_DIR if available to support the sbin/ scripts, which always set this variable from sbin/ Otherwise, we default to the ../conf/ as usual.
      Author: Aaron Davidson <>
      Closes #184 from aarondav/idem and squashes the following commits:
      e291f91 [Aaron Davidson] Use "private" variables in
      8da8360 [Aaron Davidson] Add .sh extension to
      93a2471 [Aaron Davidson] SPARK-1286: Make usage of idempotent
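The idempotent loading described in this commit can be sketched with a guard variable that is set the first time the environment file is sourced. This is a minimal illustration under stated assumptions, not the committed script: `load_demo_env`, `DEMO_ENV_LOADED`, `DEMO_CLASSPATH`, and the file name `demo-env.sh` are hypothetical, and the `../conf` fallback is simplified to a path relative to the current directory. Only the use of SPARK_CONF_DIR when it is set comes from the commit message.

```shell
# Source the environment file at most once per shell, guarded by a "private" variable.
load_demo_env() {
  if [ -z "$DEMO_ENV_LOADED" ]; then
    DEMO_ENV_LOADED=1
    # Prefer SPARK_CONF_DIR when set; otherwise fall back to ../conf.
    conf_dir="${SPARK_CONF_DIR:-../conf}"
    if [ -f "$conf_dir/demo-env.sh" ]; then
      . "$conf_dir/demo-env.sh"
    fi
  fi
}

# Demo setup: a throwaway conf dir whose env file appends to a classpath-like variable.
demo_conf="$(mktemp -d)"
printf '%s\n' 'DEMO_CLASSPATH="$DEMO_CLASSPATH:/extra/lib"' > "$demo_conf/demo-env.sh"

DEMO_CLASSPATH="/base"
DEMO_ENV_LOADED=""
SPARK_CONF_DIR="$demo_conf"

# Loading twice appends only once, so the variable does not grow on repeated loads.
load_demo_env
load_demo_env

echo "$DEMO_CLASSPATH"
rm -rf "$demo_conf"
```

Because the file is sourced only on the first call, a script can load it early and then apply command-line options afterwards without a later load overriding them.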
  11. Jan 21, 2014
  12. Jan 20, 2014
  13. Jan 07, 2014
  14. Jan 06, 2014
  15. Jan 03, 2014
  16. Jan 02, 2014
  17. Sep 29, 2013
  18. Sep 23, 2013
  19. Sep 22, 2013
  20. Sep 15, 2013
  21. Sep 11, 2013
  22. Aug 30, 2013
  23. Aug 29, 2013
    • Update Maven build to create assemblies expected by new scripts · 666d93c2
      Matei Zaharia authored
      This includes the following changes:
      - The "assembly" package now builds in Maven by default, and creates an
        assembly containing both hadoop-client and Spark, unlike the old
        BigTop distribution assembly that skipped hadoop-client
      - There is now a bigtop-dist package to build the old BigTop assembly
      - The repl-bin package is no longer built by default since the scripts
        don't rely on it; instead it can be enabled with -Prepl-bin
      - Py4J is now included in the assembly/lib folder as a local Maven repo,
        so that the Maven package can link to it
      - run-example now adds the original Spark classpath as well because the
        Maven examples assembly lists spark-core and such as provided
      - The various Maven projects add a spark-yarn dependency correctly
    • Change build and run instructions to use assemblies · 53cd50c0
      Matei Zaharia authored
      This commit makes Spark invocation saner by using an assembly JAR to
      find all of Spark's dependencies instead of adding all the JARs in
      lib_managed. It also packages the examples into an assembly and uses
      that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
      with two better-named scripts: "run-examples" for examples, and
      "spark-class" for Spark internal classes (e.g. REPL, master, etc). This
      is also designed to minimize the confusion people have in trying to use
      "run" to run their own classes; it's not meant to do that, but now at
      least if they look at it, they can modify run-examples to do a decent
      job for them.
      As part of this, Bagel's examples are also now properly moved to the
      examples package instead of bagel.