  1. Jul 03, 2014
    • [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work. · 731f683b
      Prashant Sharma authored
      Trivial fix.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #1050 from ScrapCodes/SPARK-2109/pyspark-script-bug and squashes the following commits:
      
      77072b9 [Prashant Sharma] Changed echos to redirect to STDERR.
      13f48a0 [Prashant Sharma] [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work.
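
      A minimal sketch of the pattern behind the fix, with the message text
      purely illustrative: informational echos in bin/pyspark are sent to
      stderr so they cannot leak into stdout.
      ```
      # Route diagnostics to stderr; stdout must stay clean for consumers.
      echo "Warning: SPARK_MEM is deprecated in favor of more specific options." 1>&2
      ```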
  2. Jun 23, 2014
  3. Jun 12, 2014
    • SPARK-1843: Replace assemble-deps with env variable. · 1c04652c
      Patrick Wendell authored
      (This change is actually small; I moved some logic into
      compute-classpath that was previously in spark-class.)
      
      Assemble-deps has existed for a while to allow developers to
      run local code with new changes quickly. When I'm developing, I
      typically use a simpler approach that just prepends the Spark
      classes to the classpath before the assembly jar. This is well
      defined in the JVM, and the Spark classes take precedence over those
      in the assembly.

      This approach is portable across both builds, which is the main reason I'd
      like to switch to it. It's also a bit easier to toggle on and off quickly.
      
      The way you use this is the following:
      ```
      $ ./bin/spark-shell # Use spark with the normal assembly
      $ export SPARK_PREPEND_CLASSES=true
      $ ./bin/spark-shell # Now it's using compiled classes
      $ unset SPARK_PREPEND_CLASSES
      $ ./bin/spark-shell # Back to normal
      ```
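
      For context, a hypothetical sketch of the kind of logic this moves into
      compute-classpath (variable names and module paths are illustrative, not
      the actual list):
      ```
      # Put locally compiled classes ahead of the assembly jar so they
      # shadow the same classes inside it.
      if [ -n "$SPARK_PREPEND_CLASSES" ]; then
        CLASSPATH="$FWDIR/core/target/scala-2.10/classes:$CLASSPATH"
      fi
      CLASSPATH="$CLASSPATH:$ASSEMBLY_JAR"
      ```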
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #877 from pwendell/assemble-deps and squashes the following commits:
      
      8a11345 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into assemble-deps
      faa3168 [Patrick Wendell] Adding a warning for compatibility
      3f151a7 [Patrick Wendell] Small fix
      bbfb73c [Patrick Wendell] Review feedback
      328e9f8 [Patrick Wendell] SPARK-1843: Replace assemble-deps with env variable.
  4. May 21, 2014
    • [SPARK-1250] Fixed misleading comments in bin/pyspark, bin/spark-class · 6e337380
      Sumedh Mungee authored
      Fixed a couple of misleading comments in bin/pyspark and bin/spark-class. The comments make it seem like the script is looking for the Scala installation when in fact it is looking for Spark.
      
      Author: Sumedh Mungee <smungee@gmail.com>
      
      Closes #843 from smungee/spark-1250-fix-comments and squashes the following commits:
      
      26870f3 [Sumedh Mungee] [SPARK-1250] Fixed misleading comments in bin/pyspark and bin/spark-class
  5. May 19, 2014
    • SPARK-1879. Increase MaxPermSize since some of our builds have many classes · 5af99d76
      Matei Zaharia authored
      See https://issues.apache.org/jira/browse/SPARK-1879 -- builds with Hadoop 2 and Hive ran out of PermGen space in spark-shell once their classes, combined with the Scala compiler's, added up.
      
      Note that users can still override it by setting their own Java options: with this change, their options come later in the command string than the default -XX:MaxPermSize=128m and therefore take precedence.
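
      A sketch of that ordering as it might look in spark-class (variable
      names illustrative):
      ```
      # The default comes first; a user-supplied -XX:MaxPermSize in
      # SPARK_JAVA_OPTS appears later on the command line and wins.
      OUR_JAVA_OPTS="-XX:MaxPermSize=128m $SPARK_JAVA_OPTS"
      exec "$RUNNER" $OUR_JAVA_OPTS -cp "$CLASSPATH" "$@"
      ```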
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #823 from mateiz/spark-1879 and squashes the following commits:
      
      6bc0ee8 [Matei Zaharia] Increase MaxPermSize to 128m since some of our builds have lots of classes
  6. May 09, 2014
    • SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`. · 06b15baa
      Patrick Wendell authored
      Gives a nicely formatted message to the user when `run-example` is run,
      telling them to use `spark-submit` instead.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #704 from pwendell/examples and squashes the following commits:
      
      1996ee8 [Patrick Wendell] Feedback from Andrew
      3eb7803 [Patrick Wendell] Suggestions from TD
      2474668 [Patrick Wendell] SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`.
  7. May 04, 2014
    • SPARK-1703 Warn users if Spark is run on JRE6 but compiled with JDK7. · 0c98a8f6
      Patrick Wendell authored
      This adds some guards and good warning messages if users hit this issue. /cc @aarondav, with whom I discussed parts of the design.
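
      One way such a guard can work -- a hedged sketch, since the exact
      detection here is an assumption: probe the assembly jar with the
      runtime's own jar tool, which fails on JDK7-built jars under Java 6.
      ```
      # Hypothetical guard: if JRE 6 cannot even read the assembly jar,
      # print a friendly explanation instead of a cryptic stack trace.
      if ! "$JAVA_HOME/bin/jar" -tf "$ASSEMBLY_JAR" >/dev/null 2>&1; then
        echo "Error: failed to read the Spark assembly jar." 1>&2
        echo "It may have been compiled with JDK 7 while you are running JRE 6;" 1>&2
        echo "please run Spark with JRE 7 or rebuild with JDK 6." 1>&2
        exit 1
      fi
      ```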
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #627 from pwendell/jdk6 and squashes the following commits:
      
      a38a958 [Patrick Wendell] Code review feedback
      94e9f84 [Patrick Wendell] SPARK-1703 Warn users if Spark is run on JRE6 but compiled with JDK7.
  8. Apr 28, 2014
    • SPARK-1654 and SPARK-1653: Fixes in spark-submit. · 949e3931
      Patrick Wendell authored
      Deals with two issues:
      1. Spark shell didn't correctly pass quoted arguments to spark-submit (see the sketch after this list).
      ```./bin/spark-shell --driver-java-options "-Dfoo=f -Dbar=b"```
      2. Spark submit used deprecated environment variables (SPARK_CLASSPATH)
         which triggered warnings. Now we use new, more narrowly scoped,
         variables.
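
      For the first issue, a minimal sketch of quote-preserving forwarding,
      assuming spark-shell hands its arguments to spark-submit (the exact
      wiring is illustrative):
      ```
      # "$@" re-expands each original argument as a single word, so values
      # with spaces (e.g. --driver-java-options "-Dfoo=f -Dbar=b") survive.
      exec "$SPARK_HOME"/bin/spark-submit "$@"
      ```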
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #576 from pwendell/spark-submit and squashes the following commits:
      
      67004c9 [Patrick Wendell] SPARK-1654 and SPARK-1653: Fixes in spark-submit.
  9. Apr 21, 2014
    • Clean up and simplify Spark configuration · fb98488f
      Patrick Wendell authored
      Over time, as we've added more deployment modes, user-facing configuration options in Spark have gotten a bit unwieldy. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch, but it makes the following improvements:
      
      1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file.
      2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
      3. Adds ability to set these same variables for the driver using `spark-submit`.
      4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit` (see the sketch after this list).
      5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.
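
      As an illustration of point 4, a hypothetical spark-defaults.conf
      written from the shell (the values are made up; the executor option
      names are the ones introduced above):
      ```
      # Create a defaults file that spark-submit will load.
      {
        echo "spark.master                  spark://master:7077"
        echo "spark.executor.memory         2g"
        echo "spark.executor.extraJavaOpts  -XX:+UseCompressedOops"
      } > conf/spark-defaults.conf
      ```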
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #299 from pwendell/config-cleanup and squashes the following commits:
      
      127f301 [Patrick Wendell] Improvements to testing
      a006464 [Patrick Wendell] Moving properties file template.
      b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf
      0086939 [Patrick Wendell] Minor style fixes
      af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs
      b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide
      af0adf7 [Patrick Wendell] Automatically add user jar
      a56b125 [Patrick Wendell] Responses to Tom's review
      d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
      a762901 [Patrick Wendell] Fixing test failures
      ffa00fe [Patrick Wendell] Review feedback
      fda0301 [Patrick Wendell] Note
      308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN
      e83cd8f [Patrick Wendell] Changes to allow re-use of test applications
      be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set
      c2a2909 [Patrick Wendell] Test compile fixes
      4ee6f9d [Patrick Wendell] Making YARN doc changes consistent
      afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors.
      b08893b [Patrick Wendell] Additional improvements.
      ace4ead [Patrick Wendell] Responses to review feedback.
      b72d183 [Patrick Wendell] Review feedback for spark env file
      46555c1 [Patrick Wendell] Review feedback and import clean-ups
      437aed1 [Patrick Wendell] Small fix
      761ebcd [Patrick Wendell] Library path and classpath for drivers
      7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script
      5b0ba8e [Patrick Wendell] Don't ship executor envs
      84cc5e5 [Patrick Wendell] Small clean-up
      1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings
      4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH
      6eaf7d0 [Patrick Wendell] executorJavaOpts
      0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN
      ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
  10. Apr 10, 2014
    • [SPARK-1276] Add a HistoryServer to render persisted UI · 79820fe8
      Andrew Or authored
      The new feature of event logging, introduced in #42, allows the user to persist the details of his/her Spark application to storage, and later replay these events to reconstruct an after-the-fact SparkUI.
      Currently, however, a persisted UI can only be rendered through the standalone Master. This greatly limits the usefulness of this new feature, as many people also run Spark on Yarn / Mesos.
      
      This PR introduces a new entity called the HistoryServer, which, given a log directory, keeps track of all completed applications independently of a Spark Master. Unlike the Master, the HistoryServer need not be running while the application is still running. It is relatively lightweight in that it only maintains static information about applications and performs no scheduling.
      
      To quickly test it out, generate event logs with ```spark.eventLog.enabled=true``` and run ```sbin/start-history-server.sh <log-dir-path>```. Your HistoryServer awaits on port 18080.
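
      Spelled out as a shell session -- the log directory and the way the
      property is passed are assumptions, not prescriptions:
      ```
      # 1. Run an application with event logging enabled.
      export SPARK_JAVA_OPTS="-Dspark.eventLog.enabled=true -Dspark.eventLog.dir=/tmp/spark-events"
      ./bin/spark-shell                 # do some work, then exit

      # 2. Serve the persisted UI from the same directory.
      sbin/start-history-server.sh /tmp/spark-events
      # ...then browse to http://localhost:18080
      ```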
      
      Comments and feedback are most welcome.
      
      ---
      
      A few other changes introduced in this PR include refactoring the WebUI interface, which is beginning to have a lot of duplicate code now that we have added more functionality to it. Two new SparkListenerEvents have been introduced (SparkListenerApplicationStart/End) to keep track of application name and start/finish times. This PR also clarifies the semantics of the ReplayListenerBus introduced in #42.
      
      A potential TODO in the future (not part of this PR) is to render live applications in addition to just completed applications. This is useful when applications fail, a condition that our current HistoryServer does not handle unless the user manually signals application completion (by creating the APPLICATION_COMPLETION file). Handling live applications becomes significantly more challenging, however, because it is now necessary to render the same SparkUI multiple times. To avoid reading the entire log every time, which is inefficient, we must handle reading the log from where we previously left off, but this becomes fairly complicated because we must deal with the arbitrary behavior of each input stream.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #204 from andrewor14/master and squashes the following commits:
      
      7b7234c [Andrew Or] Finished -> Completed
      b158d98 [Andrew Or] Address Patrick's comments
      69d1b41 [Andrew Or] Do not block on posting SparkListenerApplicationEnd
      19d5dd0 [Andrew Or] Merge github.com:apache/spark
      f7f5bf0 [Andrew Or] Make history server's web UI port a Spark configuration
      2dfb494 [Andrew Or] Decouple checking for application completion from replaying
      d02dbaa [Andrew Or] Expose Spark version and include it in event logs
      2282300 [Andrew Or] Add documentation for the HistoryServer
      567474a [Andrew Or] Merge github.com:apache/spark
      6edf052 [Andrew Or] Merge github.com:apache/spark
      19e1fb4 [Andrew Or] Address Thomas' comments
      248cb3d [Andrew Or] Limit number of live applications + add configurability
      a3598de [Andrew Or] Do not close file system with ReplayBus + fix bind address
      bc46fc8 [Andrew Or] Merge github.com:apache/spark
      e2f4ff9 [Andrew Or] Merge github.com:apache/spark
      050419e [Andrew Or] Merge github.com:apache/spark
      81b568b [Andrew Or] Fix strange error messages...
      0670743 [Andrew Or] Decouple page rendering from loading files from disk
      1b2f391 [Andrew Or] Minor changes
      a9eae7e [Andrew Or] Merge branch 'master' of github.com:apache/spark
      d5154da [Andrew Or] Styling and comments
      5dbfbb4 [Andrew Or] Merge branch 'master' of github.com:apache/spark
      60bc6d5 [Andrew Or] First complete implementation of HistoryServer (only for finished apps)
      7584418 [Andrew Or] Report application start/end times to HistoryServer
      8aac163 [Andrew Or] Add basic application table
      c086bd5 [Andrew Or] Add HistoryServer and scripts ++ Refactor WebUI interface
  11. Apr 06, 2014
    • SPARK-1314: Use SPARK_HIVE to determine if we include Hive in packaging · 41065584
      Aaron Davidson authored
      Previously, we based our decision about including the datanucleus jars on the existence of a spark-hive-assembly jar, which was incidentally built whenever "sbt assembly" was run. This meant that a typical and previously supported pathway would start using hive jars.
      
      This patch has the following features/bug fixes:
      
      - Use of SPARK_HIVE (default false) to determine if we should include Hive in the assembly jar.
      - Analogous feature in Maven with -Phive (previously, there was no support for adding Hive to any of our jars produced by Maven) -- see the sketch after this list
      - assemble-deps fixed since we no longer use a different ASSEMBLY_DIR
      - avoid a log message printed by compute-classpath.sh ending up on the classpath :)
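
      A minimal usage sketch of the two switches above (invocations assumed
      to follow the standard Spark build commands of the time):
      ```
      # sbt build: opt in to Hive via the environment variable.
      SPARK_HIVE=true sbt/sbt assembly

      # Maven build: opt in via the new profile.
      mvn -Phive -DskipTests clean package
      ```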
      
      Still TODO before mergeable:
      - We need to download the datanucleus jars outside of sbt. Perhaps we can have spark-class download them if SPARK_HIVE is set similar to how sbt downloads itself.
      - Spark SQL documentation updates.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #237 from aarondav/master and squashes the following commits:
      
      5dc4329 [Aaron Davidson] Typo fixes
      dd4f298 [Aaron Davidson] Doc update
      dd1a365 [Aaron Davidson] Eliminate need for SPARK_HIVE at runtime by d/ling datanucleus from Maven
      a9269b5 [Aaron Davidson] [WIP] Use SPARK_HIVE to determine if we include Hive in packaging
  12. Mar 25, 2014
    • SPARK-1286: Make usage of spark-env.sh idempotent · 007a7334
      Aaron Davidson authored
      Various Spark scripts load spark-env.sh, and may do so more than once. This can cause repeated growth of variables that are appended to (SPARK_CLASSPATH, SPARK_REPL_OPTS), and it makes the precedence order for options specified in spark-env.sh less clear.

      One use-case for the latter is that we want to set options from the command line of spark-shell, but these options would be overridden by a subsequent loading of spark-env.sh. If we load spark-env.sh first and then set our command-line options, we can guarantee the correct precedence order.
      
      Note that we use SPARK_CONF_DIR if available to support the sbin/ scripts, which always set this variable from sbin/spark-config.sh. Otherwise, we default to the ../conf/ as usual.
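
      A sketch of the idempotency guard this implies, assuming a sentinel
      variable (all names illustrative):
      ```
      # Load spark-env.sh at most once, honoring SPARK_CONF_DIR if set.
      if [ -z "$SPARK_ENV_LOADED" ]; then
        export SPARK_ENV_LOADED=1
        conf_dir="${SPARK_CONF_DIR:-"$(dirname "$0")/../conf"}"
        if [ -f "$conf_dir/spark-env.sh" ]; then
          set -a                      # export everything the env file defines
          . "$conf_dir/spark-env.sh"
          set +a
        fi
      fi
      ```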
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #184 from aarondav/idem and squashes the following commits:
      
      e291f91 [Aaron Davidson] Use "private" variables in load-spark-env.sh
      8da8360 [Aaron Davidson] Add .sh extension to load-spark-env.sh
      93a2471 [Aaron Davidson] SPARK-1286: Make usage of spark-env.sh idempotent
  13. Mar 24, 2014
    • SPARK-1094 Support MiMa for reporting binary compatibility across versions. · dc126f21
      Patrick Wendell authored
      This adds some changes on top of the initial work by @scrapcodes in #20:
      
      The goal here is to do automated checking of Spark commits to determine whether they break binary compatibility.
      
      1. Special case for inner classes of package-private objects.
      2. Made tools classes accessible when running `spark-class`.
      3. Made some declared types in MLLib more general.
      4. Various other improvements to exclude-generation script.
      5. In-code documentation.
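
      A hypothetical invocation, assuming the sbt MiMa plugin's reporting
      task is wired into the build (the task name is an assumption):
      ```
      # Compare the current build's public API against the last release.
      sbt/sbt mima-report-binary-issues
      ```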
      
      Author: Patrick Wendell <pwendell@gmail.com>
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Prashant Sharma <scrapcodes@gmail.com>
      
      Closes #207 from pwendell/mima and squashes the following commits:
      
      22ae267 [Patrick Wendell] New binary changes after upmerge
      6c2030d [Patrick Wendell] Merge remote-tracking branch 'apache/master' into mima
      3666cf1 [Patrick Wendell] Minor style change
      0e0f570 [Patrick Wendell] Small fix and removing directory listings
      647c547 [Patrick Wendell] Review feedback.
      c39f3b5 [Patrick Wendell] Some enhancements to binary checking.
      4c771e0 [Prashant Sharma] Added a tool to generate mima excludes and also adapted build to pick automatically.
      b551519 [Prashant Sharma] adding a new exclude after rebasing with master
      651844c [Prashant Sharma] Support MiMa for reporting binary compatibility across versions.
  14. Mar 09, 2014
    • SPARK-929: Fully deprecate usage of SPARK_MEM · 52834d76
      Aaron Davidson authored
      (Continued from old repo, prior discussion at https://github.com/apache/incubator-spark/pull/615)
      
      This patch cements our deprecation of the SPARK_MEM environment variable by replacing it with three more specialized variables:
      SPARK_DAEMON_MEMORY, SPARK_EXECUTOR_MEMORY, and SPARK_DRIVER_MEMORY
      
      The creation of the latter two variables means that we can safely set driver/job memory without accidentally setting the executor memory. Neither is public.
      
      SPARK_EXECUTOR_MEMORY is only used by the Mesos scheduler (and set within SparkContext). The proper way of configuring executor memory is through the "spark.executor.memory" property.
      
      SPARK_DRIVER_MEMORY is the new way of specifying the amount of memory used by jobs launched by spark-class, without possibly affecting executor memory.
      
      Other memory considerations:
      - The repl's memory can be set through the "--drivermem" command-line option, which really just sets SPARK_DRIVER_MEMORY.
      - run-example doesn't use spark-class, so the only way to modify examples' memory is actually an unusual use of SPARK_JAVA_OPTS (which is normally overridden in all cases by spark-class).
      
      This patch also fixes a lurking bug where spark-shell misused spark-class (the first argument is supposed to be the main class name, not java options), as well as a bug in the Windows spark-class2.cmd. I have not yet tested this patch on either Windows or Mesos, however.
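
      A sketch of the backwards-compatible fallback this implies in
      spark-class (the default value is illustrative):
      ```
      # Prefer the new variable; fall back to the deprecated SPARK_MEM.
      if [ -z "$SPARK_DRIVER_MEMORY" ]; then
        SPARK_DRIVER_MEMORY="${SPARK_MEM:-512m}"
      fi
      JAVA_OPTS="$JAVA_OPTS -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY"
      ```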
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #99 from aarondav/sparkmem and squashes the following commits:
      
      9df4c68 [Aaron Davidson] SPARK-929: Fully deprecate usage of SPARK_MEM
  15. Jan 06, 2014
  16. Jan 03, 2014
  17. Sep 29, 2013
  18. Sep 23, 2013
  19. Sep 22, 2013
  20. Sep 20, 2013
  21. Sep 15, 2013
  22. Sep 01, 2013
  23. Aug 31, 2013
  24. Aug 30, 2013
  25. Aug 29, 2013
    • Change build and run instructions to use assemblies · 53cd50c0
      Matei Zaharia authored
      This commit makes Spark invocation saner by using an assembly JAR to
      find all of Spark's dependencies instead of adding all the JARs in
      lib_managed. It also packages the examples into an assembly and uses
      that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
      with two better-named scripts: "run-examples" for examples, and
      "spark-class" for Spark internal classes (e.g. REPL, master, etc). This
      is also designed to minimize the confusion people have in trying to use
      "run" to run their own classes; it's not meant to do that, but now at
      least if they look at it, they can modify run-examples to do a decent
      job for them.
      
      As part of this, Bagel's examples are also now properly moved to the
      examples package instead of bagel.
  26. Aug 28, 2013
  27. Aug 03, 2013
  28. Aug 02, 2013
  29. Jul 31, 2013
  30. Jul 24, 2013
  31. Jul 17, 2013
  32. Jul 16, 2013
  33. Jul 15, 2013
  34. Jun 28, 2013
  35. Jun 25, 2013
  36. Jun 24, 2013
  37. Jun 22, 2013