- Jan 10, 2014
-
-
Patrick Wendell authored
Make DEBUG-level logs consumable. Removes two things that caused issues with the debug logs: (a) Internal polling in the DAGScheduler was polluting the logs. (b) The Scala REPL logs were really noisy.
-
Patrick Wendell authored
Removes two things that caused issues with the debug logs: (a) Internal polling in the DAGScheduler was polluting the logs. (b) The Scala REPL logs were really noisy.
-
Matei Zaharia authored
Fix bug added when we changed AppDescription.maxCores to an Option. The Scala compiler warned about this -- we were comparing an Option against an integer now.
-
Patrick Wendell authored
Enable shuffle consolidation by default. Bump this to being enabled for 0.9.0.
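As a hedged illustration of what this toggle means for users: assuming the flag involved is `spark.shuffle.consolidateFiles` (the commit message does not name the property), opting back out of the new default would look roughly like this:
```
import org.apache.spark.SparkConf

// Sketch only: the property name spark.shuffle.consolidateFiles is an assumption,
// not quoted from the commit. Setting it to "false" would opt out of consolidation.
val conf = new SparkConf()
  .setAppName("shuffle-consolidation-example")
  .set("spark.shuffle.consolidateFiles", "false")
```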
-
Patrick Wendell authored
Bump this to being enabled for 0.9.0.
-
Patrick Wendell authored
Set default logging to WARN for Spark streaming examples. This programmatically sets the log level to WARN by default for streaming tests. If the user has already specified a log4j.properties file, the user's file will take precedence over this default.
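A minimal sketch of the pattern described here, assuming log4j 1.x; the helper object below is illustrative, not the actual class touched by this commit:
```
import org.apache.log4j.{Level, Logger}

object ExampleLogging {
  // Default to WARN only if the user has not already configured log4j
  // (i.e. the root logger has no appenders from a user-supplied log4j.properties).
  def setDefaultLogLevelToWarn(): Unit = {
    val log4jConfigured = Logger.getRootLogger.getAllAppenders.hasMoreElements
    if (!log4jConfigured) {
      Logger.getRootLogger.setLevel(Level.WARN)
    }
  }
}
```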
-
- Jan 09, 2014
-
-
Patrick Wendell authored
-
Patrick Wendell authored
Simplify and fix pyspark script. This patch removes compatibility for IPython < 1.0 but fixes the launch script and makes it much simpler. I tested this using the three commands in the PySpark documentation page:
1. IPYTHON=1 ./pyspark
2. IPYTHON_OPTS="notebook" ./pyspark
3. IPYTHON_OPTS="notebook --pylab inline" ./pyspark
There are two changes:
- We rely on the PYTHONSTARTUP env var to start PySpark.
- Removed the quotes around $IPYTHON_OPTS... having quotes gloms them together as a single argument passed to `exec`, which seemed to cause IPython to fail (it instead expects them as multiple arguments).
-
Reynold Xin authored
Add some missing Java API methods. These are primarily for setting job groups, canceling jobs, and setting names on RDDs. Seemed like useful stuff to expose in Java.
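For context, a hedged sketch of what these operations look like from the Scala side (this commit adds the Java-facing equivalents; the exact Java signatures are not quoted in the message):
```
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("api-example"))

// Group subsequent jobs so they can be identified and cancelled together.
sc.setJobGroup("etl-jobs", "nightly ETL run")

// Name an RDD so it is easier to spot in the web UI.
val data = sc.parallelize(1 to 100).setName("input-numbers")
data.count()

// Cancel everything submitted under the group.
sc.cancelJobGroup("etl-jobs")
```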
-
Reynold Xin authored
Bug fixes for updating the RDD block's memory and disk usage information. From the code context, we can see that memSize and diskSize here are both always equal to the size of the block; they are actually never zero. Thus, the logic here is wrong for recording the block usage in BlockStatus, especially for blocks that are dropped from memory to make space for new input RDD blocks. I have tested that this causes the storage metrics shown on the Storage web page to be wrong and misleading. With this patch, the metrics are correct. Finally, Merry Christmas, guys :)
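A hedged sketch of the accounting issue being fixed, using an illustrative case class rather than Spark's actual BlockStatus internals:
```
// Illustrative only: a block dropped from memory to disk should report zero bytes
// in memory rather than the full block size in both fields.
case class ExampleBlockStatus(memSize: Long, diskSize: Long)

def afterDropToDisk(blockSizeOnDisk: Long): ExampleBlockStatus =
  ExampleBlockStatus(memSize = 0L, diskSize = blockSizeOnDisk)
```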
-
Patrick Wendell authored
-
Patrick Wendell authored
SPARK-998: Support Launching Driver Inside of Standalone Mode
[NOTE: I need to bring the tests up to date with new changes, so for now they will fail]
This patch provides support for launching driver programs inside of a standalone cluster manager. It also supports monitoring and re-launching of driver programs, which is useful for long-running, recoverable applications such as Spark Streaming jobs. For those jobs, this patch allows a deployment mode which is resilient to the failure of any worker node, failure of a master node (provided a multi-master setup), and even failures of the application itself, provided they are recoverable on a restart. Driver information, such as the status and logs from a driver, is displayed in the UI.
There are a few small TODOs here, but the code is generally feature-complete. They are:
- Bring tests up to date and add test coverage
- Restarting on failure should be optional and maybe off by default.
- See if we can re-use Akka connections to facilitate clients behind a firewall
A sensible place to start for review would be the `DriverClient` class, which gives users the ability to launch their driver program. I've also added an example program (`DriverSubmissionTest`) that allows you to test this locally and play around with killing workers, etc. Most of the code is devoted to persisting driver state in the cluster manager, exposing it in the UI, and dealing correctly with various types of failures.
Instructions to test locally:
- `sbt/sbt assembly/assembly examples/assembly`
- start a local version of the standalone cluster manager
```
./spark-class org.apache.spark.deploy.client.DriverClient \
  -j -Dspark.test.property=something \
  -e SPARK_TEST_KEY=SOMEVALUE \
  launch spark://10.99.1.14:7077 \
  ../path-to-examples-assembly-jar \
  org.apache.spark.examples.DriverSubmissionTest 1000 some extra options --some-option-here -X 13
```
- Go in the UI and make sure it started correctly, look at the output, etc.
- Kill workers, the driver program, masters, etc.
-
Matei Zaharia authored
The Scala compiler warned about this -- we were comparing an Option against an integer now.
-
Matei Zaharia authored
-
Patrick Wendell authored
Send logs to stderr by default (instead of stdout).
-
Patrick Wendell authored
-
Matei Zaharia authored
Use typed getters for configuration settings. This improves some of the code style after SPARK-544.
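A brief sketch of the difference, assuming the SparkConf typed getters (`getInt`, `getBoolean`, and friends) that this cleanup moves to:
```
import org.apache.spark.SparkConf

val conf = new SparkConf()

// Before: stringly-typed reads with manual parsing.
val coresOld = conf.get("spark.cores.max", "4").toInt

// After: typed getters with a default value.
val cores = conf.getInt("spark.cores.max", 4)
val speculation = conf.getBoolean("spark.speculation", false)
```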
-
Patrick Wendell authored
-
Patrick Wendell authored
This programmatically sets the log level to WARN by default for streaming tests. If the user has already specified a log4j.properties file, the user's file will take precedence over this default.
-
Reynold Xin authored
Minor style cleanup. Mostly indentation and line-width changes. Focused on the few important files, since they are the ones that new contributors usually read first.
-
Reynold Xin authored
Don't delegate to the user's `sbt`. This changes our `sbt/sbt` script to not delegate to the user's `sbt` even if it is present. If users already have sbt installed and they want to use their own sbt, we'd expect them to just call sbt directly from within Spark. We no longer set any environment variables or anything from this script, so they should just launch sbt directly on their own. There are a number of hard-to-debug issues which can come from the current approach. One is if the user is unaware of an existing sbt installation and now, without explanation, their build breaks because they haven't configured options correctly (such as permgen size) within their sbt (reported by @patmcdonough). Another is if the user has a much older version of sbt hanging around, in which case some of the older versions don't actually work well when newer versions of sbt are specified in the build file (reported by @marmbrus). A third is if the user has done some other modification to their sbt script, such as setting it to delegate to sbt/sbt in Spark, and this causes that to break (also reported by @marmbrus). So to keep things simple, let's just avoid this path and remove it. Any user who already has sbt and wants to build Spark with it should be able to understand easily how to do it.
-
Reynold Xin authored
-
Patrick Wendell authored
-
Matei Zaharia authored
-
Patrick Wendell authored
This changes our `sbt/sbt` script to not delegate to the user's `sbt` even if it is present. If users already have sbt installed and they want to use their own sbt, we'd expect them to just call sbt directly from within Spark. We no longer set any environment variables or anything from this script, so they should just launch sbt directly on their own. There are a number of hard-to-debug issues which can come from the current approach. One is if the user is unaware of an existing sbt installation and now, without explanation, their build breaks because they haven't configured options correctly (such as permgen size) within their sbt. Another is if the user has a much older version of sbt hanging around, in which case some of the older versions don't actually work well when newer versions of sbt are specified in the build file (reported by @marmbrus). A third is if the user has done some other modification to their sbt script, such as setting it to delegate to sbt/sbt in Spark, and this causes that to break (also reported by @marmbrus). So to keep things simple, let's just avoid this path and remove it. Any user who already has sbt and wants to build Spark with it should be able to understand easily how to do it.
-
Patrick Wendell authored
Fixing config option "retained_stages" => "retainedStages". This is a very esoteric option and it's out of sync with the style we use. So it seems fitting to fix it for 0.9.0.
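Assuming the option lives under the `spark.ui` namespace (the full key is not given in the message), usage with the corrected camelCase name would look roughly like:
```
import org.apache.spark.SparkConf

// Assumed key: spark.ui.retainedStages after this rename.
val conf = new SparkConf().set("spark.ui.retainedStages", "500")
```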
-
- Jan 08, 2014
-
-
Patrick Wendell authored
This is a very esoteric option and it's out of sync with the style we use. So it seems fitting to fix it for 0.9.0.
-
Patrick Wendell authored
-
Reynold Xin authored
-
Reynold Xin authored
Fix make-distribution.sh "show version: command not found" error.
-
Reynold Xin authored
Set boolean param name for call to SparkHadoopMapReduceUtil.newTaskAttemptID, to make it clear which param is being set.
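As a hedged illustration of the style change, with an assumed signature (not copied from the source):
```
// Assumed shape of the helper; the real signature may differ.
def newTaskAttemptID(jtIdentifier: String, jobId: Int, isMap: Boolean,
                     taskId: Int, attemptId: Int): String =
  s"attempt_${jtIdentifier}_${jobId}_${if (isMap) "m" else "r"}_${taskId}_$attemptId"

// Positional boolean: unclear at the call site what `false` means.
newTaskAttemptID("201401081200", 0, false, 3, 0)

// Named boolean: self-documenting, which is what this change does.
newTaskAttemptID("201401081200", 0, isMap = false, taskId = 3, attemptId = 0)
```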
-
Patrick Wendell authored
Add CDH Repository to Maven Build. At some point this was removed from the Maven build... so I'm adding it back. It's needed for the Hadoop2 tests we run on Jenkins, and it's also included in the SBT build.
-
Reynold Xin authored
Remove calls to deprecated mapred OutputCommitter.cleanupJob. Since Hadoop 1.0.4, the mapred OutputCommitter.commitJob does the cleanup itself via a call to OutputCommitter.cleanupJob. Also remove SparkHadoopWriter.cleanup, since it is used only by PairRDDFunctions. In fact, the implementation of mapred OutputCommitter.commitJob looks like this: `public void commitJob(JobContext jobContext) throws IOException { cleanupJob(jobContext); }`
-
walker authored
-
liguoqiang authored
-
Thomas Graves authored
Support distributing extra files to workers in YARN client mode, so that the user doesn't need to package every dependency into one assembly jar as the Spark app jar.
-
Patrick Wendell authored
-
Patrick Wendell authored
-
Patrick Wendell authored
Conflicts:
  core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala
  pom.xml
-
Henry Saputra authored
-