- Dec 10, 2013
-
Binh Nguyen authored
-
Binh Nguyen authored
-
Binh Nguyen authored
This change makes the Java and Scala APIs for sortByKey() the same.
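A minimal sketch of the now-consistent call from the Scala side (hypothetical local job, not taken from the commit itself):
```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // implicits that add sortByKey to pair RDDs

object SortByKeyExample {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "sortByKey-example")
    val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
    println(pairs.sortByKey().collect().mkString(", "))      // ascending by default
    println(pairs.sortByKey(false).collect().mkString(", ")) // explicit descending order
    sc.stop()
  }
}
```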
-
- Dec 09, 2013
-
Patrick Wendell authored
Add missing license headers. I found this when doing further audits on the 0.8.1 release candidate.
-
Patrick Wendell authored
-
- Dec 08, 2013
-
Patrick Wendell authored
[Deb] Fix packaging of Spark classes by adding the org.apache prefix in scripts embedded in the .deb.
-
Patrick Wendell authored
Update broken links and add HDP 2.0 version string. I ran a link checker on the UI and found several broken links.
-
Patrick Wendell authored
-
Patrick Wendell authored
-
Patrick Wendell authored
-
- Dec 07, 2013
-
Patrick Wendell authored
SPARK-917 Improve API links in nav bar
-
Patrick Wendell authored
Correct spelling error in configuration.md
-
Patrick Wendell authored
-
Aaron Davidson authored
-
- Dec 06, 2013
-
Patrick Wendell authored
Formatting fix. This is a single-line change; the diff appears larger here because GitHub is out of sync.
-
Patrick Wendell authored
-
Patrick Wendell authored
Adding disclaimer for shuffle file consolidation
-
Patrick Wendell authored
Minor doc fixes and updating README
-
Patrick Wendell authored
-
Patrick Wendell authored
-
Patrick Wendell authored
Updated documentation about the YARN v2.2 build process
-
Ali Ghodsi authored
-
Ali Ghodsi authored
-
Matei Zaharia authored
stageId <--> jobId mapping in DAGScheduler. Okay, I think this one is ready to go, or at least ready for review and discussion. It's a carry-over of https://github.com/mesos/spark/pull/842 with updates for the newer job cancellation functionality. The prior discussion still applies.

I've actually changed the job cancellation flow a bit: instead of ``cancelTasks`` going to the TaskScheduler and then ``taskSetFailed`` coming back to the DAGScheduler (resulting in ``abortStage`` there), the DAGScheduler now takes care of figuring out which stages should be cancelled, tells the TaskScheduler to cancel tasks for those stages, then does the cleanup within the DAGScheduler directly, without any further prompting by the TaskScheduler.

I know of three outstanding issues, each of which can and should, I believe, be handled in follow-up pull requests:
1) https://spark-project.atlassian.net/browse/SPARK-960
2) JobLogger should be refactored to eliminate duplication.
3) Related to 2), the WebUI should also become a consumer of the DAGScheduler's new understanding of the relationship between jobs and stages, so that it can display progress indication and the like grouped by job. Right now, some of this information is just being sent out as part of ``SparkListenerJobStart`` messages, but more or different job <--> stage information may need to be exported from the DAGScheduler to meet listeners' needs.

Except for the eventQueue -> Actor commit, the rest can be cherry-picked almost cleanly into branch-0.8; a little merging is needed in MapOutputTracker and the DAGScheduler. Merged versions of those files are in https://github.com/markhamstra/incubator-spark/tree/aba2b40ce04ee9b7b9ea260abb6f09e050142d43

Note that between the recent Actor change in the DAGScheduler and the cleaning up of DAGScheduler data structures on job completion in this PR, some races have been introduced into the DAGSchedulerSuite. Those tests usually pass, and I don't think that better-behaved code that doesn't directly inspect DAGScheduler data structures should see any problems, but I'll work on fixing DAGSchedulerSuite either as an addition to this PR or as a separate request.

UPDATE: Fixed the race that I introduced. Created a JIRA issue (SPARK-965) for the one that was introduced with the switch to eventProcessorActor in the DAGScheduler.
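To illustrate the kind of bidirectional bookkeeping this describes, here is a hedged sketch; the names (JobStageIndex, jobIdToStageIds, stageIdToJobIds, stagesToCancel) are hypothetical, not the DAGScheduler's actual fields:
```scala
import scala.collection.mutable

// Illustrative sketch of a jobId <--> stageId index; not the actual DAGScheduler code.
class JobStageIndex {
  private val jobIdToStageIds = mutable.HashMap.empty[Int, mutable.HashSet[Int]]
  private val stageIdToJobIds = mutable.HashMap.empty[Int, mutable.HashSet[Int]]

  def register(jobId: Int, stageId: Int): Unit = {
    jobIdToStageIds.getOrElseUpdate(jobId, mutable.HashSet.empty[Int]) += stageId
    stageIdToJobIds.getOrElseUpdate(stageId, mutable.HashSet.empty[Int]) += jobId
  }

  // Stages whose tasks can be cancelled along with a job: those needed by
  // this job and by no other active job.
  def stagesToCancel(jobId: Int): Set[Int] =
    jobIdToStageIds.getOrElse(jobId, mutable.HashSet.empty[Int])
      .filter(stageId => stageIdToJobIds(stageId).forall(_ == jobId))
      .toSet
}
```
With such an index, the DAGScheduler can decide which stages to cancel on its own and then instruct the TaskScheduler, rather than waiting for ``taskSetFailed`` callbacks.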
-
Matei Zaharia authored
Change the name of the input argument in ClusterScheduler#initialize from context to backend. The SchedulerBackend used to be called ClusterSchedulerContext, so this is a small rename of the input param in ClusterScheduler#initialize to reflect that.
-
Matei Zaharia authored
Added logging of scheduler delays to UI. This commit adds two metrics to the UI:
1) The time to get task results, if they're fetched remotely.
2) The scheduler delay. When the scheduler starts getting overwhelmed (because it can't keep up with the rate at which tasks are being submitted), tasks get delayed on the tail end: the message from the worker saying that the task has completed ends up in a long queue and takes a while to be processed by the scheduler. This commit records that delay in the UI so that users can tell when the scheduler is becoming the bottleneck.
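A hedged sketch of the scheduler-delay idea; the field names are illustrative, not Spark's actual TaskInfo/TaskMetrics fields:
```scala
// Illustrative timing record for one task (all values in milliseconds).
case class TaskTiming(
    launchTime: Long,        // when the scheduler launched the task
    finishTime: Long,        // when the scheduler processed the completion
    executorRunTime: Long,   // time the executor actually spent running it
    serializationTime: Long) // task result (de)serialization overhead

// Whatever wall-clock time is left over was spent waiting in scheduler queues.
def schedulerDelay(t: TaskTiming): Long =
  math.max(0L, (t.finishTime - t.launchTime) - t.executorRunTime - t.serializationTime)
```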
-
Matei Zaharia authored
Memoize preferred locations in ZippedPartitionsBaseRDD so that preferred-location computation doesn't lead to exponential explosion. This was a problem in GraphX, where we have a whole chain of RDDs that are ZippedPartitionsRDDs and the preferred locations were taking an eternity to compute. (cherry picked from commit e36fe55a) Signed-off-by: Reynold Xin <rxin@apache.org>
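A sketch of the memoization pattern at work here; the names are illustrative, not the actual ZippedPartitionsBaseRDD code:
```scala
import scala.collection.mutable

// Compute each partition's preferred locations once and reuse them, so a deep
// chain of zipped RDDs doesn't recompute the whole ancestry on every lookup.
class PreferredLocationCache(compute: Int => Seq[String]) {
  private val cache = mutable.HashMap.empty[Int, Seq[String]]
  def preferredLocations(partitionIndex: Int): Seq[String] =
    cache.getOrElseUpdate(partitionIndex, compute(partitionIndex))
}
```
Without the cache, each zipped RDD recursively asks all its parents for their preferred locations, so the work grows with the depth of the chain on every call.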
-
Reynold Xin authored
jobWaiter.synchronized before jobWaiter.wait, else ``IllegalMonitorStateException`` in ``SimpleFutureAction#ready``.
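Why the fix matters: Object.wait() must be called while holding the object's monitor, otherwise the JVM throws IllegalMonitorStateException. A minimal sketch, not the actual JobWaiter code:
```scala
object WaitExample {
  private val lock = new Object
  @volatile private var done = false

  def markDone(): Unit = lock.synchronized { done = true; lock.notifyAll() }

  def ready(): Unit = lock.synchronized { // take the monitor first...
    while (!done) lock.wait()             // ...then it is legal to wait()
  }
  // Calling lock.wait() outside lock.synchronized { ... } throws
  // java.lang.IllegalMonitorStateException at runtime.
}
```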
-
Mark Hamstra authored
-
- Dec 05, 2013
-
Henry Saputra authored
The SchedulerBackend used to be called ClusterSchedulerContext, so this is a small change to the input param in ClusterScheduler#initialize to reflect that.
-
Mark Hamstra authored
-
Patrick Wendell authored
Document missing configs and set shuffle consolidation to false.
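A sketch of how a user of this era would opt back in to shuffle file consolidation (off by default per this commit), assuming the spark.shuffle.consolidateFiles system property of the 0.8.x line:
```scala
import org.apache.spark.SparkContext

object ConsolidationExample {
  def main(args: Array[String]) {
    // Must be set before the SparkContext is created.
    System.setProperty("spark.shuffle.consolidateFiles", "true")
    val sc = new SparkContext("local", "consolidation-example")
    // ... run shuffle-heavy jobs here ...
    sc.stop()
  }
}
```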
-
Patrick Wendell authored
-
Matei Zaharia authored
Hadoop 2.2 migration. Includes support for the YARN API stabilized in the Hadoop 2.2 release, and a few style patches. Short description for each set of commits:

a98f5a0e - "Misc style changes in the 'yarn' package"
a67ebf43 - "A few more style fixes in the 'yarn' package"
Both of these are minor style changes, such as fixing lines over 100 chars, to the existing YARN code.

ab8652f2 - "Add a 'new-yarn' directory ..."
Copies everything from `SPARK_HOME/yarn` to `SPARK_HOME/new-yarn`. No actual code changes here.

4f1c3fa5 - "Hadoop 2.2 YARN API migration ..."
API patches to code in the `SPARK_HOME/new-yarn` directory. There are a few more small style changes mixed in, too. Based on @colorant's Hadoop 2.2 support for the scala-2.10 branch in #141.

a1a1c62a - "Add optional Hadoop 2.2 settings in sbt build ..."
If Spark should be built against Hadoop 2.2, then: a) the `org.apache.spark.deploy.yarn` package will be compiled from the `new-yarn` directory, and b) Protobuf v2.5 will be used as a Spark dependency, since Hadoop 2.2 depends on it. Also, Spark will be built against a version of Akka v2.0.5 that's built against Protobuf 2.5, named `akka-2.0.5-protobuf-2.5`. The patched Akka is here: https://github.com/harveyfeng/akka/tree/2.0.5-protobuf-2.5, and was published to local Ivy during testing. There's also a new boolean environment variable, `SPARK_IS_NEW_HADOOP`, that users can manually set if their `SPARK_HADOOP_VERSION` specification does not start with `2.2`, which is how the build file tries to detect a 2.2 version. Not sure if this is necessary or done in the best way, though...
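An illustrative sketch of the detection logic described above (not the actual SparkBuild.scala code): treat the build as "new Hadoop" when the version string starts with 2.2, unless SPARK_IS_NEW_HADOOP overrides it.
```scala
// Hypothetical default version string; the real build's default may differ.
val hadoopVersion = sys.env.getOrElse("SPARK_HADOOP_VERSION", "1.0.4")

// Explicit override wins; otherwise fall back to the prefix check.
val isNewHadoop = sys.env.get("SPARK_IS_NEW_HADOOP")
  .map(_.toBoolean)
  .getOrElse(hadoopVersion.startsWith("2.2"))
```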
-
- Dec 04, 2013
-
Patrick Wendell authored
-
Patrick Wendell authored
-
Patrick Wendell authored
Fix small bug in web UI and minor clean-up. There was a bug where sorting order didn't work correctly for write time metrics. I also cleaned up some earlier code that fixed the same issue for read and write bytes.
-
Reynold Xin authored
Add missing space after "Serialized" in StorageLevel. Current code creates outputs like:
scala> res0.getStorageLevel.description
res2: String = Serialized1x Replicated
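A sketch of the described one-character fix (illustrative, not the actual StorageLevel.description code):
```scala
val replication = 1
// Without the trailing space, the replication suffix runs into the word.
val buggy = "Serialized" + replication + "x Replicated"  // "Serialized1x Replicated"
val fixed = "Serialized " + replication + "x Replicated" // "Serialized 1x Replicated"
```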
-
Patrick Wendell authored
There was a bug where sorting order didn't work correctly for write time metrics. I also cleaned up some earlier code that fixed the same issue for read and write bytes.
-
Reynold Xin authored
Typo: applicaton
-