- Dec 25, 2013
  - liguoqiang authored
  - Matei Zaharia authored
    Typo: avaiable -> available
  - Reynold Xin authored
    Fixed job name in the java streaming example.
- Dec 24, 2013
  - Andrew Ash authored
  - Patrick Wendell authored
    Deduplicate Local and Cluster schedulers. The code in LocalScheduler/LocalTaskSetManager was nearly identical to the code in ClusterScheduler/ClusterTaskSetManager. The redundancy made updating the schedulers unnecessarily painful and error-prone. This commit combines the two into a single TaskScheduler/TaskSetManager. Unfortunately the diff makes this change look much more invasive than it is: TaskScheduler.scala is only superficially changed (names updated, overrides removed) from the old ClusterScheduler.scala, and the same holds for TaskSetManager.scala. Thanks @rxin for suggesting this change!
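The deduplication idea can be sketched in miniature. This is a hypothetical illustration, not the actual Spark code: the shared scheduling logic lives in one class, and only the part that differs (how tasks are launched) is injected as a pluggable backend, instead of maintaining two near-identical scheduler classes. All class and method names here are invented for the sketch.

```python
class Backend:
    """Minimal backend interface: only the launch mechanism differs."""
    def launch(self, task):
        raise NotImplementedError

class LocalBackend(Backend):
    def launch(self, task):
        return f"local:{task}"       # stand-in for running in-process

class ClusterBackend(Backend):
    def launch(self, task):
        return f"cluster:{task}"     # stand-in for sending to an executor

class TaskScheduler:
    """One scheduler holds the shared queueing logic; the previously
    duplicated code paths now differ only in the injected backend."""
    def __init__(self, backend):
        self.backend = backend
        self.queue = []

    def submit(self, task):
        self.queue.append(task)

    def run_all(self):
        results = [self.backend.launch(t) for t in self.queue]
        self.queue = []
        return results
```

The same `TaskScheduler` then serves both modes, which is the essence of why the real diff is mostly renames rather than new logic.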
  - Patrick Wendell authored
    Clean up shuffle files once their metadata is gone. Previously, we would only clean the in-memory metadata for consolidated shuffle files. Additionally, fixes a bug where the Metadata Cleaner was ignoring type-specific TTLs.
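The TTL bug described above can be illustrated with a toy cleaner. This is a hypothetical sketch, not Spark's MetadataCleaner: entries are tagged with a type, and cleaning honours a per-type TTL rather than one global value (a cleaner that applied a single TTL to everything would exhibit exactly the bug being fixed). The type names and TTL values are made up.

```python
import time

class MetadataCleaner:
    """Evicts entries whose age exceeds the TTL configured for their type."""
    def __init__(self, ttls):
        self.ttls = ttls          # e.g. {"shuffle": 60, "broadcast": 300} (seconds)
        self.entries = {}         # key -> (type, insertion timestamp)

    def put(self, key, kind, now=None):
        self.entries[key] = (kind, now if now is not None else time.time())

    def clean(self, now=None):
        """Remove and return every key older than its type-specific TTL."""
        now = now if now is not None else time.time()
        expired = [k for k, (kind, ts) in self.entries.items()
                   if now - ts > self.ttls.get(kind, float("inf"))]
        for k in expired:
            del self.entries[k]   # the real fix also deletes the file on disk
        return expired
```

Passing explicit `now` timestamps makes the expiry logic deterministic to test.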
  - Kay Ousterhout authored
  - Matei Zaharia authored
    Change the order of CLASSPATH. SPARK_TOOLS_JAR should be placed after CLASSPATH, or at least after SPARK_CLASSPATH. If SPARK_TOOLS_JAR is placed before CLASSPATH, all assembled classes and resources in spark-tools-assembly.jar shadow those in CLASSPATH or SPARK_CLASSPATH, which may be customized versions.
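The ordering problem comes from first-match-wins resolution. Below is a hypothetical sketch of that behaviour, not a real class loader: a name resolves to the first classpath entry that provides it, so putting the tools assembly before the user's entries shadows any customized classes. The jar and class names are illustrative only.

```python
def resolve(classpath, name):
    """Return the first jar on the classpath that provides `name`,
    mimicking JVM-style first-match-wins lookup."""
    for jar, provided in classpath:
        if name in provided:
            return jar
    return None

# Two jars that both provide the same class (the user's is customized).
user_cp = ("app.jar", {"com.example.Widget"})
tools   = ("spark-tools-assembly.jar", {"com.example.Widget"})

# Intended order: user classpath first, tools jar last.
good = [user_cp, tools]
# Broken order: the tools jar shadows the customized class.
bad = [tools, user_cp]
```

With `good`, the customized `app.jar` class wins; with `bad`, the assembly silently takes precedence, which is exactly what the commit avoids.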
  - azuryyu authored
  - azuryyu authored
  - Reynold Xin authored
    Show full stack trace and time taken in unit tests.
- Dec 23, 2013
  - Reynold Xin authored
  - Matei Zaharia authored
    Refactored the streaming scheduler and added a StreamingListener interface.
    - Refactored the streaming scheduler for cleaner code. Specifically, JobManager was renamed to JobScheduler, as it does the actual scheduling of Spark jobs on the SparkContext, and the earlier Scheduler was renamed to JobGenerator, as it actually generates the jobs from the DStreams. The JobScheduler starts the JobGenerator. Also moved all the scheduler-related code from spark.streaming to the spark.streaming.scheduler package.
    - Implemented the StreamingListener interface, similar to SparkListener. The streaming version of StatusReportListener prints the batch processing time statistics (for now). Added StreamingListenerSuite to test it.
    - Refactored the streaming TestSuiteBase to dedupe code in the other streaming test suites.
  - Tathagata Das authored
  - Tathagata Das authored
  - Tathagata Das authored
  - Reynold Xin authored
    Added SPARK-968 implementation for review.
  - wangda.tan authored
- Dec 22, 2013
  - Kay Ousterhout authored
  - wangda.tan authored
  - Kay Ousterhout authored
- Dec 20, 2013
  - Tathagata Das authored
  - Kay Ousterhout authored
  - Kay Ousterhout authored
    Conflicts:
      core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
      core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
      core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
  - Patrick Wendell authored
    Minor cleanup for standalone scheduler. See commit messages.
- Dec 19, 2013
  - Kay Ousterhout authored
    Merge master into 127
  - Patrick Wendell authored
    Track and report task result serialisation time.
    - DirectTaskResult now has a ByteBuffer valueBytes instead of a T value.
    - DirectTaskResult now has a member function T value() that deserialises valueBytes.
    - Executor serialises the value into a ByteBuffer and passes it to DirectTaskResult's constructor.
    - Executor tracks the time taken to do so and records it in a new field in TaskMetrics.
    - StagePage now reports serialisation time from TaskMetrics along with the other things it reports.
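The shape of this change can be sketched in a few lines. This is a hypothetical illustration, not the actual Spark code: the result is serialised eagerly (and timed) on the executor side, while `value()` deserialises lazily on the caller's side. Python's pickle stands in for Spark's serialiser, and the names mirror the commit message rather than the real signatures.

```python
import pickle
import time

class DirectTaskResult:
    """Holds serialised bytes instead of the raw value."""
    def __init__(self, value_bytes, serialization_time_s):
        self.value_bytes = value_bytes                    # the "ByteBuffer"
        self.serialization_time_s = serialization_time_s  # recorded in metrics

    def value(self):
        """Deserialise on demand, mirroring the new value() member."""
        return pickle.loads(self.value_bytes)

def make_result(value):
    """Executor-side path: serialise the value and time how long it takes."""
    start = time.perf_counter()
    data = pickle.dumps(value)
    elapsed = time.perf_counter() - start
    return DirectTaskResult(data, elapsed)
```

Timing the serialisation at the only place it happens is what makes the metric cheap to collect and accurate to report.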
  - Aaron Davidson authored
  - Aaron Davidson authored
    As a lonely child with no one to care for it... we had to put it down.
  - Aaron Davidson authored
  - Aaron Davidson authored
    Previously, we would only clean the in-memory metadata for consolidated shuffle files. Additionally, fixes a bug where the Metadata Cleaner was ignoring type-specific TTLs.
  - Reynold Xin authored
    Add collectPartition to the JavaRDD interface. This interface is useful for implementing `take` from other language frontends where the data is serialized. Also removes `takePartition` from PythonRDD and uses `collectPartition` in rdd.py. Thanks @concretevitamin for the original change and tests.
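Why a per-partition collect helps `take` can be shown with a toy model. This is a hypothetical sketch, not the PySpark implementation: partitions are plain lists, and `take` pulls them one at a time until enough elements have been gathered, instead of collecting the whole dataset. The function names are stand-ins for the real API.

```python
def collect_partition(partitions, index):
    """Stand-in for a per-partition collect: return one partition's data."""
    return partitions[index]

def take(partitions, n):
    """Fetch partitions lazily, stopping once n elements are available."""
    out = []
    for i in range(len(partitions)):
        if len(out) >= n:
            break                     # skip fetching the remaining partitions
        out.extend(collect_partition(partitions, i))
    return out[:n]
```

For a small `n` over many partitions, most partitions are never transferred at all, which is the point of the interface.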
  - Shivaram Venkataraman authored
  - Shivaram Venkataraman authored
    Change the implementation to use runJob instead of PartitionPruningRDD. Also update the unit tests and the Python take implementation to use the new interface.
  - Matei Zaharia authored
    Add toString to Java RDD, and __repr__ to Python RDD. Addresses [SPARK-992](https://spark-project.atlassian.net/browse/SPARK-992).
  - Nick Pentreath authored
  - Reynold Xin authored
    [SPARK-959] Explicitly depend on the org.eclipse.jetty.orbit jar. Without this, in some cases, Ivy attempts to download the wrong file and fails, stopping the whole build. See the [bug](https://spark-project.atlassian.net/browse/SPARK-959) for more details. Note that this may not be the best solution, as I do not understand the root cause of why this only happens for some people; however, it is reported to work.
  - Tathagata Das authored
  - Aaron Davidson authored
    Without this, in some cases, Ivy attempts to download the wrong file and fails, stopping the whole build. See the bug for more details. (This is probably also the beginning of the slow death of our recently prettified dependencies. Form follows function.)
  - Reynold Xin authored
    Increase the spark.akka.askTimeout default to 30 seconds. In experimental clusters we've observed that a 10 second timeout was insufficient, despite a low number of nodes and a relatively small workload (16 nodes, <1.5 TB data). This would cause an entire job to fail at the beginning of the reduce phase. There is no particular reason for this value to be small, as a timeout should only occur in an exceptional situation. Also centralized the reading of spark.akka.askTimeout into AkkaUtils (surely this can later be cleaned up to use Typesafe). Finally, deleted some lurking implicits. If anyone can think of a reason they should still be there, please let me know.
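Centralising the timeout lookup can be sketched as a single helper. This is a hypothetical illustration in the spirit of moving the `spark.akka.askTimeout` read into one utility, not the actual AkkaUtils code: every caller goes through one function that applies the 30-second default, instead of each call site re-reading (and possibly re-defaulting) the property.

```python
# Default stated in the commit message above; the config is modelled as a
# plain dict of string settings for this sketch.
DEFAULT_ASK_TIMEOUT_S = 30

def ask_timeout(conf):
    """Single place that reads the timeout, falling back to the default."""
    return int(conf.get("spark.akka.askTimeout", DEFAULT_ASK_TIMEOUT_S))
```

With one accessor, changing the default (or the config key) later touches exactly one line.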