Commits · b4ceed40d6e511a1d475b3f4fbcdd2ad24c02b5a · cs525-sp18-g07 / spark

Dec 29, 2013

Merge remote-tracking branch 'origin/master' into conf2 · b4ceed40

Matei Zaharia authored 11 years ago

Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala

b4ceed40

Add Python docs about SparkConf · 58c6fa20
Matei Zaharia authored 11 years ago

58c6fa20

Fix some other Python tests due to initializing JVM in a different way · 615fb649

Matei Zaharia authored 11 years ago

The test in context.py created two different instances of the
SparkContext class by copying "globals", so that some tests can have a
global "sc" object and others can try initializing their own contexts.
This led to two JVM gateways being created since SparkConf also looked
at pyspark.context.SparkContext to get the JVM.
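
The failure mode described above can be reproduced in miniature without Spark. The sketch below is an illustration only: `Context`, `_gateway`, and `ensure_gateway` are hypothetical stand-ins, not PySpark names, and `exec` into two namespaces is used here as an analogy for the copied "globals". Executing the same class definition twice yields two distinct class objects, each with its own class-level state, so a lazily created resource ends up being created twice.

```python
# Hypothetical stand-in for a class that lazily creates one shared resource.
CLASS_SOURCE = '''
class Context:
    _gateway = None  # class-level slot meant to hold a single shared resource

    @classmethod
    def ensure_gateway(cls):
        if cls._gateway is None:
            cls._gateway = object()  # stands in for launching a JVM gateway
        return cls._gateway
'''


def load_context_class():
    """Execute the class definition in a fresh namespace and return the class."""
    namespace = {}
    exec(CLASS_SOURCE, namespace)
    return namespace["Context"]


first = load_context_class()
second = load_context_class()

# Same source text, but two separate class objects with separate state.
same_class = first is second
same_gateway = first.ensure_gateway() is second.ensure_gateway()
```

Both `same_class` and `same_gateway` come out false here, which mirrors why looking up the gateway through a second copy of the class would start a second gateway.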

615fb649

Add SparkConf support in Python · cd00225d
Matei Zaharia authored 11 years ago

cd00225d

Dec 28, 2013
- Fix Python use of getLocalDir · 1c11f54a
  Matei Zaharia authored 11 years ago
  
  1c11f54a
- Fix other failing tests · 20631348
  Matei Zaharia authored 11 years ago
  
  20631348
- Add a StreamingContext constructor that takes a conf object · 0900d5c7
  Matei Zaharia authored 11 years ago
  
  0900d5c7
- Fix CheckpointSuite test failures · a8f31638
  Matei Zaharia authored 11 years ago
  
  a8f31638
- Fix test failures due to setting / clearing clock type in Streaming · 578bd1fc
  Matei Zaharia authored 11 years ago
  
  578bd1fc
- Fix Executor not getting properties in local mode · 5bbe7386
  Matei Zaharia authored 11 years ago
  
  5bbe7386
- Check for SPARK_YARN_MODE through a system property too since it can sometimes be set that way (undoes a change in previous commit) · a16c52ed
  Matei Zaharia authored 11 years ago
  
  a16c52ed
- Various fixes to configuration code · 642029e7
  Matei Zaharia authored 11 years ago
  
  - Got rid of global SparkContext.globalConf
  - Pass SparkConf to serializers and compression codecs
  - Made SparkConf public instead of private[spark]
  - Improved API of SparkContext and SparkConf
  - Switched executor environment vars to be passed through SparkConf
  - Fixed some places that were still using system properties
  - Fixed some tests, though others are still failing

  This still fails several tests in core, repl and streaming, likely due to properties not being set or cleared correctly (some of the tests run fine in isolation).
  642029e7
Dec 27, 2013
- Merge pull request #307 from kayousterhout/other_failure · ad3dfd15
  Matei Zaharia authored 11 years ago
  
  Removed unused OtherFailure TaskEndReason.

  The OtherFailure TaskEndReason was added by @mateiz 3 years ago in this commit: https://github.com/apache/incubator-spark/commit/24a1e7f8380bfd8d4fbdda688482a451bd6ea215

  Unless I am missing something, it doesn't seem to have been used then, and is not used now, so seems safe for deletion.
  ad3dfd15
- Merge pull request #306 from kayousterhout/remove_pending · b579b832
  Matei Zaharia authored 11 years ago
  
  Remove unused hasPendingTasks methods
  b579b832
- Removed unused OtherFailure TaskEndReason. · e17d7518
  Kay Ousterhout authored 11 years ago
  
  e17d7518
- Remove unused hasPendingTasks methods · 8419148e
  Kay Ousterhout authored 11 years ago
  
  8419148e
- Merge pull request #305 from kayousterhout/line_spacing · 19672dca
  Patrick Wendell authored 11 years ago
  
  Fixed >100char lines in DAGScheduler.scala

  There's no changed functionality here -- only line spacing and one grammatical fix in a comment.
  19672dca
- Style fixes as per Reynold's review · 0c71ffe9
  Kay Ousterhout authored 11 years ago
  
  0c71ffe9
- Fixed >100char lines in DAGScheduler.scala · 8c81068e
  Kay Ousterhout authored 11 years ago
  
  8c81068e
- Merge pull request #298 from aarondav/minor · 7be1e577
  Reynold Xin authored 11 years ago
  
  Minor: Decrease margin of left side of Log page

  Before
  ![before](https://f.cloud.github.com/assets/1400247/1812647/1a4be53e-6e87-11e3-9d5b-f851274be0e9.png)

  After
  ![after](https://f.cloud.github.com/assets/1400247/1812648/1ca1ea2c-6e87-11e3-946c-31be9258f450.png)

  It's a start anyway...
  7be1e577
- Merge pull request #302 from pwendell/SPARK-1007 · 7d811ba6
  Reynold Xin authored 11 years ago
  
  SPARK-1007: spark-class2.cmd should change SCALA_VERSION to be 2.10

  Reported by Qiuzhuang Lian
  7d811ba6
- SPARK-1007: spark-class2.cmd should change SCALA_VERSION to be 2.10 · 0cc1e0d4
  Patrick Wendell authored 11 years ago
  
  0cc1e0d4
Dec 26, 2013

Merge pull request #295 from markhamstra/JobProgressListenerNPE · 5e69fc5b
Matei Zaharia authored 11 years ago
```
Avoid a lump of coal (NPE) in JobProgressListener's stocking.
```
5e69fc5b
Decrease margin of left side of log page · 4f2fb761
Aaron Davidson authored 11 years ago

4f2fb761
Merge pull request #296 from witgo/master · e240bad0
Matei Zaharia authored 11 years ago
```
Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn package
```
e240bad0
fix this import order · b662c88a
liguoqiang authored 11 years ago

b662c88a
Avoid a lump of coal (NPE) in JobProgressListener's stocking. · c529dcea
Mark Hamstra authored 11 years ago

c529dcea

Merge pull request #283 from tmyklebu/master · c344ed04

Matei Zaharia authored 11 years ago

Python bindings for mllib

This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib.

For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py. The Python stub serialises the input RDD and any vector/matrix arguments into a mutually-understood format and calls the Scala stub. The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model.

ALSModel is slightly different since a MatrixFactorizationModel has RDDs inside. The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method.

I have tested these bindings on an x86_64 machine running Linux. There is a risk that these bindings may fail on some choose-your-own-endian platform if Python's byte order differs from java.nio.ByteBuffer's idea of the native byte order.
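
The byte-order caveat can be made concrete with a small sketch. The wire format below (an 8-byte length followed by 64-bit floats) is an assumption chosen for illustration, not the format these bindings use, and `pack_vector` and `unpack_vector` are hypothetical helpers. Naming the byte order explicitly in the `struct` format string removes the dependence on the platform's native order.

```python
import struct


def pack_vector(values, order="<"):
    """Serialise floats as an int64 length followed by that many float64 values."""
    return struct.pack(f"{order}q{len(values)}d", len(values), *values)


def unpack_vector(payload, order="<"):
    """Inverse of pack_vector, assuming the same byte order was used to pack."""
    (length,) = struct.unpack_from(f"{order}q", payload, 0)
    return list(struct.unpack_from(f"{order}{length}d", payload, 8))


vector = [1.0, -2.5, 3.25]
payload = pack_vector(vector, "<")
round_trip = unpack_vector(payload, "<")

# Reading little-endian bytes as big-endian silently yields wrong data
# (here, a nonsensical length), which is the kind of failure the message warns about.
(wrong_length,) = struct.unpack_from(">q", payload, 0)
```

On the JVM side, the matching step would be to set the buffer's order explicitly with `ByteBuffer.order(ByteOrder.LITTLE_ENDIAN)` rather than relying on `ByteOrder.nativeOrder()`. Whether the bindings do this is not stated in the pull request.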

c344ed04

Dec 25, 2013
- Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn · 2bd76f69
  liguoqiang authored 11 years ago
  
  2bd76f69
- Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn · 14fcef72
  liguoqiang authored 11 years ago
  
  14fcef72
- Remove commented code in __init__.py. · 9cbcf814
  Tor Myklebust authored 11 years ago
  
  9cbcf814
- Fix copypasta in __init__.py. Don't import anything directly into pyspark.mllib. · 5e71354c
  Tor Myklebust authored 11 years ago
  
  5e71354c
- Merge pull request #290 from ash211/patch-3 · 56094bcd
  Matei Zaharia authored 11 years ago
  
  Typo: avaiable -> available
  56094bcd
- Merge pull request #287 from azuryyu/master · 4842a07d
  Reynold Xin authored 11 years ago
  
  Fixed job name in the java streaming example.
  4842a07d
Dec 24, 2013

Initial weights in Scala are ones; do that too. Also fix some errors. · 02208a17
Tor Myklebust authored 11 years ago

02208a17
Scala stubs for updated Python bindings. · 4e821390
Tor Myklebust authored 11 years ago

4e821390
Split the mllib bindings into a whole bunch of modules and rename some things. · 05163057
Tor Myklebust authored 11 years ago

05163057
Typo: avaiable -> available · 3665c722
Andrew Ash authored 11 years ago

3665c722

Merge pull request #127 from kayousterhout/consolidate_schedulers · 85a344b4

Patrick Wendell authored 11 years ago

Deduplicate Local and Cluster schedulers.

The code in LocalScheduler/LocalTaskSetManager was nearly identical
to the code in ClusterScheduler/ClusterTaskSetManager. The redundancy
made updating the schedulers unnecessarily painful and error-prone.
This commit combines the two into a single TaskScheduler/TaskSetManager.

Unfortunately the diff makes this change look much more invasive than it is -- TaskScheduler.scala is only superficially changed (names updated, overrides removed) from the old ClusterScheduler.scala, and the same goes for TaskSetManager.scala.

Thanks @rxin for suggesting this change!
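
As a rough illustration of the deduplication idea, the sketch below keeps the queueing logic in one scheduler class and pushes the local-versus-cluster difference into a small backend object. The split into backends, and every name used (`LocalBackend`, `RecordingBackend`, `run_all`), is an assumption made for the example. The pull request only states that the two schedulers were combined, not how the remaining differences are factored.

```python
class LocalBackend:
    """Hypothetical backend that runs tasks in the current process."""

    def run(self, task):
        return task()


class RecordingBackend:
    """Hypothetical stand-in for a cluster backend; records what it launched."""

    def __init__(self):
        self.launched = []

    def run(self, task):
        self.launched.append(task.__name__)
        return task()


class TaskScheduler:
    """Single scheduler whose queueing logic is shared by every backend."""

    def __init__(self, backend):
        self.backend = backend
        self.pending = []

    def submit(self, task):
        self.pending.append(task)

    def run_all(self):
        # Run queued tasks in submission order, then empty the queue.
        results = [self.backend.run(task) for task in self.pending]
        self.pending.clear()
        return results


def answer():
    return 42


local = TaskScheduler(LocalBackend())
local.submit(answer)
local_results = local.run_all()

backend = RecordingBackend()
cluster = TaskScheduler(backend)
cluster.submit(answer)
cluster_results = cluster.run_all()
```

With this shape, a fix to `submit` or `run_all` applies to both modes at once, which is the maintenance benefit the description points to.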

85a344b4

Merge pull request #279 from aarondav/shuffle-cleanup0 · c2dd6bcd

Patrick Wendell authored 11 years ago

Clean up shuffle files once their metadata is gone

Previously, we would only clean the in-memory metadata for consolidated shuffle files.

Additionally, fixes a bug where the Metadata Cleaner was ignoring type-specific TTLs.
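
Both points can be sketched together in a few lines. This is a toy model under stated assumptions, not Spark's MetadataCleaner: `TtlCleaner`, its `ttl_by_type` mapping, and the `on_remove` callback are hypothetical names. It shows a type-specific TTL taking precedence over the default, and a callback that deletes the on-disk file when its metadata entry expires.

```python
import os
import tempfile


class TtlCleaner:
    """Hypothetical cleaner: drops entries older than a TTL and runs a callback."""

    def __init__(self, default_ttl, ttl_by_type=None):
        self.default_ttl = default_ttl
        self.ttl_by_type = dict(ttl_by_type or {})
        self.entries = {}  # key -> (entry_type, created_at, on_remove)

    def ttl_for(self, entry_type):
        # Prefer the type-specific TTL; fall back to the default only if unset.
        return self.ttl_by_type.get(entry_type, self.default_ttl)

    def add(self, key, entry_type, created_at, on_remove):
        self.entries[key] = (entry_type, created_at, on_remove)

    def clean(self, now):
        for key, (entry_type, created_at, on_remove) in list(self.entries.items()):
            if now - created_at > self.ttl_for(entry_type):
                del self.entries[key]
                on_remove()  # e.g. delete the file the metadata pointed at


def remove_if_present(path):
    if os.path.exists(path):
        os.remove(path)


# Create a real temporary file to stand in for a shuffle file.
handle, shuffle_path = tempfile.mkstemp(suffix=".shuffle")
os.close(handle)

cleaner = TtlCleaner(default_ttl=100, ttl_by_type={"shuffle": 10})
cleaner.add("shuffle_0", "shuffle", created_at=0,
            on_remove=lambda: remove_if_present(shuffle_path))
cleaner.add("rdd_0", "rdd", created_at=0, on_remove=lambda: None)
cleaner.clean(now=50)

shuffle_file_exists = os.path.exists(shuffle_path)
remaining_keys = sorted(cleaner.entries)
```

At `now=50` the shuffle entry has outlived its 10-unit TTL, so its metadata and its file are both removed, while the `rdd` entry survives under the 100-unit default.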

c2dd6bcd