Commits · 55b7e2fdffc6c3537da69152a3d02d5be599fa1b · cs525-sp18-g07 / spark

Dec 31, 2013

Merge pull request #289 from tdas/filestream-fix · 55b7e2fd

Patrick Wendell authored 11 years ago

Bug fixes for file input stream and checkpointing

- Fixed bugs in the file input stream that caused the stream to fail on transient HDFS errors (e.g., listing files while a background thread was deleting them caused errors).
- Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, to allow checkpoints to be written to any HDFS-compatible store that requires special configuration.
- Changed the API of SparkContext.setCheckpointDir(): eliminated the unnecessary 'useExisting' parameter. SparkContext now always creates a unique subdirectory within the user-specified checkpoint directory, to ensure that previous checkpoint files are not accidentally overwritten.
- Fixed a bug where setting the checkpoint directory to a relative local path caused checkpointing to fail.
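
The last two items describe a pattern that is easy to get wrong: resolve a possibly relative base path first, then create a fresh, uniquely named subdirectory so that an earlier run's checkpoints cannot be overwritten. The sketch below shows that pattern on a local filesystem. `make_checkpoint_dir` is a hypothetical helper, and naming the subdirectory with a random UUID is an assumption made for illustration; this is not Spark's implementation.

```python
import os
import uuid


def make_checkpoint_dir(base_dir: str) -> str:
    """Create and return a fresh, uniquely named checkpoint subdirectory.

    Hypothetical helper for illustration only (not Spark's code).
    """
    # Resolve a relative base (e.g. "checkpoints") against the current working
    # directory once, so every later use refers to the same absolute location.
    absolute_base = os.path.abspath(base_dir)
    # Assumption: a random UUID as the subdirectory name, so a collision with
    # an existing (older) checkpoint directory is practically impossible.
    checkpoint_dir = os.path.join(absolute_base, str(uuid.uuid4()))
    # exist_ok=False: fail loudly instead of silently reusing a directory.
    os.makedirs(checkpoint_dir, exist_ok=False)
    return checkpoint_dir
```

Calling it twice with the same base, e.g. `make_checkpoint_dir("checkpoints")`, yields two distinct absolute directories, which is the "never overwrite previous checkpoints" property the commit describes.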

Fixed comments and long lines based on comments on PR 289. · fcd17a1e
Tathagata Das authored 11 years ago

Dec 30, 2013

Merge pull request #308 from kayousterhout/stage_naming · 50e3b8ec

Patrick Wendell authored 11 years ago

Changed naming of StageCompleted event to be consistent

The rest of the SparkListener events are named with "SparkListener"
as the prefix of the name; this commit renames the StageCompleted
event to SparkListenerStageCompleted for consistency.

Dec 29, 2013
- Updated code style according to Patrick's comments · c2c1af39
  Kay Ousterhout authored 11 years ago
- Revert "Merge pull request #310 from jyunfan/master" · 72a17b69
  Reynold Xin authored 11 years ago
  
  This reverts commit 79b20e4d, reversing changes made to 7375047d.
- Merge pull request #310 from jyunfan/master · 79b20e4d
  Reynold Xin authored 11 years ago
  
  Fix typo in the Accumulators section: change 'val' to 'var'.
Dec 28, 2013
- Fix typo in the Accumulators section · 17f6620a
  Jyun-Fan Tsai authored 11 years ago
  
  val => var
- Merge pull request #304 from kayousterhout/remove_unused · 7375047d
  Patrick Wendell authored 11 years ago
  
  Removed unused failed and causeOfFailure variables (in TaskSetManager)
Dec 27, 2013
- Merge pull request #307 from kayousterhout/other_failure · ad3dfd15
  Matei Zaharia authored 11 years ago
  
  Removed unused OtherFailure TaskEndReason. The OtherFailure TaskEndReason was added by @mateiz 3 years ago in this commit:
  https://github.com/apache/incubator-spark/commit/24a1e7f8380bfd8d4fbdda688482a451bd6ea215
  
  Unless I am missing something, it doesn't seem to have been used then and is not used now, so it seems safe to delete.
- Merge pull request #306 from kayousterhout/remove_pending · b579b832
  Matei Zaharia authored 11 years ago
  
  Remove unused hasPendingTasks methods
- Changed naming of StageCompleted event to be consistent · b4619e50
  Kay Ousterhout authored 11 years ago
  
  The rest of the SparkListener events are named with "SparkListener" as the prefix of the name; this commit renames the StageCompleted event to SparkListenerStageCompleted for consistency.
- Removed unused OtherFailure TaskEndReason. · e17d7518
  Kay Ousterhout authored 11 years ago
- Remove unused hasPendingTasks methods · 8419148e
  Kay Ousterhout authored 11 years ago
- Merge pull request #305 from kayousterhout/line_spacing · 19672dca
  Patrick Wendell authored 11 years ago
  
  Fixed >100char lines in DAGScheduler.scala. There's no changed functionality here, only line spacing and one grammatical fix in a comment.
- Minor changes in comments and strings to address comments in PR 289. · 271e3237
  Tathagata Das authored 11 years ago
- Style fixes as per Reynold's review · 0c71ffe9
  Kay Ousterhout authored 11 years ago
- Fixed >100char lines in DAGScheduler.scala · 8c81068e
  Kay Ousterhout authored 11 years ago
- Removed unused failed and causeOfFailure variables · baaabced
  Kay Ousterhout authored 11 years ago
- Merge pull request #298 from aarondav/minor · 7be1e577
  Reynold Xin authored 11 years ago
  
  Minor: Decrease margin of left side of Log page.
  
  Before: ![before](https://f.cloud.github.com/assets/1400247/1812647/1a4be53e-6e87-11e3-9d5b-f851274be0e9.png)
  After: ![after](https://f.cloud.github.com/assets/1400247/1812648/1ca1ea2c-6e87-11e3-946c-31be9258f450.png)
  
  It's a start anyway...
- Merge pull request #302 from pwendell/SPARK-1007 · 7d811ba6
  Reynold Xin authored 11 years ago
  
  SPARK-1007: spark-class2.cmd should change SCALA_VERSION to be 2.10. Reported by Qiuzhuang Lian.
- SPARK-1007: spark-class2.cmd should change SCALA_VERSION to be 2.10 · 0cc1e0d4
  Patrick Wendell authored 11 years ago
Dec 26, 2013

- Merge pull request #295 from markhamstra/JobProgressListenerNPE · 5e69fc5b
  Matei Zaharia authored 11 years ago
  
  Avoid a lump of coal (NPE) in JobProgressListener's stocking.
- Decrease margin of left side of log page · 4f2fb761
  Aaron Davidson authored 11 years ago
- Added warning if filestream adds files with no data in them (file RDDs have 0 partitions). · 3618d70b
  Tathagata Das authored 11 years ago
- Changed file stream to not catch any exceptions related to finding new files... · be647191
  Tathagata Das authored 11 years ago
  
  Changed file stream to not catch any exceptions related to finding new files (FileNotFound exception is still caught and ignored).
- Merge pull request #296 from witgo/master · e240bad0
  Matei Zaharia authored 11 years ago
  
  Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn package
- Removed slack time in file stream and added better handling of exceptions due... · bacc65cf
  Tathagata Das authored 11 years ago
  
  Removed slack time in file stream and added better handling of failures caused by FileNotFound exceptions.
- fix this import order · b662c88a
  liguoqiang authored 11 years ago
- Avoid a lump of coal (NPE) in JobProgressListener's stocking. · c529dcea
  Mark Hamstra authored 11 years ago
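
The file-stream commits above keep FileNotFound as the one exception that is caught and ignored while finding new files, and add a warning for new files with no data. A minimal local-filesystem sketch of that policy follows: a file that vanishes between the directory listing and the stat call is skipped, and every other error propagates. `find_new_files` and the `seen` map are hypothetical names; using a zero byte size as the "no data" signal is an assumption (the commit talks about file RDDs with 0 partitions), and this is not Spark Streaming's code, which works against the Hadoop FileSystem API.

```python
import logging
import os
import stat
from typing import Dict, List

logger = logging.getLogger("filestream_sketch")


def find_new_files(directory: str, seen: Dict[str, float]) -> List[str]:
    """Return regular files in `directory` that are not yet recorded in `seen`.

    A file that disappears between os.listdir() and os.stat() raises
    FileNotFoundError; that case is skipped. Any other OSError propagates.
    """
    new_files: List[str] = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            st = os.stat(path)
        except FileNotFoundError:
            # Deleted (or a dangling link) after listing: ignore it this round.
            continue
        if not stat.S_ISREG(st.st_mode) or path in seen:
            continue
        if st.st_size == 0:
            # Assumed analogue of warning about new files with no data.
            logger.warning("New file %s has no data", path)
        seen[path] = st.st_mtime
        new_files.append(path)
    return new_files
```

Narrowing the `except` to `FileNotFoundError` is the design point. A blanket `except OSError` would also hide real problems such as permission errors, which is what "not catch any exceptions related to finding new files" avoids.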

Merge pull request #283 from tmyklebu/master · c344ed04

Matei Zaharia authored 11 years ago

Python bindings for mllib

This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib.

For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py. The Python stub serialises the input RDD and any vector/matrix arguments into a mutually-understood format and calls the Scala stub. The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model.
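
The stub-to-stub handoff described above needs a byte layout that both sides agree on. The sketch below shows one such layout for a dense vector. The format is hypothetical: an 8-byte signed length followed by IEEE-754 doubles, all little-endian. It is not the format actually used by mllib.py and PythonMLLibAPI.scala. Fixing the byte order explicitly (struct's `<` prefix) instead of relying on each side's native order is one way to sidestep the platform dependence this description warns about. `java.nio.ByteBuffer` is big-endian by default, and the matching JVM-side setting would be `order(ByteOrder.LITTLE_ENDIAN)`.

```python
import struct
from typing import List, Sequence

# Hypothetical wire layout, for illustration only:
#   8-byte signed length n, then n float64 values, all little-endian.
_HEADER = struct.Struct("<q")


def serialize_vector(values: Sequence[float]) -> bytes:
    """Pack a dense vector with an explicit (little-endian) byte order."""
    return _HEADER.pack(len(values)) + struct.pack("<%dd" % len(values), *values)


def deserialize_vector(data: bytes) -> List[float]:
    """Inverse of serialize_vector; rejects buffers whose size does not match."""
    (n,) = _HEADER.unpack_from(data, 0)
    expected = _HEADER.size + 8 * n
    if len(data) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(data)))
    return list(struct.unpack_from("<%dd" % n, data, _HEADER.size))
```

A vector of three doubles serializes to 8 + 3 × 8 = 32 bytes, and the length check catches a truncated buffer before any values are decoded.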

ALSModel is slightly different since a MatrixFactorizationModel has RDDs inside. The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method.
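
Returning a handle instead of a serialised model is a proxy pattern: the Python object keeps only an identifier and forwards `predict` calls to the side that owns the data. A minimal sketch follows, in which a dict-backed `FakeJvmBackend` stands in for the JVM. All class and method names here are hypothetical and are not the PySpark API.

```python
from typing import Dict, List


class FakeJvmBackend:
    """Stand-in for the JVM side: stores models in a table keyed by handle."""

    def __init__(self) -> None:
        self._models: Dict[int, tuple] = {}
        self._next_handle = 0

    def register_model(self, user_features: Dict[int, List[float]],
                       product_features: Dict[int, List[float]]) -> int:
        """Keep the model on the backend side and hand back an opaque integer."""
        handle = self._next_handle
        self._next_handle += 1
        self._models[handle] = (user_features, product_features)
        return handle

    def predict(self, handle: int, user: int, product: int) -> float:
        user_features, product_features = self._models[handle]
        # Matrix factorization prediction: dot product of the two factor vectors.
        return sum(u * p for u, p in zip(user_features[user], product_features[product]))


class MatrixFactorizationModelProxy:
    """Python-side wrapper that holds only a handle, never the factor data."""

    def __init__(self, backend: FakeJvmBackend, handle: int) -> None:
        self._backend = backend
        self._handle = handle

    def predict(self, user: int, product: int) -> float:
        # Delegate to the backend rather than copying the model into Python.
        return self._backend.predict(self._handle, user, product)
```

With user factors `[1.0, 2.0]` and product factors `[0.5, 0.25]`, the proxy's `predict` returns 1.0, and the factor data never leaves the backend.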

I have tested these bindings on an x86_64 machine running Linux. There is a risk that these bindings may fail on some choose-your-own-endian platform if Python's byte order differs from java.nio.ByteBuffer's idea of the native byte order.

Dec 25, 2013
- Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn · 2bd76f69
  liguoqiang authored 11 years ago
- Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn · 14fcef72
  liguoqiang authored 11 years ago
- Remove commented code in __init__.py. · 9cbcf814
  Tor Myklebust authored 11 years ago
- Fix copypasta in __init__.py. Don't import anything directly into pyspark.mllib. · 5e71354c
  Tor Myklebust authored 11 years ago
- Merge pull request #290 from ash211/patch-3 · 56094bcd
  Matei Zaharia authored 11 years ago
  
  Typo: avaiable -> available
- Merge pull request #287 from azuryyu/master · 4842a07d
  Reynold Xin authored 11 years ago
  
  Fixed job name in the java streaming example.
Dec 24, 2013
- Initial weights in Scala are ones; do that too. Also fix some errors. · 02208a17
  Tor Myklebust authored 11 years ago
- Scala stubs for updated Python bindings. · 4e821390
  Tor Myklebust authored 11 years ago
- Split the mllib bindings into a whole bunch of modules and rename some things. · 05163057
  Tor Myklebust authored 11 years ago
- Typo: avaiable -> available · 3665c722
  Andrew Ash authored 11 years ago