- Jan 13, 2014
-
Tathagata Das authored
-
- Jan 12, 2014
-
Tathagata Das authored
Conflicts: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
-
Patrick Wendell authored
Better error handling in Spark Streaming and more API cleanup

Earlier, errors in jobs generated by Spark Streaming (or in the generation of those jobs) could not be caught from the main driver thread (i.e. the thread that called StreamingContext.start()), as they were thrown in different threads. With this change, after `ssc.start()` one can call `ssc.awaitTermination()`, which will block until the ssc is stopped or an exception occurs. This makes errors easier to debug. This change also adds ssc.stop(<stop-spark-context>), which stops the StreamingContext without stopping the SparkContext. It also fixes the bug that came up with PRs #393 and #381. The MetadataCleaner default value has been changed from 3500 to -1 for a normal SparkContext, and to 3600 when creating a StreamingContext. Also, StreamingListenerBus has been updated with changes similar to SparkListenerBus in #392, and a lot of protected[streaming] has been changed to private[streaming].
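A minimal Scala sketch of the lifecycle this enables; the local master, socket source, and one-second batch interval are assumptions for illustration:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext("local[2]", "AwaitExample", Seconds(1)) // assumed setup
ssc.socketTextStream("localhost", 9999).print()                        // hypothetical source

ssc.start()
// Blocks the driver thread until the context is stopped or a streaming job
// throws, so errors now surface here instead of in background threads.
ssc.awaitTermination()
// To shut down the streaming side while keeping the SparkContext alive:
// ssc.stop(false)
```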
-
Patrick Wendell authored
Rename DStream.foreach to DStream.foreachRDD

`foreachRDD` makes it clear that the granularity of this operator is per-RDD. As it stands, `foreach` is inconsistent with `map`, `filter`, and the other DStream operators, which get pushed down to individual records within each RDD.
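A short sketch contrasting the two granularities; the socket source and word splitting are assumptions for illustration:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext("local[2]", "ForeachRDDExample", Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999) // hypothetical source

// Per-record operators: the function applies to each record inside each RDD.
val words = lines.flatMap(_.split(" ")).filter(_.nonEmpty)

// Per-RDD operator: the closure receives one whole RDD per batch interval.
words.foreachRDD { rdd =>
  println(s"batch contained ${rdd.count()} words")
}

ssc.start()
ssc.awaitTermination()
```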
-
Tathagata Das authored
-
Patrick Wendell authored
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
-
Patrick Wendell authored
`foreachRDD` makes it clear that the granularity of this operator is per-RDD. As it stands, `foreach` is inconsistent with `map`, `filter`, and the other DStream operators, which get pushed down to individual records within each RDD.
-
Patrick Wendell authored
Setting load defaults to true in executor

This preserves the behavior of earlier releases. If properties are set for the executors via `spark-env.sh` on the slaves, then they should take precedence over Spark defaults. This is useful if system administrators are setting properties for a standalone cluster, such as shuffle locations. /cc @andrewor14, who initially reported this issue.
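A hedged sketch of what loading defaults means for a SparkConf; the property name and value are assumptions for illustration:

```scala
import org.apache.spark.SparkConf

// Simulate a slave whose spark-env.sh exported a spark.* setting as a JVM
// system property (an assumption; spark-env.sh itself sets these indirectly).
System.setProperty("spark.local.dir", "/mnt/spark")

// With loadDefaults = true, the conf picks up spark.* system properties, so
// the administrator's setting takes precedence over built-in defaults.
val conf = new SparkConf(loadDefaults = true)
assert(conf.get("spark.local.dir") == "/mnt/spark")
```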
-
Reynold Xin authored
Stop SparkListenerBus daemon thread when DAGScheduler is stopped. Otherwise this leads to hundreds of SparkListenerBus daemon threads in our unit tests (and is also problematic if a user application launches multiple SparkContexts).
-
Tathagata Das authored
Fixed bugs to ensure better cleanup of JobScheduler, JobGenerator and NetworkInputTracker upon close.
-
Patrick Wendell authored
-
Tathagata Das authored
Moved DStream, DStreamCheckpointData and PairDStream from org.apache.spark.streaming to org.apache.spark.streaming.dstream.
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
Converted JobScheduler to use actors for event handling. Changed protected[streaming] to private[streaming] in StreamingContext and DStream. Added waitForStop to StreamingContext, and added StreamingContextSuite.
-
- Jan 11, 2014
-
Reynold Xin authored
Minor update for cloning writables and more documentation.
-
Reynold Xin authored
Fix UI bug introduced in #244. The 'duration' field was incorrectly renamed to 'task time' in the table that lists stages.
-
Reynold Xin authored
-
Patrick Wendell authored
Revert PR 381

This PR missed a bunch of test cases that require "spark.cleaner.ttl". I think it is what is causing test failures on Jenkins right now (though it's a bit hard to tell because the DNS for cs.berkeley.edu is down). I'm submitting this to see if it fixes Jenkins. I did try just patching various tests, but it was taking a really long time because there are a bunch of them, so for now I'm just seeing if a revert works.
-
Patrick Wendell authored
This reverts commit 669ba4ca.
-
Patrick Wendell authored
This reverts commit 942c80b3.
-
Reynold Xin authored
-
Reynold Xin authored
Fix a small problem in ALS where configuration didn't work
-
Reynold Xin authored
-
Reynold Xin authored
We clone Hadoop keys and values by default and reuse objects only if asked to. We use type-specific cloning for the most common kinds of writables and fall back to WritableUtils.clone otherwise. The intention is to optimize: for NullWritable, for example, no clone is needed at all, and for Long, int, and String writables, creating a new object with the value set should hopefully be faster than a generic copy of the object. There is another way to do this PR, where we ask separately whether to clone keys and whether to clone values, but I could not think of a use case for that except when one of them is actually a NullWritable, which I have already worked around, so that seemed unnecessary.
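A hedged sketch of the two cloning paths described here, using Hadoop's Text as the example writable (an assumption for illustration):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{Text, WritableUtils}

val hadoopConf = new Configuration()
val original = new Text("record")

// Fast path for a common writable type: build a new object with the value
// set, instead of a generic serialize/deserialize round trip.
val fastCopy = new Text(original)

// Generic fallback for any other Writable, as the message describes:
val genericCopy = WritableUtils.clone(original, hadoopConf)
```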
-
Patrick Wendell authored
The 'duration' field was incorrectly renamed to 'task time' in the table that lists stages.
-
Patrick Wendell authored
Upgrade Kafka dependency to the 0.8.0 release version
-
jerryshao authored
-
Reynold Xin authored
Change clientId to random clientId

The client identifier should be unique across all clients connecting to the same server. A convenience method, generateClientId(), is provided to generate a random client id that satisfies this criterion. It returns a randomly generated client identifier based on the current user's login name and the system time. As the client identifier is used by the server to identify a client when it reconnects, the client must use the same identifier between connections if durable subscriptions are to be used.
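A hedged sketch assuming Eclipse Paho's MqttClient, which provides the generateClientId() helper the message describes; the broker URL is an assumption:

```scala
import org.eclipse.paho.client.mqttv3.MqttClient

// generateClientId() derives an identifier from the current user's login
// name and the system time, so concurrent clients get distinct ids.
val clientId = MqttClient.generateClientId()
val client = new MqttClient("tcp://localhost:1883", clientId) // assumed broker
client.connect()
```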
-
Reynold Xin authored
Small typo fix
-
- Jan 10, 2014
-
Matei Zaharia authored
Fix default TTL for metadata cleaner

It seems to have been set to 3500 in a previous commit for debugging, but it should be off by default.
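A minimal sketch of setting the TTL explicitly through the public config key; the master and app name are assumptions for illustration:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("local[2]")    // assumption for illustration
  .setAppName("TtlExample") // assumption for illustration
  // -1 (the restored default) disables periodic metadata cleanup; a positive
  // value is the cleanup delay in seconds (3600 is what streaming uses).
  .set("spark.cleaner.ttl", "3600")
```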
-
Patrick Wendell authored
Fix a type error in comment lines
-
Patrick Wendell authored
-
Patrick Wendell authored
Add i2 instance types to Spark EC2. Using data from http://aws.amazon.com/amazon-linux-ami/instance-type-matrix/ and http://www.ec2instances.info/
-
Matei Zaharia authored
-