- Jan 13, 2014
-
Tathagata Das authored
-
- Jan 12, 2014
-
Tathagata Das authored
Conflicts: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
-
Patrick Wendell authored
Better error handling in Spark Streaming and more API cleanup

Earlier, errors in jobs generated by Spark Streaming (or in the generation of those jobs) could not be caught from the main driver thread (i.e. the thread that called StreamingContext.start()), as they were thrown in different threads. With this change, after `ssc.start()` one can call `ssc.awaitTermination()`, which will block until the ssc is stopped or an exception occurs. This makes errors easier to debug. This change also adds ssc.stop(<stop-spark-context>), which stops the StreamingContext without stopping the SparkContext. It also fixes the bug that came up with PRs #393 and #381. The MetadataCleaner default value has been changed from 3500 to -1 for a normal SparkContext, and to 3600 when creating a StreamingContext. Also, StreamingListenerBus has been updated with changes similar to SparkListenerBus in #392, and a lot of protected[streaming] has been changed to private[streaming].
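A minimal Scala sketch of the lifecycle this enables; the local master, socket source, and one-second batch interval are assumptions for illustration:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext("local[2]", "AwaitExample", Seconds(1)) // assumed setup
ssc.socketTextStream("localhost", 9999).print()                        // hypothetical source

ssc.start()
// Blocks the driver thread until the context is stopped or a streaming job
// throws, so errors now surface here instead of in background threads.
ssc.awaitTermination()
// To shut down the streaming side while keeping the SparkContext alive:
// ssc.stop(false)
```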
-
Patrick Wendell authored
Rename DStream.foreach to DStream.foreachRDD

`foreachRDD` makes it clear that the granularity of this operator is per-RDD. As it stands, `foreach` is inconsistent with `map`, `filter`, and the other DStream operators, which get pushed down to individual records within each RDD.
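A short sketch contrasting the two granularities; the socket source and word splitting are assumptions for illustration:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext("local[2]", "ForeachRDDExample", Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999) // hypothetical source

// Per-record operators: the function applies to each record inside each RDD.
val words = lines.flatMap(_.split(" ")).filter(_.nonEmpty)

// Per-RDD operator: the closure receives one whole RDD per batch interval.
words.foreachRDD { rdd =>
  println(s"batch contained ${rdd.count()} words")
}

ssc.start()
ssc.awaitTermination()
```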
-
Tathagata Das authored
-
Patrick Wendell authored
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
-
Patrick Wendell authored
`foreachRDD` makes it clear that the granularity of this operator is per-RDD. As it stands, `foreach` is inconsistent with `map`, `filter`, and the other DStream operators, which get pushed down to individual records within each RDD.
-
Patrick Wendell authored
Setting load defaults to true in executor

This preserves the behavior of earlier releases. If properties are set for the executors via `spark-env.sh` on the slaves, then they should take precedence over Spark defaults. This is useful if system administrators are setting properties for a standalone cluster, such as shuffle locations. /cc @andrewor14, who initially reported this issue.
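A hedged sketch of what loading defaults means for a SparkConf; the property name and value are assumptions for illustration:

```scala
import org.apache.spark.SparkConf

// Simulate a slave whose spark-env.sh exported a spark.* setting as a JVM
// system property (an assumption; spark-env.sh itself sets these indirectly).
System.setProperty("spark.local.dir", "/mnt/spark")

// With loadDefaults = true, the conf picks up spark.* system properties, so
// the administrator's setting takes precedence over built-in defaults.
val conf = new SparkConf(loadDefaults = true)
assert(conf.get("spark.local.dir") == "/mnt/spark")
```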
-
Reynold Xin authored
Stop SparkListenerBus daemon thread when DAGScheduler is stopped. Otherwise this leads to hundreds of SparkListenerBus daemon threads in our unit tests (and is also problematic if a user application launches multiple SparkContexts).
-
Tathagata Das authored
Fixed bugs to ensure better cleanup of JobScheduler, JobGenerator and NetworkInputTracker upon close.
-
Patrick Wendell authored
-
Tathagata Das authored
Moved DStream, DStreamCheckpointData and PairDStream from org.apache.spark.streaming to org.apache.spark.streaming.dstream.
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
Converted JobScheduler to use actors for event handling. Changed protected[streaming] to private[streaming] in StreamingContext and DStream. Added waitForStop to StreamingContext, and added StreamingContextSuite.
-
- Jan 11, 2014
-
Reynold Xin authored
Minor update for cloning writables and more documentation.
-
Reynold Xin authored
Fix UI bug introduced in #244. The 'duration' field was incorrectly renamed to 'task time' in the table that lists stages.
-
Reynold Xin authored
-
Patrick Wendell authored
Revert PR 381

This PR missed a bunch of test cases that require "spark.cleaner.ttl". I think it is what is causing test failures on Jenkins right now (though it's a bit hard to tell because the DNS for cs.berkeley.edu is down). I'm submitting this to see if it fixes Jenkins. I did try just patching various tests, but it was taking a really long time because there are a bunch of them, so for now I'm just seeing if a revert works.
-
Patrick Wendell authored
This reverts commit 669ba4ca.
-
Patrick Wendell authored
This reverts commit 942c80b3.
-
Reynold Xin authored
-
Reynold Xin authored
Fix a small problem in ALS where configuration didn't work
-
Reynold Xin authored
-
Reynold Xin authored
We clone Hadoop keys and values by default and reuse objects only if asked to. We use type-specific cloning for the most common kinds of writables and fall back to WritableUtils.clone otherwise. The intention is to optimize: for NullWritable, for example, no clone is needed at all, and for Long, int, and String writables, creating a new object with the value set should hopefully be faster than a generic copy of the object. There is another way to do this PR, where we ask separately whether to clone keys and whether to clone values, but I could not think of a use case for that except when one of them is actually a NullWritable, which I have already worked around, so that seemed unnecessary.
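A hedged sketch of the two cloning paths described here, using Hadoop's Text as the example writable (an assumption for illustration):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{Text, WritableUtils}

val hadoopConf = new Configuration()
val original = new Text("record")

// Fast path for a common writable type: build a new object with the value
// set, instead of a generic serialize/deserialize round trip.
val fastCopy = new Text(original)

// Generic fallback for any other Writable, as the message describes:
val genericCopy = WritableUtils.clone(original, hadoopConf)
```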
-
Patrick Wendell authored
The 'duration' field was incorrectly renamed to 'task time' in the table that lists stages.
-
Patrick Wendell authored
Upgrade Kafka dependency to the 0.8.0 release version
-
jerryshao authored
-
Reynold Xin authored
Change clientId to random clientId

The client identifier should be unique across all clients connecting to the same server. A convenience method, generateClientId(), is provided to generate a random client id that satisfies this criterion. It returns a randomly generated client identifier based on the current user's login name and the system time. As the client identifier is used by the server to identify a client when it reconnects, the client must use the same identifier between connections if durable subscriptions are to be used.
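A hedged sketch assuming Eclipse Paho's MqttClient, which provides the generateClientId() helper the message describes; the broker URL is an assumption:

```scala
import org.eclipse.paho.client.mqttv3.MqttClient

// generateClientId() derives an identifier from the current user's login
// name and the system time, so concurrent clients get distinct ids.
val clientId = MqttClient.generateClientId()
val client = new MqttClient("tcp://localhost:1883", clientId) // assumed broker
client.connect()
```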
-
Reynold Xin authored
Small typo fix
-
- Jan 10, 2014
-
Matei Zaharia authored
Fix default TTL for metadata cleaner

It seems to have been set to 3500 in a previous commit for debugging, but it should be off by default.
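A minimal sketch of setting the TTL explicitly through the public config key; the master and app name are assumptions for illustration:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("local[2]")    // assumption for illustration
  .setAppName("TtlExample") // assumption for illustration
  // -1 (the restored default) disables periodic metadata cleanup; a positive
  // value is the cleanup delay in seconds (3600 is what streaming uses).
  .set("spark.cleaner.ttl", "3600")
```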
-
Patrick Wendell authored
Fix a type error in comment lines
-
Patrick Wendell authored
-
Patrick Wendell authored
Add i2 instance types to Spark EC2. Using data from http://aws.amazon.com/amazon-linux-ami/instance-type-matrix/ and http://www.ec2instances.info/
-
Matei Zaharia authored
-