- Feb 19, 2013
-
Patrick Wendell authored
This commit uses the RDD type in `slice`, making it available to both normal and pair RDDs in Java. It also updates the signature for `slice` to match changes in the Scala API.
-
Patrick Wendell authored
This is an improved implementation of the `transform` operator in Java. The main difference is that it allows all four possible types of transform functions:
1. JavaRDD -> JavaRDD
2. JavaRDD -> JavaPairRDD
3. JavaPairRDD -> JavaPairRDD
4. JavaPairRDD -> JavaRDD
Previously only (1) and (3) were possible.
Conflicts:
    streaming/src/test/java/spark/streaming/JavaAPISuite.java
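The point of supporting all four combinations is that `transform` applies an arbitrary per-batch function, so the output element type need not match the input. A minimal pure-Python sketch of the idea (plain lists stand in for RDDs, and lists of tuples for pair RDDs; this is an analogy, not the Spark API):

```python
# Analogy only: a list stands in for an RDD, a list of (key, value)
# tuples for a pair RDD. Because `transform` takes an arbitrary
# batch-to-batch function, all four type combinations fall out naturally.
def transform(batch, f):
    return f(batch)

words = ["a", "b", "a"]

# 1. "RDD" -> "RDD": same shape in and out
upper = transform(words, lambda b: [w.upper() for w in b])

# 2. "RDD" -> "pair RDD": plain elements become key-value pairs
paired = transform(words, lambda b: [(w, 1) for w in b])

# 3. "pair RDD" -> "pair RDD": keys kept, values changed
doubled = transform(paired, lambda b: [(k, v * 2) for k, v in b])

# 4. "pair RDD" -> "RDD": pairs collapse back to plain elements
keys_only = transform(doubled, lambda b: [k for k, _ in b])
```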
-
Patrick Wendell authored
-
Tathagata Das authored
Major modifications to fix driver fault-tolerance with file input stream
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
Conflicts:
    docs/plugin-custom-receiver.md
    streaming/src/main/scala/spark/streaming/StreamingContext.scala
    streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
    streaming/src/main/scala/spark/streaming/dstream/PluggableInputDStream.scala
    streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
    streaming/src/test/scala/spark/streaming/InputStreamsSuite.scala
-
- Feb 18, 2013
-
Tathagata Das authored
Changed networkStream to socketStream, and pluggableNetworkStream to networkStream, as a way to create streams from arbitrary network receivers.
-
Tathagata Das authored
Conflicts:
    streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
    streaming/src/main/scala/spark/streaming/dstream/NetworkInputDStream.scala
-
Tathagata Das authored
Added checkpointing and fault-tolerance semantics to the programming guide. Fixed the default checkpoint interval to be a multiple of the slide duration. Fixed the visibility of some classes and objects to clean up the docs.
-
- Feb 17, 2013
-
Tathagata Das authored
Changes related to recovering from a 1st failure:
- Made the scheduler checkpoint after clearing old metadata, which ensures that a new checkpoint is written as soon as at least one batch gets computed while recovering from a failure. This ensures that if there is a 2nd failure while recovering from the 1st failure, the system starts the 2nd recovery from a newer checkpoint.
- Modified the checkpoint writer to write checkpoints in a different thread.
- Added a check to make sure that compute for InputDStreams gets called only for strictly increasing times.
- Changed the implementation of slice to call getOrCompute on the parent DStream in time-increasing order.
- Added a test case for slice.
- Fixed the testGroupByKeyAndWindow test case in JavaAPISuite to verify results against the expected output in an order-independent manner.
-
- Feb 15, 2013
-
Tathagata Das authored
-
- Feb 14, 2013
-
Tathagata Das authored
Moved Java streaming examples to examples/src/main/java/spark/streaming/... and fixed logging in NetworkInputTracker to highlight errors when receiver deregisters/shuts down.
-
Tathagata Das authored
Added TwitterInputDStream from example to StreamingContext. Renamed example TwitterBasic to TwitterPopularTags.
-
Tathagata Das authored
Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites.
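The semantics of counting values over a window can be sketched in plain Python (names and the batch-indexed window are illustrative; Spark expresses the window and slide as Durations, not batch counts):

```python
from collections import Counter

# Sketch of countByValueAndWindow semantics over an in-memory stream of
# batches: count occurrences of each value across the last `window_size`
# batches. Spark does the same thing per batch interval over a DStream.
def count_by_value_and_window(batches, window_size):
    results = []
    for i in range(len(batches)):
        window = batches[max(0, i - window_size + 1): i + 1]
        results.append(Counter(v for batch in window for v in batch))
    return results

batches = [["a", "b"], ["a"], ["c"]]
counts = count_by_value_and_window(batches, window_size=2)
# counts[1] covers batches 0..1; counts[2] covers batches 1..2
```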
-
- Feb 13, 2013
-
Tathagata Das authored
-
Tathagata Das authored
Added filter functionality to reduceByKeyAndWindow with inverse. Consolidated reduceByKeyAndWindow's many functions into smaller number of functions with optional parameters.
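The idea behind the inverse-reduce variant, and why a filter is needed, can be sketched in plain Python (function and parameter names here are illustrative, not Spark's):

```python
# Sketch of windowed reduce with an inverse function: instead of
# re-reducing the whole window each slide, update the previous window's
# result by adding the entering batch and "subtracting" the leaving one.
# The filter function then drops keys whose windowed value is no longer
# interesting (e.g. counts that have fallen to zero), so stale keys do
# not accumulate forever.
def slide_window(prev, entering, leaving, reduce_fn, inv_reduce_fn, keep):
    acc = dict(prev)
    for k, v in entering.items():
        acc[k] = reduce_fn(acc[k], v) if k in acc else v
    for k, v in leaving.items():
        acc[k] = inv_reduce_fn(acc[k], v)
    return {k: v for k, v in acc.items() if keep(k, v)}

prev = {"a": 3, "b": 1}
nxt = slide_window(
    prev,
    entering={"a": 1, "c": 2},
    leaving={"a": 2, "b": 1},
    reduce_fn=lambda x, y: x + y,
    inv_reduce_fn=lambda x, y: x - y,
    keep=lambda k, v: v != 0,  # filter: drop keys whose count fell to 0
)
# "b" is filtered out after its count drops to 0
```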
-
Tathagata Das authored
Changed scheduler and file input stream to fix bugs in the driver fault tolerance. Added MasterFailureTest to rigorously test master fault tolerance with file input stream.
-
- Feb 10, 2013
-
Tathagata Das authored
Fixed bugs in FileInputDStream and Scheduler that occasionally failed to reprocess old files after recovering from master failure. Completely modified spark.streaming.FailureTest to test multiple master failures using file input stream.
-
Tathagata Das authored
-
- Feb 09, 2013
-
Tathagata Das authored
-
- Feb 08, 2013
-
Prashant Sharma authored
-
- Feb 07, 2013
-
Tathagata Das authored
Merging latest master branch changes to the streaming branch
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
Removing offset management code that is non-existent in kafka 0.7.0+
-
Tathagata Das authored
StateDStream changes to give updateStateByKey consistent behavior
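The "consistent behavior" being aimed for is that the update function runs on every batch for every key that has either new values or existing state, so state survives batches with no new data. A minimal pure-Python sketch of those semantics (names are illustrative; this is not the Spark implementation):

```python
# Sketch of updateStateByKey semantics: per batch, call the update
# function for every key with new values OR existing state; returning
# None drops the key's state.
def update_state(state, batch, update):
    grouped = {}
    for k, v in batch:
        grouped.setdefault(k, []).append(v)
    new_state = {}
    for k in set(state) | set(grouped):
        s = update(grouped.get(k, []), state.get(k))
        if s is not None:
            new_state[k] = s
    return new_state

# Running count per key: new count = old count + number of new values.
running_count = lambda values, s: (s or 0) + len(values)

s1 = update_state({}, [("a", 1), ("a", 1), ("b", 1)], running_count)
s2 = update_state(s1, [("a", 1)], running_count)  # "b" keeps its state
```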
-
- Feb 05, 2013
-
Matei Zaharia authored
Inline mergePair to look more like the narrow dep branch.
-
Matei Zaharia authored
Handle Terminated to avoid endless DeathPactExceptions.
-
Stephen Haberman authored
Conflicts:
    core/src/main/scala/spark/deploy/worker/Worker.scala
-
Matei Zaharia authored
Increase DriverSuite timeout.
-
Stephen Haberman authored
Credit to Roland Kuhn, Akka's tech lead, for pointing out this fix: StandaloneExecutorBackend.preStart's catch block would never (ever) get hit, because all of the operations in preStart are async. So the System.exit in the catch block was skipped, and instead Akka was sending Terminated messages which, since we didn't handle them, turned into DeathPactExceptions, which started a postRestart/preStart infinite loop.
-
Stephen Haberman authored
-
Stephen Haberman authored
No functionality changes; I think this is just more consistent, given that mergePair isn't called multiple times or recursively. Also added a comment to explain the usual case of having two parent RDDs.
-
Matei Zaharia authored
Streaming constructor which takes JavaSparkContext
-
Patrick Wendell authored
It's sometimes helpful to pass a JavaSparkContext directly and take advantage of the various constructors available for it.
-
- Feb 04, 2013
-
Matei Zaharia authored
-
Matei Zaharia authored
-
- Feb 03, 2013
-
Matei Zaharia authored
Fix exit status in PySpark unit tests; fix/optimize PySpark's RDD.take()
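The usual way to optimize take() is to stop computing partitions as soon as enough elements have been collected, rather than materializing every partition up front. A hedged sketch of that general idea (partitions modeled as thunks over lists; this is illustrative, not the actual PySpark code):

```python
# Sketch of a lazy take(n): scan partitions one at a time and stop as
# soon as n elements have been collected, so later partitions are never
# computed at all.
def take(partitions, n):
    out = []
    for compute in partitions:
        if len(out) >= n:
            break
        out.extend(compute()[: n - len(out)])
    return out

computed = []  # records which partitions were actually computed

def make_partition(data):
    def compute():
        computed.append(data)
        return data
    return compute

parts = [make_partition(p) for p in ([1, 2], [3, 4], [5, 6])]
first3 = take(parts, 3)  # only the first two partitions are computed
```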
-