- Feb 19, 2013
-
Patrick Wendell authored
This commit uses the RDD type in `slice`, making it available to both normal and pair RDDs in Java. It also updates the signature for `slice` to match changes in the Scala API.
-
Patrick Wendell authored
This is an improved implementation of the `transform` operator in Java. The main difference is that it allows all four possible types of transform functions:
1. JavaRDD -> JavaRDD
2. JavaRDD -> JavaPairRDD
3. JavaPairRDD -> JavaPairRDD
4. JavaPairRDD -> JavaRDD
Previously only (1) and (3) were possible.
Conflicts:
    streaming/src/test/java/spark/streaming/JavaAPISuite.java
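The point of supporting all four combinations is that `transform` applies an arbitrary per-batch function, so the output element type need not match the input. A minimal pure-Python sketch of the idea (plain lists stand in for RDDs, and lists of tuples for pair RDDs; this is an analogy, not the Spark API):

```python
# Analogy only: a list stands in for an RDD, a list of (key, value)
# tuples for a pair RDD. Because `transform` takes an arbitrary
# batch-to-batch function, all four type combinations fall out naturally.
def transform(batch, f):
    return f(batch)

words = ["a", "b", "a"]

# 1. "RDD" -> "RDD": same shape in and out
upper = transform(words, lambda b: [w.upper() for w in b])

# 2. "RDD" -> "pair RDD": plain elements become key-value pairs
paired = transform(words, lambda b: [(w, 1) for w in b])

# 3. "pair RDD" -> "pair RDD": keys kept, values changed
doubled = transform(paired, lambda b: [(k, v * 2) for k, v in b])

# 4. "pair RDD" -> "RDD": pairs collapse back to plain elements
keys_only = transform(doubled, lambda b: [k for k, _ in b])
```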
-
Patrick Wendell authored
-
Tathagata Das authored
Major modifications to fix driver fault-tolerance with file input stream
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
Conflicts:
    docs/plugin-custom-receiver.md
    streaming/src/main/scala/spark/streaming/StreamingContext.scala
    streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
    streaming/src/main/scala/spark/streaming/dstream/PluggableInputDStream.scala
    streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
    streaming/src/test/scala/spark/streaming/InputStreamsSuite.scala
-
- Feb 18, 2013
-
Tathagata Das authored
Changed networkStream to socketStream, and pluggableNetworkStream to networkStream, as a way to create streams from arbitrary network receivers.
-
Tathagata Das authored
Conflicts:
    streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
    streaming/src/main/scala/spark/streaming/dstream/NetworkInputDStream.scala
-
Tathagata Das authored
Added checkpointing and fault-tolerance semantics to the programming guide. Fixed the default checkpoint interval to be a multiple of the slide duration. Fixed the visibility of some classes and objects to clean up the docs.
-
- Feb 17, 2013
-
Tathagata Das authored
Changes related to recovering from a 1st failure:
- Made the scheduler checkpoint after clearing old metadata, which ensures that a new checkpoint is written as soon as at least one batch gets computed while recovering from a failure. This ensures that if there is a 2nd failure while recovering from the 1st failure, the system starts the 2nd recovery from a newer checkpoint.
- Modified the checkpoint writer to write checkpoints in a different thread.
- Added a check to make sure that compute for InputDStreams gets called only for strictly increasing times.
- Changed the implementation of slice to call getOrCompute on the parent DStream in time-increasing order.
- Added a test case for slice.
- Fixed the testGroupByKeyAndWindow test case in JavaAPISuite to verify results against the expected output in an order-independent manner.
-
- Feb 15, 2013
-
Tathagata Das authored
-
- Feb 14, 2013
-
Tathagata Das authored
Moved Java streaming examples to examples/src/main/java/spark/streaming/... and fixed logging in NetworkInputTracker to highlight errors when receiver deregisters/shuts down.
-
Tathagata Das authored
Added TwitterInputDStream from example to StreamingContext. Renamed example TwitterBasic to TwitterPopularTags.
-
Tathagata Das authored
Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites.
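The semantics of counting values over a window can be sketched in plain Python (names and the batch-indexed window are illustrative; Spark expresses the window and slide as Durations, not batch counts):

```python
from collections import Counter

# Sketch of countByValueAndWindow semantics over an in-memory stream of
# batches: count occurrences of each value across the last `window_size`
# batches. Spark does the same thing per batch interval over a DStream.
def count_by_value_and_window(batches, window_size):
    results = []
    for i in range(len(batches)):
        window = batches[max(0, i - window_size + 1): i + 1]
        results.append(Counter(v for batch in window for v in batch))
    return results

batches = [["a", "b"], ["a"], ["c"]]
counts = count_by_value_and_window(batches, window_size=2)
# counts[1] covers batches 0..1; counts[2] covers batches 1..2
```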
-
- Feb 13, 2013
-
Tathagata Das authored
-
Tathagata Das authored
Added filter functionality to reduceByKeyAndWindow with inverse. Consolidated reduceByKeyAndWindow's many functions into smaller number of functions with optional parameters.
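The idea behind the inverse-reduce variant, and why a filter is needed, can be sketched in plain Python (function and parameter names here are illustrative, not Spark's):

```python
# Sketch of windowed reduce with an inverse function: instead of
# re-reducing the whole window each slide, update the previous window's
# result by adding the entering batch and "subtracting" the leaving one.
# The filter function then drops keys whose windowed value is no longer
# interesting (e.g. counts that have fallen to zero), so stale keys do
# not accumulate forever.
def slide_window(prev, entering, leaving, reduce_fn, inv_reduce_fn, keep):
    acc = dict(prev)
    for k, v in entering.items():
        acc[k] = reduce_fn(acc[k], v) if k in acc else v
    for k, v in leaving.items():
        acc[k] = inv_reduce_fn(acc[k], v)
    return {k: v for k, v in acc.items() if keep(k, v)}

prev = {"a": 3, "b": 1}
nxt = slide_window(
    prev,
    entering={"a": 1, "c": 2},
    leaving={"a": 2, "b": 1},
    reduce_fn=lambda x, y: x + y,
    inv_reduce_fn=lambda x, y: x - y,
    keep=lambda k, v: v != 0,  # filter: drop keys whose count fell to 0
)
# "b" is filtered out after its count drops to 0
```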
-
Tathagata Das authored
Changed scheduler and file input stream to fix bugs in the driver fault tolerance. Added MasterFailureTest to rigorously test master fault tolerance with file input stream.
-
- Feb 10, 2013
-
Tathagata Das authored
Fixed bugs in FileInputDStream and Scheduler that occasionally failed to reprocess old files after recovering from master failure. Completely modified spark.streaming.FailureTest to test multiple master failures using file input stream.
-
Tathagata Das authored
-
- Feb 09, 2013
-
Tathagata Das authored
-
- Feb 08, 2013
-
Prashant Sharma authored
-
- Feb 07, 2013
-
Tathagata Das authored
Merging latest master branch changes to the streaming branch
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
Removing offset management code that is non-existent in kafka 0.7.0+
-
Tathagata Das authored
StateDStream changes to give updateStateByKey consistent behavior
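The "consistent behavior" being aimed for is that the update function runs on every batch for every key that has either new values or existing state, so state survives batches with no new data. A minimal pure-Python sketch of those semantics (names are illustrative; this is not the Spark implementation):

```python
# Sketch of updateStateByKey semantics: per batch, call the update
# function for every key with new values OR existing state; returning
# None drops the key's state.
def update_state(state, batch, update):
    grouped = {}
    for k, v in batch:
        grouped.setdefault(k, []).append(v)
    new_state = {}
    for k in set(state) | set(grouped):
        s = update(grouped.get(k, []), state.get(k))
        if s is not None:
            new_state[k] = s
    return new_state

# Running count per key: new count = old count + number of new values.
running_count = lambda values, s: (s or 0) + len(values)

s1 = update_state({}, [("a", 1), ("a", 1), ("b", 1)], running_count)
s2 = update_state(s1, [("a", 1)], running_count)  # "b" keeps its state
```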
-
- Feb 05, 2013
-
Matei Zaharia authored
Inline mergePair to look more like the narrow dep branch.
-
Matei Zaharia authored
Handle Terminated to avoid endless DeathPactExceptions.
-
Stephen Haberman authored
Conflicts:
    core/src/main/scala/spark/deploy/worker/Worker.scala
-
Matei Zaharia authored
Increase DriverSuite timeout.
-
Stephen Haberman authored
Credit to Roland Kuhn, Akka's tech lead, for pointing out this fix: StandaloneExecutorBackend.preStart's catch block would never (ever) get hit, because all of the operations in preStart are async. So the System.exit in the catch block was skipped, and instead Akka was sending Terminated messages which, since we didn't handle them, turned into DeathPactExceptions, which started a postRestart/preStart infinite loop.
-
Stephen Haberman authored
-
Stephen Haberman authored
No functionality changes; I think this is just more consistent, given that mergePair isn't called multiple times or recursively. Also added a comment to explain the usual case of having two parent RDDs.
-
Matei Zaharia authored
Streaming constructor which takes JavaSparkContext
-
Patrick Wendell authored
It's sometimes helpful to pass a JavaSparkContext directly and take advantage of the various constructors available for it.
-
- Feb 04, 2013
-
Matei Zaharia authored
-
Matei Zaharia authored
-
- Feb 03, 2013
-
Matei Zaharia authored
Fix exit status in PySpark unit tests; fix/optimize PySpark's RDD.take()
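The usual way to optimize take() is to stop computing partitions as soon as enough elements have been collected, rather than materializing every partition up front. A hedged sketch of that general idea (partitions modeled as thunks over lists; this is illustrative, not the actual PySpark code):

```python
# Sketch of a lazy take(n): scan partitions one at a time and stop as
# soon as n elements have been collected, so later partitions are never
# computed at all.
def take(partitions, n):
    out = []
    for compute in partitions:
        if len(out) >= n:
            break
        out.extend(compute()[: n - len(out)])
    return out

computed = []  # records which partitions were actually computed

def make_partition(data):
    def compute():
        computed.append(data)
        return data
    return compute

parts = [make_partition(p) for p in ([1, 2], [3, 4], [5, 6])]
first3 = take(parts, 3)  # only the first two partitions are computed
```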
-