Merge pull request #383 from tdas/driver-test
API for automatic driver recovery for streaming programs and other bug fixes

1. Added Scala and Java APIs for automatically loading a checkpoint if one exists in the provided checkpoint directory.
   Scala API: `StreamingContext.getOrCreate(<checkpoint dir>, <function to create new StreamingContext>)` returns a StreamingContext.
   Java API: `JavaStreamingContext.getOrCreate(<checkpoint dir>, <factory obj of type JavaStreamingContextFactory>)` returns a JavaStreamingContext.
   See RecoverableNetworkWordCount below for an example of how to use it.
2. Refactored the streaming.Checkpoint*** code to fix bugs and make the writing and reading of DStream metadata checkpoints more robust. Specifically, it fixes and improves the logic behind backing up and writing metadata checkpoint files. It also ensures that spark.driver.* and spark.hostPort are cleared from SparkConf before it is written to the checkpoint.
3. Fixed a bug in the cleanup of checkpointed RDDs created by DStream. Specifically, this fix ensures that a checkpointed RDD's files are not prematurely cleaned up, thus ensuring reliable recovery.
4. TimeStampedHashMap is upgraded to optionally update the timestamp on map.get(key). This allows data to be cleared based on access time (i.e., clearing records that were last accessed before a threshold timestamp).
5. Added caching of file modification times in FileInputDStream using the updated TimeStampedHashMap. Without the caching, enumerating the modification times to find new files can take seconds if there are thousands of files. This cache is cleared automatically.

This PR is not entirely final, as I may make some minor additions: a Java example, and adding StreamingContext.getOrCreate to the unit tests.

Edit: Java example to be added later; unit test added.
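As a rough illustration of the recovery API in item 1, the following minimal sketch shows how a driver might call `StreamingContext.getOrCreate`. It is not the RecoverableNetworkWordCount example from this commit. The object name `RecoverableSketch`, the helper `createContext`, and the argument order are hypothetical, and the sketch assumes a Spark release that includes this API and provides `awaitTermination`.

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits on older releases

// Hypothetical driver; names and argument order are illustrative assumptions.
object RecoverableSketch {

  // Builds a brand-new context. Only invoked when no usable checkpoint
  // exists in checkpointDir.
  def createContext(master: String, host: String, port: Int,
                    checkpointDir: String): StreamingContext = {
    val ssc = new StreamingContext(master, "RecoverableSketch", Seconds(1))
    ssc.checkpoint(checkpointDir) // enable metadata checkpointing
    val lines = ssc.socketTextStream(host, port)
    lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    val Array(master, host, port, checkpointDir) = args
    // Recover the context from the checkpoint if present, else create it.
    val ssc = StreamingContext.getOrCreate(
      checkpointDir,
      () => createContext(master, host, port.toInt, checkpointDir))
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that the whole DStream graph is set up inside the creating function, so that a restarted driver can rebuild it from the checkpoint instead of running that setup again.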
Changed files:
- core/src/main/scala/org/apache/spark/SparkContext.scala (1 addition, 1 deletion)
- core/src/main/scala/org/apache/spark/util/TimeStampedHashMap.scala (12 additions, 5 deletions)
- examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java (3 additions, 4 deletions)
- examples/src/main/scala/org/apache/spark/streaming/examples/NetworkWordCount.scala (3 additions, 2 deletions)
- examples/src/main/scala/org/apache/spark/streaming/examples/RecoverableNetworkWordCount.scala (118 additions, 0 deletions)
- streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala (128 additions, 60 deletions)
- streaming/src/main/scala/org/apache/spark/streaming/DStream.scala (10 additions, 5 deletions)
- streaming/src/main/scala/org/apache/spark/streaming/DStreamCheckpointData.scala (61 additions, 45 deletions)
- streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala (23 additions, 15 deletions)
- streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala (60 additions, 15 deletions)
- streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala (90 additions, 6 deletions)
- streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala (31 additions, 11 deletions)
- streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala (21 additions, 10 deletions)
- streaming/src/main/scala/org/apache/spark/streaming/util/MasterFailureTest.scala (34 additions, 21 deletions)
- streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala (5 additions, 5 deletions)
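
The access-time eviction described in items 4 and 5 of the commit message can be sketched in plain Scala as follows. This is a simplified, hypothetical stand-in and not Spark's TimeStampedHashMap: the class name `AccessTimedMap`, the injectable `clock` parameter, and the method names are assumptions chosen for illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a timestamped map; not Spark's implementation.
// Each value is stored with the time it was last written (or, optionally, read).
class AccessTimedMap[K, V](
    updateTimeStampOnGet: Boolean,
    clock: () => Long = () => System.currentTimeMillis()) {

  private val entries = mutable.HashMap.empty[K, (V, Long)]

  def put(key: K, value: V): Unit =
    entries.update(key, (value, clock()))

  // Returns the value and, if enabled, refreshes its timestamp so that
  // recently read entries survive the next cleanup.
  def get(key: K): Option[V] =
    entries.get(key).map { case (value, _) =>
      if (updateTimeStampOnGet) entries.update(key, (value, clock()))
      value
    }

  // Removes every entry whose timestamp is older than threshTime.
  def clearOldValues(threshTime: Long): Unit = {
    val stale = entries.iterator.filter(_._2._2 < threshTime).map(_._1).toList
    stale.foreach(k => entries.remove(k))
  }

  def size: Int = entries.size
}

object AccessTimedMapDemo {
  def main(args: Array[String]): Unit = {
    var now = 0L // fake clock, so the demo is deterministic
    val modTimes = new AccessTimedMap[String, Long](
      updateTimeStampOnGet = true, clock = () => now)
    modTimes.put("a.txt", 100L)
    modTimes.put("b.txt", 200L)
    now = 10L
    modTimes.get("a.txt")       // refreshes the timestamp of "a.txt" to 10
    modTimes.clearOldValues(5L) // evicts "b.txt", which was last touched at 0
    println(modTimes.size)
  }
}
```

A cache of file modification times built this way keeps entries for files that are still being looked up and drops the rest at cleanup time, which is the behavior item 5 relies on.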