[SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL when driver crashes
## What changes were proposed in this pull request?

There is a race condition, introduced in SPARK-11141, which can cause data loss. The problem is that the ReceivedBlockTracker.insertAllocatedBatch function assumes that all blocks in streamIdToUnallocatedBlockQueues are allocated to the batch, and it clears the queue. With this PR, only the allocated blocks are removed from the queue, which prevents the data loss.

## How was this patch tested?

Additional unit test + manually.

Author: Gabor Somogyi <gabor.g.somogyi@gmail.com>

Closes #20620 from gaborgsomogyi/SPARK-23438.
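The difference between clearing the queue and removing only the allocated blocks can be illustrated with a minimal, self-contained sketch. This is not the code from ReceivedBlockTracker: the object `WalRecoverySketch`, the type aliases, and the methods `clearAll` and `removeAllocated` are hypothetical names chosen for illustration, and plain strings stand in for block metadata.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the tracker's bookkeeping; not Spark's actual classes.
object WalRecoverySketch {
  type StreamId = Int
  type BlockId = String
  type Unallocated = mutable.Map[StreamId, mutable.Queue[BlockId]]

  // Old behaviour as described in the commit message: assume every queued
  // block belongs to the batch and clear the queues.
  def clearAll(unallocated: Unallocated): Unit =
    unallocated.values.foreach(_.clear())

  // Behaviour after the fix as described: remove only the blocks that the
  // batch actually recorded, leaving later arrivals queued.
  def removeAllocated(
      unallocated: Unallocated,
      allocated: Map[StreamId, Seq[BlockId]]): Unit =
    allocated.foreach { case (streamId, blocks) =>
      val inBatch = blocks.toSet
      // dequeueAll removes every element for which the predicate holds.
      unallocated.get(streamId).foreach(_.dequeueAll(inBatch))
    }
}
```

As an assumed scenario, suppose stream 0 has blocks "b1" and "b2" queued, but the batch only recorded "b1". Calling `clearAll` drops "b2" as well, so it would never be assigned to a later batch, whereas `removeAllocated(queues, Map(0 -> Seq("b1")))` leaves "b2" in the queue.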
Showing
- streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceivedBlockTracker.scala (7 additions, 4 deletions)
- streaming/src/test/scala/org/apache/spark/streaming/ReceivedBlockTrackerSuite.scala (22 additions, 1 deletion)