[SPARK-19304][STREAMING][KINESIS] fix kinesis slow checkpoint recovery
## What changes were proposed in this pull request?

Added a limit to the `getRecords` API call in `KinesisBackedBlockRDD`. This reduces the amount of data returned by each Kinesis API call, which makes recovery considerably faster.

Because `fromSeqNum` and `toSeqNum` are already stored in the checkpoint metadata, the number of records can be stored there as well and later used as the limit for the API call.

## How was this patch tested?

The patch was manually tested.

Apologies for any silly mistakes; this is my first pull request.

Author: Gaurav <gaurav@techtinium.com>

Closes #16842 from Gauravshah/kinesis_checkpoint_recovery_fix_2_1_0.
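The idea in the commit message can be illustrated with a minimal, self-contained Scala sketch. It does not use the AWS SDK or Spark's real classes: `SeqNumRange`, `FakeShard`, `recover`, and `DefaultLimit` are hypothetical names, and the shard is simulated in memory. The only grounded detail is that a real Kinesis `GetRecords` call returns at most 10,000 records. The sketch shows why passing the checkpointed record count as the per-call limit shrinks the data transferred during recovery.

```scala
// Hypothetical model of the recovery idea: a checkpointed range also stores how
// many records it covers, and recovery uses that count as the per-call limit.
object KinesisRecoverySketch {

  // Hypothetical stand-in for checkpointed metadata (not Spark's actual class).
  final case class SeqNumRange(fromSeqNum: Long, toSeqNum: Long, recordCount: Int)

  // Simulated shard: each record is represented only by its sequence number.
  final class FakeShard(records: Vector[Long]) {
    // Total records handed back across all calls, i.e. data "transferred".
    var recordsTransferred = 0

    // Returns up to `limit` records starting at the first record >= `from`.
    def getRecords(from: Long, limit: Int): Vector[Long] = {
      val page = records.dropWhile(_ < from).take(limit)
      recordsTransferred += page.size
      page
    }
  }

  // Assumed default page size, matching the Kinesis GetRecords per-call maximum.
  val DefaultLimit = 10000

  // Re-reads one checkpointed range page by page, stopping once toSeqNum is reached.
  def recover(shard: FakeShard, range: SeqNumRange, useCount: Boolean): Vector[Long] = {
    val limit =
      if (useCount) math.max(1, math.min(range.recordCount, DefaultLimit))
      else DefaultLimit
    val out = Vector.newBuilder[Long]
    var next = range.fromSeqNum
    var done = false
    while (!done) {
      val page = shard.getRecords(next, limit)
      // Keep only records that belong to the checkpointed range.
      val inRange = page.takeWhile(_ <= range.toSeqNum)
      out ++= inRange
      // Stop when the shard is exhausted, the page ran past the range, or the
      // last wanted record was just read; otherwise continue after it.
      if (page.isEmpty || inRange.size < page.size || inRange.last == range.toSeqNum) done = true
      else next = inRange.last + 1
    }
    out.result()
  }
}
```

With a simulated shard of 50,000 records and a checkpointed range of 100 records (sequence numbers 100 to 199), both modes return the same 100 records. However, the unlimited read transfers 10,000 records to get them, while the count-limited read transfers only 100. That difference is the waste the commit removes when recovery replays many small checkpointed ranges.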
Changed files:
- external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala: 19 additions, 6 deletions
- external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala: 2 additions, 1 deletion
- external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDDSuite.scala: 2 additions, 2 deletions
- external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisStreamSuite.scala: 2 additions, 2 deletions