-
- Downloads
[SPARK-20140][DSTREAM] Remove hardcoded kinesis retry wait and max retries
## What changes were proposed in this pull request? The pull requests proposes to remove the hardcoded values for Amazon Kinesis - MIN_RETRY_WAIT_TIME_MS, MAX_RETRIES. This change is critical for kinesis checkpoint recovery when the kinesis backed rdd is huge. Following happens in a typical kinesis recovery : - kinesis throttles large number of requests while recovering - retries in case of throttling are not able to recover due to the small wait period - kinesis throttles per second, the wait period should be configurable for recovery The patch picks the spark kinesis configs from: - spark.streaming.kinesis.retry.wait.time - spark.streaming.kinesis.retry.max.attempts Jira : https://issues.apache.org/jira/browse/SPARK-20140 ## How was this patch tested? Modified the KinesisBackedBlockRDDSuite.scala to run kinesis tests with the modified configurations. Wasn't able to test the patch with actual throttling. Author: Yash Sharma <ysharma@atlassian.com> Closes #17467 from yssharma/ysharma/spark-kinesis-retries.
Showing
- external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala 13 additions, 20 deletions...pache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
- external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala 4 additions, 2 deletions.../apache/spark/streaming/kinesis/KinesisInputDStream.scala
- external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReadConfigurations.scala 78 additions, 0 deletions...e/spark/streaming/kinesis/KinesisReadConfigurations.scala
- external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisStreamSuite.scala 48 additions, 1 deletion...g/apache/spark/streaming/kinesis/KinesisStreamSuite.scala
Please register or sign in to comment