-
- Downloads
[SPARK-8389] [STREAMING] [PYSPARK] Expose KafkaRDDs offsetRange in Python
This PR propose a simple way to expose OffsetRange in Python code, also the usage of offsetRanges is similar to Scala/Java way, here in Python we could get OffsetRange like: ``` dstream.foreachRDD(lambda r: KafkaUtils.offsetRanges(r)) ``` Reason I didn't follow the way what SPARK-8389 suggested is that: Python Kafka API has one more step to decode the message compared to Scala/Java, Which makes Python API return a transformed RDD/DStream, not directly wrapped so-called JavaKafkaRDD, so it is hard to backtrack to the original RDD to get the offsetRange. Author: jerryshao <saisai.shao@intel.com> Closes #7185 from jerryshao/SPARK-8389 and squashes the following commits: 4c6d320 [jerryshao] Another way to fix subclass deserialization issue e6a8011 [jerryshao] Address the comments fd13937 [jerryshao] Fix serialization bug 7debf1c [jerryshao] bug fix cff3893 [jerryshao] refactor the code according to the comments 2aabf9e [jerryshao] Style fix 848c708 [jerryshao] Add HasOffsetRanges for Python
Showing
- external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala 13 additions, 0 deletions...n/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
- python/pyspark/streaming/kafka.py 113 additions, 10 deletionspython/pyspark/streaming/kafka.py
- python/pyspark/streaming/tests.py 64 additions, 0 deletionspython/pyspark/streaming/tests.py
- python/pyspark/streaming/util.py 6 additions, 1 deletionpython/pyspark/streaming/util.py
Loading
Please register or sign in to comment