external · 62236b9eb951f171d96e9d7f5f12d641a2da9a26 · cs525-sp18-g07 / spark

[SPARK-17829][SQL] Stable format for offset log

Tyson Condie authored 8 years ago

## What changes were proposed in this pull request?

Currently we use Java serialization for the write-ahead log (WAL) that stores the offsets contained in each batch. This has two main issues:

- It can break across Spark releases (though this is not the only thing preventing us from upgrading a running query).
- It is unnecessarily opaque to the user.

I'd propose we require offsets to provide a user-readable serialization and use that instead. JSON is probably a good option.

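
As a rough illustration of what a user-readable offset could look like, here is a minimal sketch in Python. It is not the Scala code from this patch: the function names and the nested `{topic: {partition: offset}}` layout are assumptions chosen for the example.

```python
import json


def offset_to_json(partition_offsets):
    """Encode {(topic, partition): offset} as a compact, readable JSON string.

    The nested {topic: {partition: offset}} layout is an assumption made
    for this sketch, not a format taken from the patch.
    """
    nested = {}
    for (topic, partition), offset in partition_offsets.items():
        # JSON object keys must be strings, so the partition id is stringified.
        nested.setdefault(topic, {})[str(partition)] = offset
    # sort_keys gives a deterministic string, so two equal offsets
    # always serialize to the same log entry.
    return json.dumps(nested, sort_keys=True, separators=(",", ":"))


def offset_from_json(text):
    """Decode the JSON string back into {(topic, partition): offset}."""
    nested = json.loads(text)
    return {
        (topic, int(partition)): offset
        for topic, partitions in nested.items()
        for partition, offset in partitions.items()
    }


offsets = {("topicA", 0): 23, ("topicA", 1): 7, ("topicB", 0): 5}
encoded = offset_to_json(offsets)
print(encoded)  # {"topicA":{"0":23,"1":7},"topicB":{"0":5}}
assert offset_from_json(encoded) == offsets
```

Unlike a binary serialized object, an entry in this form can be inspected with a text editor and does not depend on the class layout of the release that wrote it.
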
## How was this patch tested?

Tests were added for KafkaSourceOffset in [KafkaSourceOffsetSuite](external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceOffsetSuite.scala) and for LongOffset in [OffsetSuite](sql/core/src/test/scala/org/apache/spark/sql/streaming/OffsetSuite.scala).

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.

zsxwing marmbrus

Author: Tyson Condie <tcondie@gmail.com>
Author: Tyson Condie <tcondie@clash.local>

Closes #15626 from tcondie/spark-8360.

(cherry picked from commit 3f62e1b5)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>

b7d29256


Name
docker-integration-tests
docker
flume-assembly
flume-sink
flume
java8-tests
kafka-0-10-assembly
kafka-0-10-sql
kafka-0-10
kafka-0-8-assembly
kafka-0-8
kinesis-asl-assembly
kinesis-asl
spark-ganglia-lgpl