-
- Downloads
[SPARK-18187][SQL] CompactibleFileStreamLog should not use "compactInterval"...
[SPARK-18187][SQL] CompactibleFileStreamLog should not use "compactInterval" direcly with user setting. ## What changes were proposed in this pull request? CompactibleFileStreamLog relys on "compactInterval" to detect a compaction batch. If the "compactInterval" is reset by user, CompactibleFileStreamLog will return wrong answer, resulting data loss. This PR procides a way to check the validity of 'compactInterval', and calculate an appropriate value. ## How was this patch tested? When restart a stream, we change the 'spark.sql.streaming.fileSource.log.compactInterval' different with the former one. The primary solution to this issue was given by uncleGen Added extensions include an additional metadata field in OffsetSeq and CompactibleFileStreamLog APIs. zsxwing Author: Tyson Condie <tcondie@gmail.com> Author: genmao.ygm <genmao.ygm@genmaoygmdeMacBook-Air.local> Closes #15852 from tcondie/spark-18187. (cherry picked from commit 51baca22) Signed-off-by:Shixiong Zhu <shixiong@databricks.com>
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala 60 additions, 1 deletion...rk/sql/execution/streaming/CompactibleFileStreamLog.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLog.scala 5 additions, 3 deletions...che/spark/sql/execution/streaming/FileStreamSinkLog.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSourceLog.scala 5 additions, 4 deletions...e/spark/sql/execution/streaming/FileStreamSourceLog.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala 1 addition, 1 deletion...pache/spark/sql/execution/streaming/HDFSMetadataLog.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala 9 additions, 3 deletions.../org/apache/spark/sql/execution/streaming/OffsetSeq.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala 22 additions, 9 deletions...g/apache/spark/sql/execution/streaming/OffsetSeqLog.scala
- sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLogSuite.scala 33 additions, 0 deletions...l/execution/streaming/CompactibleFileStreamLogSuite.scala
- sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala 25 additions, 16 deletions...rg/apache/spark/sql/streaming/FileStreamSourceSuite.scala
- sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala 18 additions, 2 deletions...est/scala/org/apache/spark/sql/streaming/StreamTest.scala
Loading
Please register or sign in to comment