Commit 5912c19e authored by Tyson Condie, committed by Shixiong Zhu


[SPARK-18187][SQL] CompactibleFileStreamLog should not use "compactInterval" directly with user setting.

## What changes were proposed in this pull request?
CompactibleFileStreamLog relies on "compactInterval" to detect compaction batches. If the user changes "compactInterval" between runs, CompactibleFileStreamLog can misidentify which batches are compaction batches and return a wrong answer, resulting in data loss. This PR provides a way to check the validity of "compactInterval" and to calculate an appropriate value.
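
The detection rule and one way to repair the interval can be sketched in Scala. This sketch assumes, as in Spark's CompactibleFileStreamLog, that batch `b` is a compaction batch when `(b + 1) % compactInterval == 0`. The helper `compatibleInterval` and its selection rule (the smallest divisor of `latestCompactBatchId + 1` that is at least the user's setting) are hypothetical illustrations, not necessarily what the patch implements.

```scala
// Sketch only. Assumption: batch b is a compaction batch when
// (b + 1) is a multiple of the compact interval.
object CompactIntervalSketch {
  def isCompactionBatch(batchId: Long, compactInterval: Int): Boolean =
    (batchId + 1) % compactInterval == 0

  // Hypothetical helper: choose an interval under which the log's real latest
  // compaction batch is still recognized as one. Any divisor d of
  // (latestCompactBatchId + 1) has that property; pick the smallest divisor
  // that is >= the user's setting, or (latestCompactBatchId + 1) itself when
  // that value is not larger than the setting.
  def compatibleInterval(userInterval: Int, latestCompactBatchId: Long): Int = {
    require(userInterval > 0, "compact interval must be positive")
    val n = latestCompactBatchId + 1
    if (n <= userInterval) n.toInt
    else (userInterval.toLong to n).find(d => n % d == 0).get.toInt
  }
}
```

For a log that last compacted at batch 9 under an interval of 5, a new setting of 3 would be adjusted to 5, so batch 9 is still recognized. Any divisor of `latestCompactBatchId + 1` works because a reader only consults the latest compaction batch and the batches after it; older batches that the new interval happens to label as compaction points are never read again.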

## How was this patch tested?
Restarted a stream with 'spark.sql.streaming.fileSource.log.compactInterval' set to a value different from the one used in the previous run.
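
To show why this restart scenario can lose data, here is a toy model in Scala. Everything in it is a hypothetical simplification rather than Spark's real classes: each batch adds one file named `file-<batchId>`, a compaction batch records every file seen so far, a plain batch records only its own file, and a reader locates the latest compaction batch purely from the configured interval using the `(batchId + 1) % interval == 0` rule.

```scala
// Toy model of restarting with a different compact interval.
// Hypothetical simplification: one file per batch; compaction batches store
// the full file list, plain batches store only their own file.
object RestartScenario {
  final case class Entry(isCompaction: Boolean, files: Seq[String])

  def isCompactionBatch(batchId: Long, interval: Int): Boolean =
    (batchId + 1) % interval == 0

  // Write batches 0..lastBatch using the given compact interval.
  def writeLog(lastBatch: Long, interval: Int): Map[Long, Entry] =
    (0L to lastBatch).foldLeft((Map.empty[Long, Entry], Vector.empty[String])) {
      case ((log, seen), b) =>
        val file = s"file-$b"
        val all = seen :+ file
        val entry =
          if (isCompactionBatch(b, interval)) Entry(isCompaction = true, all)
          else Entry(isCompaction = false, Seq(file))
        (log + (b -> entry), all)
    }._1

  // Read all files, trusting `interval` to locate the latest compaction batch.
  def allFiles(log: Map[Long, Entry], interval: Int): Set[String] = {
    val latest = log.keys.max
    val start = (latest to 0L by -1L)
      .find(b => isCompactionBatch(b, interval))
      .getOrElse(0L)
    (start to latest).flatMap(b => log(b).files).toSet
  }

  // The real latest compaction batch, recovered from the log entries
  // themselves instead of from the (possibly changed) setting.
  def latestActualCompaction(log: Map[Long, Entry]): Option[Long] =
    log.collect { case (b, e) if e.isCompaction => b }.reduceOption(_ max _)
}
```

With `val log = RestartScenario.writeLog(12, 10)`, reading with interval 10 returns all 13 files. Reading with interval 3 treats batch 11, a plain batch, as the latest compaction and returns only `file-11` and `file-12`, silently dropping batches 0 through 10. Reading with 5, a divisor of 9 + 1, lands on the real compaction batch 9 and again returns all 13 files. `latestActualCompaction` illustrates that the true compaction point can be recovered from the log itself, assuming compaction entries are distinguishable from plain ones.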

The primary solution to this issue was provided by uncleGen.
Added extensions include an additional metadata field in OffsetSeq and in the CompactibleFileStreamLog APIs. zsxwing

Author: Tyson Condie <tcondie@gmail.com>
Author: genmao.ygm <genmao.ygm@genmaoygmdeMacBook-Air.local>

Closes #15852 from tcondie/spark-18187.

(cherry picked from commit 51baca22)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
parent e8b1955e
178 additions and 39 deletions