Commit a6aade00 authored by jerryshao, committed by Shixiong Zhu

[SPARK-15698][SQL][STREAMING] Add the ability to remove the old MetadataLog in FileStreamSource

## What changes were proposed in this pull request?

Currently, the `metadataLog` in `FileStreamSource` adds a checkpoint file for every batch but has no way to remove or compact old ones, which leads to a large number of small files in long-running jobs. This patch proposes compacting the old logs into a single file. The approach is similar to `FileStreamSinkLog` but simpler.
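
The compaction idea can be sketched as follows. This is a minimal in-memory illustration, not the code in this patch: `CompactingLog`, `compactInterval`, and the rule `(batchId + 1) % compactInterval == 0` are assumptions made for the example, and the sketch deletes old entries immediately, which the real log may not do.

```scala
import scala.collection.immutable.SortedMap

// Hypothetical stand-in for a per-batch metadata log; not Spark's API.
final class CompactingLog(compactInterval: Int) {
  require(compactInterval > 0, "compactInterval must be positive")

  // batchId -> entries stored in that batch's log "file"
  private var files = SortedMap.empty[Long, Vector[String]]

  // Assumed rule: every compactInterval-th batch is a compaction batch.
  def isCompactionBatch(batchId: Long): Boolean =
    (batchId + 1) % compactInterval == 0

  def add(batchId: Long, entries: Seq[String]): Unit = {
    if (isCompactionBatch(batchId)) {
      // Fold all earlier files into this batch's file, then drop the old ones.
      val merged = files.values.flatten.toVector ++ entries
      files = SortedMap(batchId -> merged)
    } else {
      files = files + (batchId -> entries.toVector)
    }
  }

  def fileCount: Int = files.size

  def allEntries: Vector[String] = files.values.flatten.toVector
}

object CompactingLogDemo {
  // Writes six batches with an interval of 3 and returns the resulting log.
  def run(): CompactingLog = {
    val log = new CompactingLog(compactInterval = 3)
    (0L until 6L).foreach(id => log.add(id, Seq(s"file-$id")))
    log
  }
}
```

With an interval of 3, batches 2 and 5 are compaction batches in this sketch, so the number of log files stays bounded by the interval while every entry remains readable from the latest compacted file plus any later batches.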

## How was this patch tested?

Unit test added.

Author: jerryshao <sshao@hortonworks.com>

Closes #13513 from jerryshao/SPARK-15698.
parent eb004c66
550 additions and 222 deletions