-
- Downloads
[SPARK-15698][SQL][STREAMING] Add the ability to remove the old MetadataLog in FileStreamSource
## What changes were proposed in this pull request? Current `metadataLog` in `FileStreamSource` will add a checkpoint file in each batch but do not have the ability to remove/compact, which will lead to large number of small files when running for a long time. So here propose to compact the old logs into one file. This method is quite similar to `FileStreamSinkLog` but simpler. ## How was this patch tested? Unit test added. Author: jerryshao <sshao@hortonworks.com> Closes #13513 from jerryshao/SPARK-15698.
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala 245 additions, 0 deletions...rk/sql/execution/streaming/CompactibleFileStreamLog.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala 2 additions, 1 deletion...apache/spark/sql/execution/streaming/FileStreamSink.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLog.scala 21 additions, 191 deletions...che/spark/sql/execution/streaming/FileStreamSinkLog.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala 10 additions, 10 deletions...ache/spark/sql/execution/streaming/FileStreamSource.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSourceLog.scala 132 additions, 0 deletions...e/spark/sql/execution/streaming/FileStreamSourceLog.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLogFileCatalog.scala 2 additions, 1 deletion...park/sql/execution/streaming/MetadataLogFileCatalog.scala
- sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 22 additions, 1 deletion...rc/main/scala/org/apache/spark/sql/internal/SQLConf.scala
- sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala 19 additions, 16 deletions...park/sql/execution/streaming/FileStreamSinkLogSuite.scala
- sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala 97 additions, 2 deletions...rg/apache/spark/sql/streaming/FileStreamSourceSuite.scala
Loading
Please register or sign in to comment