[SPARK-19715][STRUCTURED STREAMING] Option to Strip Paths in FileSource
## What changes were proposed in this pull request?

Today, we compare the whole path when deciding whether a file is new in the FileSource for structured streaming. However, this causes false negatives when the path has changed in a cosmetic way (e.g. changing `s3n` to `s3a`).

This patch adds an option `fileNameOnly` that causes the new-file check to be based only on the filename (the whole path is still stored in the log).

## Usage

```scala
spark
  .readStream
  .option("fileNameOnly", true)
  .text("s3n://bucket/dir1/dir2")
  .writeStream
  ...
```

## How was this patch tested?

Added a test case

Author: Liwei Lin <lwlin7@gmail.com>

Closes #17120 from lw-lin/filename-only.
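To illustrate the idea behind `fileNameOnly`, here is a minimal, Spark-free sketch of a "seen files" check that keys on either the whole path or just the filename. It is not the code in this patch: the class `SeenFiles`, the method `addIfNew`, and the object `FileNameOnlyDemo` are hypothetical names used only for illustration, and the sketch assumes every path is a well-formed hierarchical URI.

```scala
import java.net.URI
import scala.collection.mutable

// Hypothetical helper (not part of Spark): remembers which files were already seen.
final class SeenFiles(fileNameOnly: Boolean) {
  private val seen = mutable.HashSet.empty[String]

  // Key used for the new-file check: the last path segment when
  // fileNameOnly is set, otherwise the full path string.
  private def key(path: String): String =
    if (fileNameOnly) {
      val p = new URI(path).getPath // assumes a well-formed hierarchical URI
      p.substring(p.lastIndexOf('/') + 1)
    } else path

  // Returns true if the file was not seen before, and records it.
  def addIfNew(path: String): Boolean = seen.add(key(path))
}

object FileNameOnlyDemo {
  def main(args: Array[String]): Unit = {
    val byName = new SeenFiles(fileNameOnly = true)
    println(byName.addIfNew("s3n://bucket/dir1/dir2/a.txt")) // true
    println(byName.addIfNew("s3a://bucket/dir1/dir2/a.txt")) // false: same filename

    val byPath = new SeenFiles(fileNameOnly = false)
    println(byPath.addIfNew("s3n://bucket/dir1/dir2/a.txt")) // true
    println(byPath.addIfNew("s3a://bucket/dir1/dir2/a.txt")) // true: path string differs
  }
}
```

One consequence of keying on the filename alone is that two files with the same name in different directories are treated as the same file, so the option only makes sense when filenames are unique across the input.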
Showing 4 changed files:
- docs/structured-streaming-programming-guide.md (10 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamOptions.scala (25 additions, 9 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala (17 additions, 8 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala (20 additions, 2 deletions)