    [SPARK-19715][STRUCTURED STREAMING] Option to Strip Paths in FileSource · 40da4d18
    Liwei Lin authored
    ## What changes were proposed in this pull request?
    
    Today, we compare the whole path when deciding whether a file is new in the FileSource for structured streaming. However, this causes false negatives (a file that was already seen is treated as new) when the path has changed only cosmetically (e.g. changing `s3n` to `s3a`).
    
    This patch adds an option `fileNameOnly` that bases the new-file check on the filename alone, while still storing the whole path in the log.
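
    The difference between the two checks can be sketched as follows. This is a minimal illustration, not the code in this patch: `NewFileCheck`, `dedupKey`, and `SeenFiles` are hypothetical names, and the filename is extracted with plain string operations rather than Hadoop's `Path`.

    ```scala
    import scala.collection.mutable

    // Hypothetical sketch of the new-file check; not Spark's actual implementation.
    object NewFileCheck {
      // Key used to decide whether a file has been seen before:
      // the last path segment when fileNameOnly is set, otherwise the whole path.
      def dedupKey(path: String, fileNameOnly: Boolean): String =
        if (fileNameOnly) path.substring(path.lastIndexOf('/') + 1) else path

      final class SeenFiles(fileNameOnly: Boolean) {
        private val seen = mutable.HashSet.empty[String]

        // Records the file and returns true only if its key was not seen before.
        def isNewFile(path: String): Boolean =
          seen.add(dedupKey(path, fileNameOnly))
      }
    }

    object NewFileCheckDemo extends App {
      val wholePath = new NewFileCheck.SeenFiles(fileNameOnly = false)
      println(wholePath.isNewFile("s3n://bucket/dir1/dir2/a.txt")) // true
      println(wholePath.isNewFile("s3a://bucket/dir1/dir2/a.txt")) // true: treated as new again

      val nameOnly = new NewFileCheck.SeenFiles(fileNameOnly = true)
      println(nameOnly.isNewFile("s3n://bucket/dir1/dir2/a.txt")) // true
      println(nameOnly.isNewFile("s3a://bucket/dir1/dir2/a.txt")) // false: same filename
    }
    ```

    One consequence of keying on the filename alone is that two files with the same name in different directories are considered the same file, so the option only suits layouts where filenames are unique.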
    
    ## Usage
    
    ```scala
    spark
      .readStream
      .option("fileNameOnly", true)
      .text("s3n://bucket/dir1/dir2")
      .writeStream
      ...
    ```
    ## How was this patch tested?
    
    Added a test case
    
    Author: Liwei Lin <lwlin7@gmail.com>
    
    Closes #17120 from lw-lin/filename-only.