    [SPARK-15517][SQL][STREAMING] Add support for complete output mode in Structured Streaming · 90b11439
    Tathagata Das authored
    ## What changes were proposed in this pull request?
    Structured Streaming currently supports only Append output mode. This PR makes the following changes.
    
    - Added support for Complete output mode in the internal state store, analyzer and planner.
    - Added public API in Scala and Python for users to specify output mode
    - Added checks for unsupported combinations of output mode and DF operations
      - Plans with no aggregation should support only Append mode
      - Plans with aggregation should support only Update and Complete modes
      - Default output mode is Append mode (**Question: should we change this to automatically set to Complete mode when there is aggregation?**)
    - Added support for Complete output mode in Memory Sink. Internally, Memory Sink now supports Append, Complete and Update modes, but only Complete and Append are exposed through the public API.
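
The compatibility rules listed above can be sketched as a small standalone check. This is an illustration in plain Python, not the analyzer code from this PR: `OutputMode` and `check_output_mode` are hypothetical names, and the plan is reduced to a single `has_aggregation` flag.

```python
from enum import Enum


class OutputMode(Enum):
    # Hypothetical stand-in for the output modes named in the description.
    APPEND = "append"
    UPDATE = "update"
    COMPLETE = "complete"


def check_output_mode(has_aggregation: bool,
                      mode: OutputMode = OutputMode.APPEND) -> None:
    """Raise ValueError for combinations the description lists as unsupported.

    Assumption: a plan is summarised by whether it contains an aggregation.
    """
    if has_aggregation:
        # Plans with aggregation: only Update and Complete.
        allowed = {OutputMode.UPDATE, OutputMode.COMPLETE}
    else:
        # Plans without aggregation: only Append.
        allowed = {OutputMode.APPEND}

    if mode not in allowed:
        kind = "with" if has_aggregation else "without"
        names = ", ".join(sorted(m.value for m in allowed))
        raise ValueError(
            f"{mode.value} output mode is not supported for plans {kind} "
            f"aggregation (allowed: {names})"
        )
```

Under these rules the default (Append) is rejected for any plan with an aggregation, so `check_output_mode(True)` raises, which is the situation the open question about the default refers to.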
    
    ## How was this patch tested?
    Unit tests in various test suites
    - StreamingAggregationSuite: tests for complete mode
    - MemorySinkSuite: tests for checking behavior in Append and Complete modes.
    - UnsupportedOperationSuite: tests for checking unsupported combinations of DF ops and output modes
    - DataFrameReaderWriterSuite: tests for checking that output mode cannot be called on static DFs
    - Python doc test and existing unit tests modified to call write.outputMode.
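
The difference between Append and Complete behavior that MemorySinkSuite exercises can be pictured with a toy sink. This is a minimal sketch under stated assumptions, not the actual Memory Sink: `MemorySinkSketch` and `add_batch` are hypothetical names, rows are plain tuples, and only the two publicly exposed modes are modeled.

```python
class MemorySinkSketch:
    """Toy in-memory sink; hypothetical, for illustration only."""

    def __init__(self):
        self.rows = []

    def add_batch(self, batch, mode="append"):
        if mode == "append":
            # Append: each batch contributes only new rows.
            self.rows.extend(batch)
        elif mode == "complete":
            # Complete: each batch carries the full result, so it
            # replaces whatever the sink held before.
            self.rows = list(batch)
        else:
            # Update is not modeled here; the description says it is
            # not available from the public API.
            raise ValueError(f"unsupported output mode: {mode}")


# Append keeps accumulating rows across batches.
append_sink = MemorySinkSketch()
append_sink.add_batch([("a", 1)], mode="append")
append_sink.add_batch([("b", 1)], mode="append")

# Complete always reflects the latest full aggregation result.
complete_sink = MemorySinkSketch()
complete_sink.add_batch([("a", 1)], mode="complete")
complete_sink.add_batch([("a", 2), ("b", 1)], mode="complete")
```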
    
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    
    Closes #13286 from tdas/complete-mode.