Skip to content
  • Tathagata Das's avatar
    c3d08e2f
    [SPARK-18516][SQL] Split state and progress in streaming · c3d08e2f
    Tathagata Das authored
    This PR separates the status of a `StreamingQuery` into two separate APIs:
     - `status` - describes the status of a `StreamingQuery` at this moment, including what phase of processing is currently happening and if data is available.
     - `recentProgress` - an array of statistics about the most recent microbatches that have executed.
    
    A recent progress contains the following information:
    ```
    {
      "id" : "2be8670a-fce1-4859-a530-748f29553bb6",
      "name" : "query-29",
      "timestamp" : 1479705392724,
      "inputRowsPerSecond" : 230.76923076923077,
      "processedRowsPerSecond" : 10.869565217391303,
      "durationMs" : {
        "triggerExecution" : 276,
        "queryPlanning" : 3,
        "getBatch" : 5,
        "getOffset" : 3,
        "addBatch" : 234,
        "walCommit" : 30
      },
      "currentWatermark" : 0,
      "stateOperators" : [ ],
      "sources" : [ {
        "description" : "KafkaSource[Subscribe[topic-14]]",
        "startOffset" : {
          "topic-14" : {
            "2" : 0,
            "4" : 1,
            "1" : 0,
            "3" : 0,
            "0" : 0
          }
        },
        "endOffset" : {
          "topic-14" : {
            "2" : 1,
            "4" : 2,
            "1" : 0,
            "3" : 0,
            "0" : 1
          }
        },
        "numRecords" : 3,
        "inputRowsPerSecond" : 230.76923076923077,
        "processedRowsPerSecond" : 10.869565217391303
      } ]
    }
    ```
    
    Additionally, in order to make it possible to correlate progress updates across restarts, we change the `id` field from an integer that is unique with in the JVM to a `UUID` that is globally unique.
    
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #15954 from marmbrus/queryProgress.
    c3d08e2f
    [SPARK-18516][SQL] Split state and progress in streaming
    Tathagata Das authored
    This PR separates the status of a `StreamingQuery` into two separate APIs:
     - `status` - describes the status of a `StreamingQuery` at this moment, including what phase of processing is currently happening and if data is available.
     - `recentProgress` - an array of statistics about the most recent microbatches that have executed.
    
    A recent progress contains the following information:
    ```
    {
      "id" : "2be8670a-fce1-4859-a530-748f29553bb6",
      "name" : "query-29",
      "timestamp" : 1479705392724,
      "inputRowsPerSecond" : 230.76923076923077,
      "processedRowsPerSecond" : 10.869565217391303,
      "durationMs" : {
        "triggerExecution" : 276,
        "queryPlanning" : 3,
        "getBatch" : 5,
        "getOffset" : 3,
        "addBatch" : 234,
        "walCommit" : 30
      },
      "currentWatermark" : 0,
      "stateOperators" : [ ],
      "sources" : [ {
        "description" : "KafkaSource[Subscribe[topic-14]]",
        "startOffset" : {
          "topic-14" : {
            "2" : 0,
            "4" : 1,
            "1" : 0,
            "3" : 0,
            "0" : 0
          }
        },
        "endOffset" : {
          "topic-14" : {
            "2" : 1,
            "4" : 2,
            "1" : 0,
            "3" : 0,
            "0" : 1
          }
        },
        "numRecords" : 3,
        "inputRowsPerSecond" : 230.76923076923077,
        "processedRowsPerSecond" : 10.869565217391303
      } ]
    }
    ```
    
    Additionally, in order to make it possible to correlate progress updates across restarts, we change the `id` field from an integer that is unique with in the JVM to a `UUID` that is globally unique.
    
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #15954 from marmbrus/queryProgress.
Loading