[SPARK-18657][SPARK-18668] Make StreamingQuery.id persists across restart and not auto-generate StreamingQuery.name

## What changes were proposed in this pull request?

Here are the major changes in this PR.

- Added the ability to recover `StreamingQuery.id` from the checkpoint location, by writing the id to `checkpointLoc/metadata`.
- Added `StreamingQuery.runId`, which is unique for every query started and does not persist across restarts. It identifies each restart of a query separately (the same as the earlier behavior of `id`).
- Removed auto-generation of `StreamingQuery.name`. The purpose of the name was to provide an identifier that is stable across restarts, but since `id` is precisely that, there is no need for an auto-generated name. The name therefore becomes purely cosmetic and is null by default.
- Added `runId` to `StreamingQueryListener` events and `StreamingQueryProgress`.

Implementation details

- Renamed the existing `StreamExecutionMetadata` to `OffsetSeqMetadata` and moved it to the file `OffsetSeq.scala`, because that is what this metadata is tied to. Also did some refactoring to make the code cleaner (got rid of a lot of `.json` and `.getOrElse("{}")`).
- Added the `id` as the new `StreamMetadata`.
- When a StreamingQuery is created, it gets or writes the `StreamMetadata` from `checkpointLoc/metadata`.
- All internal logging in `StreamExecution` uses `(name, id, runId)` instead of just `name`.

TODO

- [x] Test handling of name=null in json generation of StreamingQueryProgress
- [x] Test handling of name=null in json generation of StreamingQueryListener events
- [x] Test python API of runId

## How was this patch tested?

Updated unit tests and new unit tests

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #16113 from tdas/SPARK-18657.
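The get-or-write behavior described above can be illustrated with a minimal, Spark-free Python sketch. It is not the code from this PR: the helper `get_or_write_query_id`, the class `QuerySketch`, and the single-field JSON layout of the `metadata` file are assumptions made for illustration only. It shows the intended contract: `id` is read back from the checkpoint location on restart, `run_id` is fresh on every start, and `name` defaults to null.

```python
import json
import os
import tempfile
import uuid


def get_or_write_query_id(checkpoint_dir):
    """Return the id stored in <checkpoint_dir>/metadata, creating it on first use.

    Hypothetical helper; the JSON layout {"id": ...} is an assumption.
    """
    metadata_path = os.path.join(checkpoint_dir, "metadata")
    if os.path.exists(metadata_path):
        # Restart: reuse the id written by an earlier run.
        with open(metadata_path) as f:
            return json.load(f)["id"]
    # First start: generate an id and persist it for later restarts.
    os.makedirs(checkpoint_dir, exist_ok=True)
    query_id = str(uuid.uuid4())
    with open(metadata_path, "w") as f:
        json.dump({"id": query_id}, f)
    return query_id


class QuerySketch:
    """Hypothetical stand-in for a streaming query's identity fields."""

    def __init__(self, checkpoint_dir, name=None):
        self.id = get_or_write_query_id(checkpoint_dir)  # stable across restarts
        self.run_id = str(uuid.uuid4())                  # new for every start
        self.name = name                                 # cosmetic, None by default


def demo():
    """Start a query twice against the same checkpoint directory."""
    with tempfile.TemporaryDirectory() as checkpoint_dir:
        first = QuerySketch(checkpoint_dir)
        restarted = QuerySketch(checkpoint_dir)
        return first, restarted
```

Calling `demo()` returns two objects that share the same `id` but have different `run_id` values, which mirrors the distinction this change draws between a query and one particular run of it.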
Changed files

- project/MimaExcludes.scala: 3 additions, 0 deletions
- python/pyspark/sql/streaming.py: 17 additions, 2 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala: 25 additions, 2 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala: 1 addition, 1 deletion
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala: 4 additions, 2 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala: 48 additions, 57 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala: 88 additions, 0 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala: 1 addition, 1 deletion
- sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala: 15 additions, 4 deletions
- sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala: 8 additions, 2 deletions
- sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala: 16 additions, 9 deletions
- sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala: 5 additions, 2 deletions
- sql/core/src/test/resources/structured-streaming/query-metadata-logs-version-2.1.0.txt: 3 additions, 0 deletions
- sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLogSuite.scala: 11 additions, 2 deletions
- sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/StreamMetadataSuite.scala: 55 additions, 0 deletions
- sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala: 29 additions, 17 deletions
- sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala: 67 additions, 11 deletions
- sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala: 70 additions, 30 deletions