[SPARK-13985][SQL] Deterministic batches with ids
This PR relaxes the requirements of a `Sink` for structured streaming so that it only needs to support idempotent appending of data. Previously, the `Sink` had to transactionally append data while also recording an opaque offset indicating how far into the stream we had processed. To make this possible, a new write-ahead log has been added to stream execution, which records the offsets that are present in each batch. The log is created in the newly added `checkpointLocation`, which defaults to `${spark.sql.streaming.checkpointLocation}/${queryName}` but can be overridden by setting `checkpointLocation` in `DataFrameWriter`. In addition to making sinks easier to write, the addition of batchIds and a checkpoint location is done in anticipation of integration with the `StateStore` (#11645).

Author: Michael Armbrust <michael@databricks.com>

Closes #11804 from marmbrus/batchIds.
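The following is a minimal, self-contained sketch of the idea in the message above: offsets for each batch are written to a log *before* the data reaches the sink, so after a crash the last logged batch can be recomputed deterministically and re-delivered, and a sink that ignores a repeated batch id is enough to avoid duplicates. All names here (`Offset`, `OffsetLog`, `IdempotentSink`, `BatchIdSketch`) are hypothetical simplifications for illustration only; they are not Spark's actual classes or signatures, and the in-memory log stands in for the file-based log kept under `checkpointLocation`.

```scala
import scala.collection.mutable

// Hypothetical, simplified offset: "how far into the source we have read".
final case class Offset(value: Long)

// Hypothetical write-ahead log mapping batchId -> end offset for that batch.
// An entry is written before the batch is handed to the sink and is never overwritten.
final class OffsetLog {
  private val entries = mutable.TreeMap.empty[Long, Offset]

  /** Records the end offset for `batchId`; returns false if it was already logged. */
  def add(batchId: Long, end: Offset): Boolean =
    if (entries.contains(batchId)) false
    else { entries(batchId) = end; true }

  def get(batchId: Long): Option[Offset] = entries.get(batchId)

  /** The most recently logged batch, which is the one to re-run after a restart. */
  def latest: Option[(Long, Offset)] = entries.lastOption
}

// Hypothetical sink that only guarantees idempotent appends:
// delivering the same batchId twice has no additional effect.
final class IdempotentSink {
  private val seenBatchIds = mutable.Set.empty[Long]
  val rows = mutable.ArrayBuffer.empty[Long]

  def addBatch(batchId: Long, data: Seq[Long]): Unit =
    if (seenBatchIds.add(batchId)) rows ++= data // add returns false for a repeated id
}

object BatchIdSketch {
  // Rebuilds a batch's contents purely from logged offsets, so a replay
  // yields exactly the same records as the original attempt.
  def dataFor(log: OffsetLog, batchId: Long): Seq[Long] = {
    val end = log.get(batchId).map(_.value).getOrElse(sys.error(s"batch $batchId not logged"))
    val start = if (batchId == 0) 0L else log.get(batchId - 1).map(_.value).getOrElse(0L)
    start until end
  }

  def runBatch(log: OffsetLog, sink: IdempotentSink, batchId: Long, available: Long): Unit = {
    log.add(batchId, Offset(available))           // 1. write-ahead: record the offsets
    sink.addBatch(batchId, dataFor(log, batchId)) // 2. then deliver the data
  }

  // On restart, re-run the last logged batch; the sink drops it if it was already written.
  def recover(log: OffsetLog, sink: IdempotentSink): Unit =
    log.latest.foreach { case (id, _) => sink.addBatch(id, dataFor(log, id)) }

  def main(args: Array[String]): Unit = {
    val log = new OffsetLog
    val sink = new IdempotentSink
    runBatch(log, sink, 0, 3) // records 0, 1, 2
    runBatch(log, sink, 1, 5) // records 3, 4
    log.add(2, Offset(8))     // simulate a crash after logging batch 2, before the sink write
    recover(log, sink)        // batch 2 (records 5, 6, 7) is delivered
    recover(log, sink)        // a second replay is ignored by the sink
    println(sink.rows.mkString(",")) // prints 0,1,2,3,4,5,6,7
  }
}
```

The design choice this illustrates is that the sink no longer has to track offsets transactionally: correctness comes from the ordering "log offsets first, then write", combined with a sink that treats a batch id it has already seen as a no-op.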
- sql/core/src/main/scala/org/apache/spark/sql/ContinuousQueryManager.scala (6 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala (10 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/SinkStatus.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala (2 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompositeOffset.scala (12 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala (11 additions, 13 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala (6 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Sink.scala (9 additions, 21 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Source.scala (8 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala (132 additions, 61 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala (14 additions, 38 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala (37 additions, 48 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (7 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/StreamTest.scala (10 additions, 8 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/streaming/ContinuousQueryManagerSuite.scala (7 additions, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/streaming/ContinuousQuerySuite.scala (6 additions, 5 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/streaming/DataFrameReaderWriterSuite.scala (38 additions, 17 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala (0 additions, 18 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/util/ContinuousQueryListenerSuite.scala (3 additions, 3 deletions)