-
- Downloads
[SPARK-10820][SQL] Support for the continuous execution of structured queries
This is a follow up to 9aadcffa that extends Spark SQL to allow users to _repeatedly_ optimize and execute structured queries. A `ContinuousQuery` can be expressed using SQL, DataFrames or Datasets. The purpose of this PR is only to add some initial infrastructure which will be extended in subsequent PRs. ## User-facing API - `sqlContext.streamFrom` and `df.streamTo` return builder objects that are analogous to the `read/write` interfaces already available to executing queries in a batch-oriented fashion. - `ContinuousQuery` provides an interface for interacting with a query that is currently executing in the background. ## Internal Interfaces - `StreamExecution` - executes streaming queries in micro-batches The following are currently internal, but public APIs will be provided in a future release. - `Source` - an interface for providers of continually arriving data. A source must have a notion of an `Offset` that monotonically tracks what data has arrived. For fault tolerance, a source must be able to replay data given a start offset. - `Sink` - an interface that accepts the results of a continuously executing query. Also responsible for tracking the offset that should be resumed from in the case of a failure. ## Testing - `MemoryStream` and `MemorySink` - simple implementations of source and sink that keep all data in memory and have methods for simulating durability failures - `StreamTest` - a framework for performing actions and checking invariants on a continuous query Author: Michael Armbrust <michael@databricks.com> Author: Tathagata Das <tathagata.das1565@gmail.com> Author: Josh Rosen <rosenville@gmail.com> Closes #11006 from marmbrus/structured-streaming.
Showing
- sql/core/src/main/scala/org/apache/spark/sql/ContinuousQuery.scala 30 additions, 0 deletions...src/main/scala/org/apache/spark/sql/ContinuousQuery.scala
- sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala 8 additions, 0 deletionssql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala
- sql/core/src/main/scala/org/apache/spark/sql/DataStreamReader.scala 127 additions, 0 deletions...rc/main/scala/org/apache/spark/sql/DataStreamReader.scala
- sql/core/src/main/scala/org/apache/spark/sql/DataStreamWriter.scala 134 additions, 0 deletions...rc/main/scala/org/apache/spark/sql/DataStreamWriter.scala
- sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala 8 additions, 0 deletions...core/src/main/scala/org/apache/spark/sql/SQLContext.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala 32 additions, 1 deletion.../spark/sql/execution/datasources/ResolvedDataSource.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Batch.scala 26 additions, 0 deletions...cala/org/apache/spark/sql/execution/streaming/Batch.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompositeOffset.scala 67 additions, 0 deletions...pache/spark/sql/execution/streaming/CompositeOffset.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/LongOffset.scala 33 additions, 0 deletions...org/apache/spark/sql/execution/streaming/LongOffset.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.scala 37 additions, 0 deletions...ala/org/apache/spark/sql/execution/streaming/Offset.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Sink.scala 47 additions, 0 deletions...scala/org/apache/spark/sql/execution/streaming/Sink.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Source.scala 36 additions, 0 deletions...ala/org/apache/spark/sql/execution/streaming/Source.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala 211 additions, 0 deletions...pache/spark/sql/execution/streaming/StreamExecution.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala 67 additions, 0 deletions...apache/spark/sql/execution/streaming/StreamProgress.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala 34 additions, 0 deletions...che/spark/sql/execution/streaming/StreamingRelation.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala 138 additions, 0 deletions...ala/org/apache/spark/sql/execution/streaming/memory.scala
- sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala 21 additions, 0 deletions.../main/scala/org/apache/spark/sql/sources/interfaces.scala
- sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala 44 additions, 30 deletionssql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala
- sql/core/src/test/scala/org/apache/spark/sql/StreamTest.scala 346 additions, 0 deletions...core/src/test/scala/org/apache/spark/sql/StreamTest.scala
- sql/core/src/test/scala/org/apache/spark/sql/streaming/DataStreamReaderSuite.scala 166 additions, 0 deletions...rg/apache/spark/sql/streaming/DataStreamReaderSuite.scala
Loading
Please register or sign in to comment