-
- Downloads
[SPARK-14288][SQL] Memory Sink for streaming
This PR exposes the internal testing `MemorySink` though the data source API. This will allow users to easily test streaming applications in the Spark shell or other local tests. Usage: ```scala inputStream.write .format("memory") .queryName("memStream") .startStream() // Now you can query the result of the stream here. sqlContext.table("memStream") ``` The most complicated part of the logic is choosing the checkpoint directory. There are a few requirements we are attempting to satisfy here: - when working in the shell locally, it should just work with no extra configuration. - when working on a cluster you should be able to make it easily create the checkpoint on a distributed file system so you can test aggregation (state checkpoints are also stored in this directory and must be accessible from workers). - it should be clear that you can't resume since the data is just in memory. The chosen algorithm proceeds as follows: - the user gives a checkpoint directory, use it - if the conf has a checkpoint location, use `$location/$queryName` - if neither, create a local directory - always check to make sure there are no offsets written to the directory Author: Michael Armbrust <michael@databricks.com> Closes #12119 from marmbrus/memorySink.
Showing
- sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala 61 additions, 18 deletions...src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala 6 additions, 0 deletions...cala/org/apache/spark/sql/execution/SparkStrategies.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala 8 additions, 0 deletions...ala/org/apache/spark/sql/execution/streaming/memory.scala
- sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala 2 additions, 0 deletionssql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala
- sql/core/src/test/scala/org/apache/spark/sql/streaming/MemorySinkSuite.scala 82 additions, 0 deletions...cala/org/apache/spark/sql/streaming/MemorySinkSuite.scala
Please register or sign in to comment