[SPARK-13149][SQL] Add FileStreamSource
`FileStreamSource` is an implementation of `org.apache.spark.sql.execution.streaming.Source`. It takes advantage of the existing `HadoopFsRelationProvider` to support various file formats. It records the files in each batch and stores them in metadata files so that they can be recovered after a restart. The metadata files are stored in the file system. A follow-up PR will clean up the metadata files periodically.

This is based on the initial work from marmbrus.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #11034 from zsxwing/stream-df-file-source.
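The bookkeeping described in the commit message (remember which files belong to each batch, persist that to metadata files, recover on restart) can be illustrated with a minimal sketch. This is not the code added by this commit: `ToyFileSource`, `nextBatch`, `maxBatchId`, `filesInBatch`, and the one-metadata-file-per-batch layout are all hypothetical names and choices made for illustration, and the sketch uses plain `java.nio` on a local directory rather than Spark or Hadoop APIs.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._ // deprecated on Scala 2.13+, but still available

// Hypothetical toy class for illustration only; NOT Spark's FileStreamSource.
// Assumption: each batch is persisted as one metadata file named by its batch id,
// containing one data-file name per line.
class ToyFileSource(dataDir: Path, metadataDir: Path) {
  Files.createDirectories(metadataDir)

  // Recover the batch -> files mapping from whatever metadata already exists.
  private var batches: Map[Long, Seq[String]] = {
    val stream = Files.list(metadataDir)
    try {
      stream.iterator().asScala.map { p =>
        val files = Files.readAllLines(p, StandardCharsets.UTF_8).asScala.toList
        p.getFileName.toString.toLong -> files
      }.toMap
    } finally stream.close()
  }

  // Every file name that has already been assigned to some batch.
  private def seen: Set[String] = batches.values.flatten.toSet

  // Highest batch id recorded so far, or -1 if nothing has been recorded.
  def maxBatchId: Long = if (batches.isEmpty) -1L else batches.keys.max

  // Files recorded for a given batch (empty if the batch is unknown).
  def filesInBatch(id: Long): Seq[String] = batches.getOrElse(id, Nil)

  // Looks for files not yet assigned to a batch. If any exist, records them
  // as the next batch in a metadata file and returns (batchId, files).
  def nextBatch(): Option[(Long, Seq[String])] = {
    val stream = Files.list(dataDir)
    val current =
      try stream.iterator().asScala.map(_.getFileName.toString).toList.sorted
      finally stream.close()
    val newFiles = current.filterNot(seen)
    if (newFiles.isEmpty) None
    else {
      val id = maxBatchId + 1
      // Persist first, then update in-memory state, so a restart can recover it.
      Files.write(metadataDir.resolve(id.toString), newFiles.asJava, StandardCharsets.UTF_8)
      batches += (id -> newFiles)
      Some(id -> newFiles)
    }
  }
}
```

Constructing a second `ToyFileSource` over the same two directories simulates a restart: it reloads the recorded batches from the metadata directory, so files already assigned to a batch are not handed out again.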
Changed files:
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala (240 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala (29 additions, 4 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/StreamTest.scala (2 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/streaming/DataFrameReaderWriterSuite.scala (3 additions, 2 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala (435 additions, 0 deletions)