[SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch
## What changes were proposed in this pull request?

In `FileStreamSource.getBatch`, we create a `DataSource` with the specified schema so that the schema is not inferred again in every batch. However, we don't pass the partition columns, so the partitions are still inferred in every batch.

This PR fixes it by keeping the partition columns in `FileStreamSource`, as is already done for the schema, and passing them to that `DataSource`.

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes #15581 from cloud-fan/stream.
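The idea behind the fix (resolve the metadata once, store it on the source, reuse it in every `getBatch`) can be illustrated with a small dependency-free model. This is a hypothetical sketch, not Spark code: `inferPartitionColumns`, `BatchPlan`, and `CachingFileSource` are invented names, schemas and partition columns are plain `Seq[String]` rather than Spark's `StructType`, and partition discovery is reduced to reading `key=value` directory segments.

```scala
// Hypothetical, dependency-free model of the fix; none of these names are
// Spark APIs. Schema and partition columns are plain Seq[String] here.

// Counts how many times the (normally expensive) partition inference runs.
var inferenceCount = 0

// Stand-in for partition discovery: returns the key of every "key=value"
// directory segment found in the given file paths.
def inferPartitionColumns(paths: Seq[String]): Seq[String] = {
  inferenceCount += 1
  paths.flatMap { path =>
    path.split('/').toSeq.filter(_.contains("=")).map(_.takeWhile(_ != '='))
  }.distinct
}

// What a single batch would read: its files plus the cached metadata.
final case class BatchPlan(files: Seq[String], schema: Seq[String], partitionColumns: Seq[String])

// Resolves the partition columns once, when the source is created, and keeps
// them next to the schema so that no batch has to infer them again.
final class CachingFileSource(initialPaths: Seq[String], val schema: Seq[String]) {
  val partitionColumns: Seq[String] = inferPartitionColumns(initialPaths)

  def getBatch(newFiles: Seq[String]): BatchPlan =
    BatchPlan(newFiles, schema, partitionColumns)
}

// Two batches share a single inference pass.
val source = new CachingFileSource(Seq("/in/date=2016-10-21/part-0.json"), schema = Seq("id", "value"))
val b1 = source.getBatch(Seq("/in/date=2016-10-22/part-0.json"))
val b2 = source.getBatch(Seq("/in/date=2016-10-23/part-0.json"))
println(s"inference runs: $inferenceCount, partition columns: ${b1.partitionColumns}")
```

Because this model resolves partition columns only when the source is constructed, the counter stays at 1 however many batches run. The trade-off in the simplified model is that a partition key first appearing in later files would not be picked up.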
- `sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala`: 18 additions, 8 deletions
- `sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala`: 2 additions, 0 deletions
- `sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSourceSuite.scala`: 1 addition, 1 deletion