[SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing
## What changes were proposed in this pull request?

When reading a file stream with a non-globbing path, the results contain all `null`s for the partitioned columns. E.g.,

```scala
case class A(id: Int, value: Int)
val data = spark.createDataset(Seq(
  A(1, 1),
  A(2, 2),
  A(2, 3))
)
val url = "/tmp/test"
data.write.partitionBy("id").parquet(url)
spark.read.parquet(url).show
```

```
+-----+---+
|value| id|
+-----+---+
|    2|  2|
|    3|  2|
|    1|  1|
+-----+---+
```

```scala
val s = spark.readStream.schema(spark.read.load(url).schema).parquet(url)
s.writeStream.queryName("test").format("memory").start()
sql("SELECT * FROM test").show
```

```
+-----+----+
|value|  id|
+-----+----+
|    2|null|
|    3|null|
|    1|null|
+-----+----+
```

## How was this patch tested?

Jenkins tests.

Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #14803 from viirya/filestreamsource-option.
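For context, `partitionBy("id")` writes the data under directories whose names encode the partition values (for example `/tmp/test/id=1/part-....parquet`), and a batch read recovers the `id` column from those directory names; the bug is that the streaming source did not do the same for non-globbing paths. The snippet below is only an illustrative sketch of that path encoding, not Spark's actual partition-discovery code (which lives in its datasource layer); the object and method names here are hypothetical.

```scala
// Illustrative sketch: recover "column=value" partition segments from a file
// path produced by DataFrameWriter.partitionBy. Not Spark's implementation.
object PartitionPathSketch {
  // "/tmp/test/id=1/part-00000.parquet" -> Map("id" -> "1")
  def partitionValues(path: String): Map[String, String] =
    path.split("/")
      .filter(_.contains("="))            // keep only partition directories
      .map { segment =>
        val Array(k, v) = segment.split("=", 2)
        k -> v
      }
      .toMap

  def main(args: Array[String]): Unit = {
    println(partitionValues("/tmp/test/id=1/part-00000.parquet")) // Map(id -> 1)
  }
}
```

With the fix described by this pull request, the streaming read in the example above should report `id` values that match the batch read instead of `null`.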
Showing 5 changed files:
- docs/structured-streaming-programming-guide.md (6 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala (6 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala (8 additions, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala (82 additions, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala (8 additions, 0 deletions)