[SPARK-14997][SQL] Fixed FileCatalog to return correct set of files when there is no partitioning scheme in the given paths

## What changes were proposed in this pull request?

Let's say there are JSON files in the following directory structure:

```
xyz/file0.json
xyz/subdir1/file1.json
xyz/subdir2/file2.json
xyz/subdir1/subsubdir1/file3.json
```

`sqlContext.read.json("xyz")` should read only file0.json, according to the behavior in Spark 1.6.1. However, in current master, all 4 files are read.

The fix is to make FileCatalog return only the child files of the given path if no partitioning is detected (instead of the full recursive list of files).

Closes #12774

## How was this patch tested?

unit tests

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #12856 from tdas/SPARK-14997.
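To make the difference between the two listing behaviors concrete, here is a minimal sketch that uses only `java.nio.file` rather than Spark itself. The object `LeafFileListing` and its methods `childFiles` and `allFilesRecursive` are hypothetical names chosen for illustration. They are not the code in `PartitioningAwareFileCatalog`, and the sketch assumes Scala 2.13 for `scala.jdk.CollectionConverters`.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Hypothetical illustration only; not the actual FileCatalog implementation.
object LeafFileListing {

  // Files directly under `root`, without descending into subdirectories.
  // This mirrors the behavior the fix restores when no partitioning is detected.
  def childFiles(root: Path): List[String] = {
    val stream = Files.list(root)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .map(_.getFileName.toString)
        .toList
        .sorted
    } finally stream.close()
  }

  // Every regular file anywhere below `root`.
  // This mirrors the unintended behavior described above for master.
  def allFilesRecursive(root: Path): List[String] = {
    val stream = Files.walk(root)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .map(_.getFileName.toString)
        .toList
        .sorted
    } finally stream.close()
  }
}
```

For the `xyz` layout from the description, `childFiles` would return only `file0.json`, while `allFilesRecursive` would return all four files, which is the discrepancy that `sqlContext.read.json("xyz")` exposed.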
Changed files:

- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileCatalog.scala (23 additions, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileCatalogSuite.scala (68 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala (47 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala (10 additions, 5 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala (208 additions, 24 deletions)