[SPARK-15895][SQL] Filters out metadata files while doing partition discovery
## What changes were proposed in this pull request?

Take the following directory layout as an example:

```
dir/
+- p0=0/
   |-_metadata
   +- p1=0/
      |-part-00001.parquet
      |-part-00002.parquet
      |-...
```

The `_metadata` file under `p0=0` shouldn't fail partition discovery.

This PR filters out all metadata files whose names start with `_` while doing partition discovery.

## How was this patch tested?

New unit test added in `ParquetPartitionDiscoverySuite`.

Author: Cheng Lian <lian@databricks.com>

Closes #13623 from liancheng/spark-15895-partition-disco-no-metafiles.
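The filtering rule described above can be illustrated with a minimal, Spark-free sketch. It is not the code from this patch: the object `MetadataFilterSketch` and the helpers `isDataFile` and `leafDirs` are hypothetical names, and paths are modelled as plain strings rather than Hadoop `Path` objects. It only shows why dropping `_`-prefixed files leaves a single, consistent set of leaf directories to infer partitions from.

```scala
object MetadataFilterSketch {
  // Hypothetical helper: a file counts as data when its name
  // (the last path segment) does not start with "_".
  def isDataFile(path: String): Boolean = {
    val name = path.split('/').last
    !name.startsWith("_")
  }

  // Parent directories of the files that survive the filter.
  // In this sketch, partition discovery would inspect only these.
  def leafDirs(files: Seq[String]): Set[String] =
    files.filter(isDataFile)
      .map(p => p.substring(0, p.lastIndexOf('/')))
      .toSet

  def main(args: Array[String]): Unit = {
    val files = Seq(
      "dir/p0=0/_metadata",
      "dir/p0=0/p1=0/part-00001.parquet",
      "dir/p0=0/p1=0/part-00002.parquet")
    println(leafDirs(files))
  }
}
```

Without the filter, this sketch would report both `dir/p0=0` and `dir/p0=0/p1=0` as leaf directories, which sit at different partition depths; with it, only `dir/p0=0/p1=0` remains.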
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala (4 additions, 3 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileCatalog.scala (12 additions, 3 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala (44 additions, 0 deletions)