[SPARK-16975][SQL] Column-partition path starting '_' should be handled correctly
## What changes were proposed in this pull request?

Currently, Spark ignores path names starting with an underscore (`_`) or a dot (`.`). This causes read failures for column-partitioned file data sources whose partition column names start with `_`, e.g. `_col`.

**Before**
```scala
scala> spark.range(10).withColumn("_locality_code", $"id").write.partitionBy("_locality_code").save("/tmp/parquet")
scala> spark.read.parquet("/tmp/parquet")
org.apache.spark.sql.AnalysisException: Unable to infer schema for ParquetFormat at /tmp/parquet20. It must be specified manually;
```

**After**
```scala
scala> spark.range(10).withColumn("_locality_code", $"id").write.partitionBy("_locality_code").save("/tmp/parquet")
scala> spark.read.parquet("/tmp/parquet")
res2: org.apache.spark.sql.DataFrame = [id: bigint, _locality_code: int]
```

## How was this patch tested?

Passed Jenkins with a new test case.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #14585 from dongjoon-hyun/SPARK-16975-PARQUET.
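The shape of the fix can be illustrated with a minimal sketch. This is not the exact patch: the `PathFilterSketch` object and its `looksLikePartitionDir` helper are hypothetical names, and treating any name containing `=` as a `column=value` partition directory is a simplification of the heuristic the listing code relies on.

```scala
// A minimal sketch of the filtering logic (hypothetical helper, not the exact patch).
// Before the fix: any name starting with '_' or '.' was skipped, so a partition
// directory like "_locality_code=3" was silently ignored during file listing.
object PathFilterSketch {
  // Names containing '=' follow the "column=value" partition-directory convention,
  // so they must survive filtering even when the column name starts with '_'.
  private def looksLikePartitionDir(name: String): Boolean = name.contains("=")

  def shouldFilterOut(name: String): Boolean =
    name.startsWith(".") || (name.startsWith("_") && !looksLikePartitionDir(name))
}

// Example behavior of the sketch:
// PathFilterSketch.shouldFilterOut("_SUCCESS")         // true  - metadata file, skip
// PathFilterSketch.shouldFilterOut("_locality_code=3") // false - partition dir, keep
// PathFilterSketch.shouldFilterOut(".part-0.crc")      // true  - hidden file, skip
```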
Showing 5 changed files with 14 additions and 4 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileCatalog.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala (2 additions, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (9 additions, 0 deletions)