-
- Downloads
[SPARK-19359][SQL] renaming partition should not leave useless directories
## What changes were proposed in this pull request? Hive metastore is not case-preserving and keep partition columns with lower case names. If Spark SQL creates a table with upper-case partition column names using `HiveExternalCatalog`, when we rename partition, it first calls the HiveClient to renamePartition, which will create a new lower case partition path, then Spark SQL renames the lower case path to upper-case. However, when we rename a nested path, different file systems have different behaviors. e.g. in jenkins, renaming `a=1/b=2` to `A=2/B=2` will success, but leave an empty directory `a=1`. in mac os, the renaming doesn't work as expected and result to `a=1/B=2`. This PR renames the partition directory recursively from the first partition column in `HiveExternalCatalog`, to be most compatible with different file systems. ## How was this patch tested? new regression test Author: Wenchen Fan <wenchen@databricks.com> Closes #16837 from cloud-fan/partition.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala 10 additions, 7 deletions...che/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala 45 additions, 8 deletions...scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
- sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionProviderCompatibilitySuite.scala 29 additions, 0 deletions.../spark/sql/hive/PartitionProviderCompatibilitySuite.scala
Please register or sign in to comment