-
- Downloads
[SPARK-18635][SQL] Partition name/values not escaped correctly in some cases
## What changes were proposed in this pull request? Due to confusion between URI vs paths, in certain cases we escape partition values too many times, which causes some Hive client operations to fail or write data to the wrong location. This PR fixes at least some of these cases. To my understanding this is how values, filesystem paths, and URIs interact. - Hive stores raw (unescaped) partition values that are returned to you directly when you call listPartitions. - Internally, we convert these raw values to filesystem paths via `ExternalCatalogUtils.[un]escapePathName`. - In some circumstances we store URIs instead of filesystem paths. When a path is converted to a URI via `path.toURI`, the escaped partition values are further URI-encoded. This means that to get a path back from a URI, you must call `new Path(new URI(uriTxt))` in order to decode the URI-encoded string. - In `CatalogStorageFormat` we store URIs as strings. This makes it easy to forget to URI-decode the value before converting it into a path. - Finally, the Hive client itself uses mostly Paths for representing locations, and only URIs occasionally. In the future we should probably clean this up, perhaps by dropping use of URIs when unnecessary. We should also try fixing escaping for partition names as well as values, though names are unlikely to contain special characters. cc mallman cloud-fan yhuai ## How was this patch tested? Unit tests. Author: Eric Liang <ekl@databricks.com> Closes #16071 from ericl/spark-18635.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala 3 additions, 0 deletions...ala/org/apache/spark/sql/catalyst/catalog/interface.scala
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala 3 additions, 2 deletions...scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
- sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala 4 additions, 2 deletions...ain/scala/org/apache/spark/sql/hive/client/HiveShim.scala
- sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionProviderCompatibilitySuite.scala 54 additions, 0 deletions.../spark/sql/hive/PartitionProviderCompatibilitySuite.scala
Loading
Please register or sign in to comment