[SPARK-19257][SQL] location for table/partition/database should be java.net.URI
## What changes were proposed in this pull request?

Currently we treat the location of a table/partition/database as a URI string. It is safer to make the type of the location `java.net.URI`. In this PR, the following classes change:

**1. CatalogDatabase**

```scala
// Before
case class CatalogDatabase(
    name: String,
    description: String,
    locationUri: String,
    properties: Map[String, String])

// After
case class CatalogDatabase(
    name: String,
    description: String,
    locationUri: URI, // java.net.URI
    properties: Map[String, String])
```

**2. CatalogStorageFormat**

```scala
// Before
case class CatalogStorageFormat(
    locationUri: Option[String],
    inputFormat: Option[String],
    outputFormat: Option[String],
    serde: Option[String],
    compressed: Boolean,
    properties: Map[String, String])

// After
case class CatalogStorageFormat(
    locationUri: Option[URI], // java.net.URI
    inputFormat: Option[String],
    outputFormat: Option[String],
    serde: Option[String],
    compressed: Boolean,
    properties: Map[String, String])
```

Before and after this PR, the change is transparent to users: there is nothing users need to be concerned about. The `String`-to-`URI` conversion happens only inside Spark SQL. Here are some location-related operations:

**1. Whitespace in the location**, e.g. `/a/b c/d`

For both table and partition locations, after `CREATE TABLE t... (PARTITIONED BY ...) LOCATION '/a/b c/d'`, `DESC EXTENDED t` shows the location as `/a/b c/d`, and the real path in the file system is also `/a/b c/d`.

**2. Colon (:) in the location**, e.g. `/a/b:c/d`

For both table and partition locations, after `CREATE TABLE t... (PARTITIONED BY ...) LOCATION '/a/b:c/d'`:

- **On the Linux file system**, `DESC EXTENDED t` shows the location as `/a/b:c/d`, and the real path in the file system is also `/a/b:c/d`.
- **On HDFS**, an exception is thrown: `java.lang.IllegalArgumentException: Pathname /a/b:c/d from hdfs://iZbp1151s8hbnnwriekxdeZ:9000/a/b:c/d is not a valid DFS filename.`

**By contrast**, after `INSERT INTO TABLE t PARTITION(a="a:b") SELECT 1`, `DESC EXTENDED t` shows the location as `/xxx/a=a%3Ab`, and the real path in the file system is also `/xxx/a=a%3Ab`.

**3. Percent sign (%) in the location**, e.g. `/a/b%c/d`

For both table and partition locations, after `CREATE TABLE t... (PARTITIONED BY ...) LOCATION '/a/b%c/d'`, `DESC EXTENDED t` shows the location as `/a/b%c/d`, and the real path in the file system is also `/a/b%c/d`.

**4. Encoded percent sign (%25) in the location**, e.g. `/a/b%25c/d`

For both table and partition locations, after `CREATE TABLE t... (PARTITIONED BY ...) LOCATION '/a/b%25c/d'`, `DESC EXTENDED t` shows the location as `/a/b%25c/d`, and the real path in the file system is also `/a/b%25c/d`.

**By contrast**, after `INSERT INTO TABLE t PARTITION(a="%25") SELECT 1`, `DESC EXTENDED t` shows the location as `/xxx/a=%2525`, and the real path in the file system is also `/xxx/a=%2525`.

**Additionally**, besides the location, two other factors affect the location of a table/partition. One is the table name, which is not allowed to contain special characters; the other is the `partition name`, which behaves the same as the `partition value`. Test cases for partition names with special characters were added, and a related bug was fixed, in [PR](https://github.com/apache/spark/pull/17173).

### Summary:

After `CREATE TABLE t... (PARTITIONED BY ...) LOCATION path`, the path reported by `DESC TABLE` and the real path in the file system are both the same as the path given in the `CREATE TABLE` command. (Different file systems allow different special characters in paths; for example, HDFS does not allow colons, but the Linux file system does.) Databases follow the same logic as `CREATE TABLE`. However, if a `partition value` contains special characters such as `%`, `:` or `#`, the path reported by `DESC TABLE` and the real path in the file system contain the encoded `partition value`, e.g. `/xxx/a=A%25B`.

In this PR, the core change uses `new Path(str).toUri` and `new Path(uri).toString` to transform a string into a URI and a URI into a string, respectively. For example:

```scala
import org.apache.hadoop.fs.Path

val str = "/a/b c/d"
val uri = new Path(str).toUri           // uri.toString == "/a/b%20c/d"
val strFromUri = new Path(uri).toString // "/a/b c/d"
```

When we restore a table/partition from the metastore, or get the location from a `CREATE TABLE` command, we convert the string to a URI as above with `new Path(str).toUri`.

## How was this patch tested?

Unit tests added. The current master branch also passes all the test cases added in this PR after a small change: at https://github.com/apache/spark/pull/17149/files#diff-b7094baa12601424a5d19cb930e3402fR1764, `toURI` is replaced with `toString` when testing on the master branch. This shows that this PR is transparent to users.

Author: windpiger <songjun@outlook.com>

Closes #17149 from windpiger/changeStringToURI.
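To illustrate why the four location cases above come back unchanged, here is a stdlib-only sketch that approximates the `new Path(str).toUri` / `new Path(uri).toString` round trip. It assumes, as an approximation rather than Hadoop's actual code, that `Path` encodes a scheme-less path through the multi-argument `java.net.URI` constructor. `LocationRoundTrip`, `stringToUri` and `uriToString` are hypothetical names.

```scala
import java.net.URI

// Hypothetical, stdlib-only stand-ins for `new Path(str).toUri` and
// `new Path(uri).toString`, restricted to scheme-less local paths.
object LocationRoundTrip {
  // The multi-argument URI constructor percent-encodes characters that are
  // illegal in a path (such as a space) and always encodes '%' itself.
  def stringToUri(path: String): URI = new URI(null, null, path, null, null)

  // getPath returns the decoded path, undoing the encoding above.
  def uriToString(uri: URI): String = uri.getPath

  def main(args: Array[String]): Unit = {
    for (loc <- Seq("/a/b c/d", "/a/b:c/d", "/a/b%c/d", "/a/b%25c/d")) {
      val uri = stringToUri(loc)
      println(s"$loc -> ${uri.toString} -> ${uriToString(uri)}")
    }
  }
}
```

Running `main` prints `/a/b c/d -> /a/b%20c/d -> /a/b c/d` for the first location, and every other location likewise maps back to itself. Only the URI form carries the encoding, so the string a user passes to `LOCATION` is the string they get back.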
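The encoded partition directories in the examples above (`a=a%3Ab` for `a:b`, `a=%2525` for `%25`) come from escaping the partition value before it is used as a path segment. A minimal sketch of that escaping, assuming a hypothetical `escapePartitionValue` helper and an illustrative character set that is not claimed to match Spark's own list:

```scala
// Hypothetical sketch of partition-value escaping; the character set below is
// illustrative only and is not claimed to match Spark's implementation.
object PartitionPathSketch {
  private val charsToEscape: Set[Char] =
    Set('"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '{', '[', ']', '^')

  // Replace each special (or control) character with '%' followed by its
  // two-digit uppercase hex code, e.g. ':' -> "%3A" and '%' -> "%25".
  def escapePartitionValue(value: String): String = {
    val sb = new StringBuilder
    value.foreach { c =>
      if (charsToEscape.contains(c) || c < ' ') sb.append('%').append("%02X".format(c.toInt))
      else sb.append(c)
    }
    sb.toString
  }

  // Builds a single partition directory name such as "a=a%3Ab".
  def partitionDir(name: String, value: String): String =
    s"${escapePartitionValue(name)}=${escapePartitionValue(value)}"

  def main(args: Array[String]): Unit = {
    println(partitionDir("a", "a:b")) // a=a%3Ab
    println(partitionDir("a", "%25")) // a=%2525
  }
}
```

Because `%` itself is escaped, a value that already looks encoded (`%25`) is encoded again (`%2525`), so the directory name can always be decoded back to the original partition value.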
Changed files:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala (26 additions, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala (6 additions, 6 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (8 additions, 7 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala (6 additions, 8 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala (10 additions, 8 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (5 additions, 3 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala (7 additions, 3 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (8 additions, 6 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (7 additions, 4 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CatalogFileIndex.scala (3 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala (3 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala (4 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala (3 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala (5 additions, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLCommandSuite.scala (5 additions, 3 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (107 additions, 29 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala (2 additions, 2 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala (1 addition, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/sources/PathOptionSuite.scala (8 additions, 4 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala (11 additions, 10 deletions)