[SPARK-14954] [SQL] Add PARTITION BY and BUCKET BY clause for data source CTAS syntax
Currently, we can only create persisted partitioned and/or bucketed data source tables using the Dataset API but not using SQL DDL. This PR implements the following syntax to add partitioning and bucketing support to the SQL DDL:

```
CREATE TABLE <table-name>
USING <provider>
[OPTIONS (<key1> <value1>, <key2> <value2>, ...)]
[PARTITIONED BY (col1, col2, ...)]
[CLUSTERED BY (col1, col2, ...) [SORTED BY (col1, col2, ...)] INTO <n> BUCKETS]
AS SELECT ...
```

Test cases are added in `MetastoreDataSourcesSuite` to check the newly added syntax.

Author: Cheng Lian <lian@databricks.com>
Author: Yin Huai <yhuai@databricks.com>

Closes #12734 from liancheng/spark-14954.
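As an illustration of the syntax described in the commit message, a concrete statement might look like the sketch below. The table name `events_by_day`, the source table `raw_events`, and every column name are hypothetical and do not come from this PR; `parquet` is used only as a familiar provider.

```sql
-- Hypothetical example: all table and column names are illustrative only.
-- Partition the output by event_date, then hash rows into 8 buckets by
-- user_id, keeping each bucket sorted by event_time.
CREATE TABLE events_by_day
USING parquet
PARTITIONED BY (event_date)
CLUSTERED BY (user_id) SORTED BY (event_time) INTO 8 BUCKETS
AS SELECT user_id, event_time, payload, event_date FROM raw_events
```

The clauses appear in the order given by the grammar: `USING`, the optional `OPTIONS`, `PARTITIONED BY`, `CLUSTERED BY ... INTO <n> BUCKETS`, and finally `AS SELECT`. `SORTED BY` is only meaningful inside the `CLUSTERED BY` clause, as the nested brackets in the syntax indicate.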
Changed files:
- sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (3 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (10 additions, 2 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala (93 additions, 0 deletions)