[SPARK-17729][SQL] Enable creating hive bucketed tables
## What changes were proposed in this pull request?

Hive allows inserting data into a bucketed table without guaranteeing bucketing and sortedness, depending on two configs: `hive.enforce.bucketing` and `hive.enforce.sorting`.

What does this PR achieve?
- Spark will disallow users from writing output to Hive bucketed tables by default (given that the output won't adhere to Hive's semantics).
- If a user still wants to write to a Hive bucketed table, the only resort is to set `hive.enforce.bucketing=false` and `hive.enforce.sorting=false`, which means the user does NOT care about bucketing guarantees.

Changes done in this PR:
- Extract the table's bucketing information in `HiveClientImpl`.
- While writing table info to the metastore, `HiveClientImpl` now populates the bucketing information in the Hive `Table` object.
- `InsertIntoHiveTable` allows inserts to a bucketed table only if both `hive.enforce.bucketing` and `hive.enforce.sorting` are `false`.

The ability to create bucketed tables will enable adding test cases to Spark while I add more changes related to Hive bucketing support.

Design doc for Hive bucketing support: https://docs.google.com/document/d/1a8IDh23RAkrkg9YYAeO51F4aGO8-xAlupKwdshve2fc/edit#

## How was this patch tested?

- Added a test for creating a bucketed and sorted table.
- Added a test to ensure that INSERTs fail if strict bucketing / sorting is enforced.
- Added a test to ensure that INSERTs go through if strict bucketing / sorting is NOT enforced.
- Added a test to validate that bucketing information shows up in the output of `DESC FORMATTED`.
- Added a test to ensure that `SHOW CREATE TABLE` works for Hive bucketed tables.

Author: Tejas Patil <tejasp@fb.com>

Closes #17644 from tejasapatil/SPARK-17729_create_bucketed_table.
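The insert rule described above can be sketched as a small, self-contained predicate. This is only an illustration under stated assumptions, not the code from the PR: the object `BucketedInsertCheck`, the helper `enforced`, and the use of a plain `Map[String, String]` in place of the real configuration object are all hypothetical. It also assumes that an unset key counts as enforced, which matches the "disallowed by default" behaviour stated in the description.

```scala
// Hypothetical, simplified model of the insert rule; not the PR's actual code.
object BucketedInsertCheck {
  // Config keys named in the PR description.
  val EnforceBucketing = "hive.enforce.bucketing"
  val EnforceSorting = "hive.enforce.sorting"

  // Assumption: a missing key is treated as "true" (enforced), so writes to
  // bucketed tables are rejected unless the user opts out explicitly.
  private def enforced(conf: Map[String, String], key: String): Boolean =
    conf.getOrElse(key, "true").trim.toLowerCase == "true"

  // Non-bucketed tables are always writable. Bucketed tables are writable
  // only when BOTH enforcement flags are explicitly set to false.
  def insertAllowed(tableIsBucketed: Boolean, conf: Map[String, String]): Boolean =
    !tableIsBucketed ||
      (!enforced(conf, EnforceBucketing) && !enforced(conf, EnforceSorting))
}
```

Under this sketch, a user who sets only one of the two flags to `false` is still rejected, because the description requires both to be `false` before an insert into a bucketed table is allowed.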
Changed files:
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (3 additions, 3 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (7 additions, 2 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala (0 additions, 2 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (46 additions, 10 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala (21 additions, 0 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveDDLCommandSuite.scala (26 additions, 7 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala (47 additions, 0 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/ShowCreateTableSuite.scala (3 additions, 8 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (20 additions, 0 deletions)