[SPARK-12539][SQL] support writing bucketed table
This PR adds bucketed write support to Spark SQL. Users can specify bucketing columns, numBuckets, and sorting columns, with or without partition columns. For example:

```
df.write.partitionBy("year").bucketBy(8, "country").sortBy("amount").saveAsTable("sales")
```

When bucketing is used, we calculate a bucket id for each record and group the records by bucket id. For each group, we create a file with the bucket id in its name and write the data into it. If sorting columns are specified, the data in each bucket file is sorted before it is written. Note that there may be multiple files for one bucket, as the data is distributed.

Currently we store the bucket metadata in the Hive metastore in a non-Hive-compatible way. We use a different bucketing hash function from Hive, so we can't be compatible anyway.

Limitations:
* Can't write bucketed data without a Hive metastore.
* Can't insert bucketed data into existing Hive tables.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #10498 from cloud-fan/bucket-write.
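The write path described above (compute a bucket id per record, group by bucket id, sort within each bucket, write one file per bucket) can be illustrated with a plain-Scala sketch that needs no Spark dependency. Everything in it is an assumption made for illustration: the `Sale` record, the `BucketSketch` object, the use of `String.hashCode` as the hash, and the `part-bucket-NNNNN` file name are hypothetical and are not the hash function or naming scheme this PR implements. Partitioning is left out, and a single in-memory collection stands in for one task's share of the data.

```scala
// Hypothetical record type mirroring the columns used in the example above.
final case class Sale(year: Int, country: String, amount: Double)

object BucketSketch {
  // Stand-in hash: String.hashCode with a non-negative modulo.
  // This is NOT the bucketing hash function Spark uses.
  def bucketId(key: String, numBuckets: Int): Int = {
    val m = key.hashCode % numBuckets
    if (m < 0) m + numBuckets else m
  }

  // Hypothetical file name that embeds the bucket id.
  def fileName(id: Int): String = f"part-bucket-$id%05d"

  // Group records by bucket id, then sort each group by the sort column.
  // The result maps each (hypothetical) file name to the rows it would hold.
  def plan(records: Seq[Sale], numBuckets: Int): Map[String, Seq[Sale]] =
    records
      .groupBy(r => bucketId(r.country, numBuckets))
      .map { case (id, rows) => fileName(id) -> rows.sortBy(_.amount) }
}
```

Calling `BucketSketch.plan(sales, 8)` on one task's records yields at most eight files. In a distributed job each task would produce its own set, which is why one bucket can end up with multiple files.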
Showing 18 changed files
- sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala (80 additions, 9 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala (3 additions, 4 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DDLParser.scala (1 addition, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala (2 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala (157 additions, 62 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/bucket.scala (57 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala (9 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JSONRelation.scala (29 additions, 6 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala (19 additions, 9 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala (17 additions, 7 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala (31 additions, 3 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala (22 additions, 1 deletion)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala (3 additions, 4 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala (12 additions, 3 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala (13 additions, 7 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala (1 addition, 0 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala (169 additions, 0 deletions)