[SPARK-18243][SQL] Port Hive writing to use FileFormat interface
## What changes were proposed in this pull request?

Inserting data into Hive tables has its own implementation that is distinct from data sources: `InsertIntoHiveTable`, `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`.

One other major difference is that data source tables write directly to the final destination without using a staging directory, and then Spark itself adds the partitions/tables to the catalog. Hive tables write to a staging directory and then call the Hive metastore's `loadPartition`/`loadTable` function to load the data in. So we still need to keep `InsertIntoHiveTable` to hold this special logic. In the future, we should think about writing to the Hive table location directly, so that we don't need to call `loadTable`/`loadPartition` at the end and can remove `InsertIntoHiveTable`.

This PR removes `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`, and creates a `HiveFileFormat` to implement the write logic. In the future, we should also implement the read logic in `HiveFileFormat`.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #16517 from cloud-fan/insert-hive.
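To make the difference between the two write paths concrete, here is a minimal toy sketch in plain Scala. It is not Spark or Hive code: `ToyCatalog`, `WritePaths`, `writeDirect`, `writeViaStaging` and the local `loadTable` helper are hypothetical names invented for illustration, and the "metastore load" is modeled simply as moving files from a staging directory into the table directory on the local file system.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.collection.mutable

// Hypothetical stand-in for a catalog/metastore (not a Spark or Hive API).
final class ToyCatalog {
  val registered: mutable.Set[String] = mutable.Set.empty[String]
  def register(table: String): Unit = { registered += table; () }
}

object WritePaths {
  private def writeRows(rows: Seq[String], file: Path): Unit = {
    Files.write(file, rows.mkString("\n").getBytes("UTF-8"))
    ()
  }

  // Data-source style: write straight to the final location,
  // then the writer itself registers the table in the catalog.
  def writeDirect(rows: Seq[String], tableDir: Path,
                  table: String, catalog: ToyCatalog): Path = {
    Files.createDirectories(tableDir)
    val out = tableDir.resolve("part-00000")
    writeRows(rows, out)
    catalog.register(table)
    out
  }

  // Hive style: write to a staging directory first, then hand over to a
  // separate "load" step that moves the files into place and registers them.
  def writeViaStaging(rows: Seq[String], stagingDir: Path, tableDir: Path,
                      table: String, catalog: ToyCatalog): Path = {
    Files.createDirectories(stagingDir)
    writeRows(rows, stagingDir.resolve("part-00000"))
    loadTable(stagingDir, tableDir, table, catalog)
    tableDir.resolve("part-00000")
  }

  // Toy model of the metastore's load step (assumption: a plain file move).
  def loadTable(stagingDir: Path, tableDir: Path,
                table: String, catalog: ToyCatalog): Unit = {
    Files.createDirectories(tableDir)
    val staged = Option(stagingDir.toFile.listFiles()).getOrElse(Array.empty[java.io.File])
    staged.foreach { f =>
      Files.move(f.toPath, tableDir.resolve(f.getName), StandardCopyOption.REPLACE_EXISTING)
    }
    catalog.register(table)
  }
}
```

In this toy model both paths end with the same file in the table directory and the same catalog entry; the staging variant only adds the extra load step, which is the part that `InsertIntoHiveTable` still has to keep.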
Changed files:
- core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala: 1 addition, 1 deletion
- sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala: 14 additions, 19 deletions
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala: 1 addition, 1 deletion
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala: 46 additions, 31 deletions
- sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala: 2 additions, 2 deletions
- sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala: 1 addition, 1 deletion
- sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala: 149 additions, 0 deletions
- sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala: 85 additions, 102 deletions
- sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala: 1 addition, 1 deletion
- sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala: 0 additions, 356 deletions
- sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala: 2 additions, 2 deletions
- sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala: 4 additions, 6 deletions
- sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: 3 additions, 2 deletions
- sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala: 4 additions, 4 deletions
- sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala: 5 additions, 5 deletions