[SPARK-18087][SQL] Optimize insert to not require REPAIR TABLE
## What changes were proposed in this pull request?

When inserting into datasource tables with partitions managed by the Hive metastore, we need to notify the metastore of newly added partitions. Previously this was implemented via `msck repair table`, but this is more expensive than needed. This optimizes the insertion path to add only the updated partitions.

## How was this patch tested?

Existing tests (I verified manually that tests fail if the repair operation is omitted).

Author: Eric Liang <ekl@databricks.com>

Closes #15633 from ericl/spark-18087.
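To illustrate the idea of adding only the updated partitions, here is a minimal, dependency-free sketch. It is not the code from this patch: `ToyMetastore`, `registerWrittenPartitions`, and the `parsePathFragment` helper shown here are hypothetical names for illustration, and the sketch ignores details such as escaped characters in partition values. It assumes the write job can report the partition directories it touched, so that only those need to be registered instead of rescanning the whole table.

```scala
// Hypothetical, dependency-free model; not Spark's actual implementation.
object IncrementalPartitionSketch {
  // A partition spec maps partition column names to their values.
  type PartitionSpec = Map[String, String]

  // Parse a path fragment such as "year=2016/month=10" into a partition spec.
  // Assumption: values are not escaped and every segment has the form key=value.
  def parsePathFragment(fragment: String): PartitionSpec =
    fragment.split("/").filter(_.nonEmpty).map { kv =>
      val parts = kv.split("=", 2)
      require(parts.length == 2, s"Invalid partition fragment: $kv")
      parts(0) -> parts(1)
    }.toMap

  // Toy stand-in for the metastore: it only records which partitions are registered.
  final class ToyMetastore {
    private var registered = Set.empty[PartitionSpec]
    def addPartitions(specs: Iterable[PartitionSpec]): Unit = registered = registered ++ specs
    def partitions: Set[PartitionSpec] = registered
  }

  // After a write, register only the partitions the job touched,
  // rather than repairing (rescanning) the entire table.
  def registerWrittenPartitions(metastore: ToyMetastore, writtenPaths: Set[String]): Unit =
    metastore.addPartitions(writtenPaths.map(parsePathFragment))

  def main(args: Array[String]): Unit = {
    val metastore = new ToyMetastore
    registerWrittenPartitions(metastore, Set("year=2016/month=10", "year=2016/month=11"))
    metastore.partitions.foreach(println)
  }
}
```

The cost of this approach scales with the number of partitions written by the insert, whereas a full repair scales with the total number of partitions in the table.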
Changed files:
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala (17 additions, 10 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala (2 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala (12 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteOutput.scala (20 additions, 9 deletions)