Skip to content
Snippets Groups Projects
Commit 489845f3 authored by Eric Liang's avatar Eric Liang Committed by Reynold Xin
Browse files

[SPARK-18145] Update documentation for hive partition management in 2.1

## What changes were proposed in this pull request?

This documents the partition handling changes for Spark 2.1 and how to migrate existing tables.

## How was this patch tested?

Built docs locally.

rxin

Author: Eric Liang <ekl@databricks.com>

Closes #16074 from ericl/spark-18145.
parent af9789a4
No related branches found
No related tags found
No related merge requests found
......@@ -1331,6 +1331,15 @@ options.
# Migration Guide
## Upgrading From Spark SQL 2.0 to 2.1
- Datasource tables now store partition metadata in the Hive metastore. This means that Hive DDLs such as `ALTER TABLE PARTITION ... SET LOCATION` are now available for tables created with the Datasource API.
- Legacy datasource tables can be migrated to this format via the `MSCK REPAIR TABLE` command. Migrating legacy tables is recommended to take advantage of Hive DDL support and improved planning performance.
- To determine if a table has been migrated, look for the `PartitionProvider: Catalog` attribute when issuing `DESCRIBE FORMATTED` on the table.
- Changes to `INSERT OVERWRITE TABLE ... PARTITION ...` behavior for Datasource tables.
- In prior Spark versions `INSERT OVERWRITE` overwrote the entire Datasource table, even when given a partition specification. Now only partitions matching the specification are overwritten.
- Note that this still differs from the behavior of Hive tables, which is to overwrite only partitions overlapping with newly inserted data.
## Upgrading From Spark SQL 1.6 to 2.0
- `SparkSession` is now the new entry point of Spark that replaces the old `SQLContext` and
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment