[SPARK-18145] Update documentation for hive partition management in 2.1

## What changes were proposed in this pull request? This documents the partition handling changes for Spark 2.1 and how to migrate existing tables. ## How was this patch tested? Built docs locally. rxin Author: Eric Liang <ekl@databricks.com> Closes #16074 from ericl/spark-18145.

[SPARK-18145] Update documentation for hive partition management in 2.1
489845f3 · Eric Liang · Reynold Xin · af9789a4 · 489845f3
Commit 489845f3 authored 8 years ago by Eric Liang Committed by Reynold Xin 8 years ago
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1331,6 +1331,15 @@ options.

 # Migration Guide

+## Upgrading From Spark SQL 2.0 to 2.1
+
+ - Datasource tables now store partition metadata in the Hive metastore. This means that Hive DDLs such as `ALTER TABLE PARTITION ... SET LOCATION` are now available for tables created with the Datasource API.
+    - Legacy datasource tables can be migrated to this format via the `MSCK REPAIR TABLE` command. Migrating legacy tables is recommended to take advantage of Hive DDL support and improved planning performance.
+    - To determine if a table has been migrated, look for the `PartitionProvider: Catalog` attribute when issuing `DESCRIBE FORMATTED` on the table.
+ - Changes to `INSERT OVERWRITE TABLE ... PARTITION ...` behavior for Datasource tables.
+    - In prior Spark versions `INSERT OVERWRITE` overwrote the entire Datasource table, even when given a partition specification. Now only partitions matching the specification are overwritten.
+    - Note that this still differs from the behavior of Hive tables, which is to overwrite only partitions overlapping with newly inserted data.
+
 ## Upgrading From Spark SQL 1.6 to 2.0

 - `SparkSession` is now the new entry point of Spark that replaces the old `SQLContext` and