Commit 7bc3e9ba authored by Wenchen Fan's avatar Wenchen Fan
[SPARK-18899][SPARK-18912][SPARK-18913][SQL] refactor the error checking when append data to an existing table

## What changes were proposed in this pull request?

When we append data to an existing table with `DataFrameWriter.saveAsTable`, we will do various checks to make sure the appended data is consistent with the existing data.

However, we get the information about the existing table by matching on the table relation instead of looking at the table metadata. This is error-prone: e.g. we only check the number of columns for `HadoopFsRelation`, and we forget to check bucketing entirely.

This PR refactors the error checking to look at the metadata of the existing table, and fixes several bugs:
* SPARK-18899: We forgot to check whether the specified bucketing matches the existing table, which may lead to a problematic table whose data files have different bucketing.
* SPARK-18912: We forgot to check the number of columns for non-file-based data source tables.
* SPARK-18913: We don't support appending data to a table with special column names.
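The metadata-based checks described above can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, not Spark's actual implementation: the names `TableMeta` and `check_append` are made up, and the real code compares a `CatalogTable` from the session catalog against the incoming query's schema and bucket spec.

```python
# Hypothetical sketch of appending-data validation against catalog metadata.
# Class and function names are illustrative, not Spark's real API.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TableMeta:
    """Simplified stand-in for the catalog metadata of a table."""
    columns: List[str]
    bucket_columns: List[str] = field(default_factory=list)
    num_buckets: Optional[int] = None


def check_append(existing: TableMeta, incoming: TableMeta) -> None:
    """Validate that `incoming` data is consistent with the `existing` table."""
    # SPARK-18912: the column count must match for *every* data source,
    # not only file-based (HadoopFsRelation) ones.
    if len(incoming.columns) != len(existing.columns):
        raise ValueError(
            f"column count mismatch: table has {len(existing.columns)} "
            f"columns, appended data has {len(incoming.columns)}")
    # SPARK-18899: the bucketing spec must match, otherwise different data
    # files of the same table would end up bucketed differently.
    if (incoming.bucket_columns, incoming.num_buckets) != (
            existing.bucket_columns, existing.num_buckets):
        raise ValueError("bucketing spec does not match the existing table")


# SPARK-18913: column names with special characters are compared as plain
# strings against the metadata, so they pass through unharmed.
existing = TableMeta(columns=["a b", "c:d"], bucket_columns=["a b"], num_buckets=4)
check_append(existing, TableMeta(columns=["a b", "c:d"],
                                 bucket_columns=["a b"], num_buckets=4))
```

Checking against the catalog metadata rather than the matched relation means every data source goes through the same validation path, which is the point of the refactoring.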

## How was this patch tested?
New regression tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #16313 from cloud-fan/bug1.

(cherry picked from commit f923c849)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
parent 4cff0b50
Showing with 180 additions and 91 deletions