Commit 7bc3e9ba authored by Wenchen Fan's avatar Wenchen Fan
[SPARK-18899][SPARK-18912][SPARK-18913][SQL] refactor the error checking when append data to an existing table

## What changes were proposed in this pull request?

When we append data to an existing table with `DataFrameWriter.saveAsTable`, we will do various checks to make sure the appended data is consistent with the existing data.

However, we get the information about the existing table by matching on the table relation instead of looking at the table metadata. This is error-prone: e.g. we only check the number of columns for `HadoopFsRelation`, and we forget to check bucketing entirely.

This PR refactors the error checking to look at the metadata of the existing table, and fixes several bugs:
* SPARK-18899: We forgot to check whether the specified bucketing matches the existing table, which may lead to a problematic table whose data files have different bucketing.
* SPARK-18912: We forgot to check the number of columns for non-file-based data source tables.
* SPARK-18913: We don't support appending data to a table with special column names.
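The metadata-based checks described above can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, not Spark's actual implementation: the names `TableMeta` and `check_append` are made up, and the real code compares a `CatalogTable` from the session catalog against the incoming query's schema and bucket spec.

```python
# Hypothetical sketch of appending-data validation against catalog metadata.
# Class and function names are illustrative, not Spark's real API.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TableMeta:
    """Simplified stand-in for the catalog metadata of a table."""
    columns: List[str]
    bucket_columns: List[str] = field(default_factory=list)
    num_buckets: Optional[int] = None


def check_append(existing: TableMeta, incoming: TableMeta) -> None:
    """Validate that `incoming` data is consistent with the `existing` table."""
    # SPARK-18912: the column count must match for *every* data source,
    # not only file-based (HadoopFsRelation) ones.
    if len(incoming.columns) != len(existing.columns):
        raise ValueError(
            f"column count mismatch: table has {len(existing.columns)} "
            f"columns, appended data has {len(incoming.columns)}")
    # SPARK-18899: the bucketing spec must match, otherwise different data
    # files of the same table would end up bucketed differently.
    if (incoming.bucket_columns, incoming.num_buckets) != (
            existing.bucket_columns, existing.num_buckets):
        raise ValueError("bucketing spec does not match the existing table")


# SPARK-18913: column names with special characters are compared as plain
# strings against the metadata, so they pass through unharmed.
existing = TableMeta(columns=["a b", "c:d"], bucket_columns=["a b"], num_buckets=4)
check_append(existing, TableMeta(columns=["a b", "c:d"],
                                 bucket_columns=["a b"], num_buckets=4))
```

Checking against the catalog metadata rather than the matched relation means every data source goes through the same validation path, which is the point of the refactoring.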

## How was this patch tested?
New regression tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #16313 from cloud-fan/bug1.

(cherry picked from commit f923c849)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
parent 4cff0b50
Showing with 180 additions and 91 deletions