[SPARK-16463][SQL] Support `truncate` option in Overwrite mode for JDBC DataFrameWriter
## What changes were proposed in this pull request?

This PR adds a boolean option, `truncate`, for `SaveMode.Overwrite` of the JDBC DataFrameWriter. If this option is `true`, it tries to take advantage of `TRUNCATE TABLE` instead of `DROP TABLE`. This is a trivial option, but it provides great **convenience** for BI tool users working on RDBMS tables generated by Spark.

**Goal**
- Without the `CREATE/DROP` privilege, we can save a dataframe to the database. Sometimes these privileges are not granted, for security reasons.
- It preserves the existing table definition, so users can add and keep additional `INDEX`es and `CONSTRAINT`s on the table.
- Sometimes, `TRUNCATE` is faster than the combination of `DROP/CREATE`.

**Supported DBMS**

The table below shows which dialects support the `truncate` option. Because the behavior of `TRUNCATE TABLE` differs among DBMSs, it is not always safe to use. Spark ignores the `truncate` option for **unknown** DBMSs and for DBMSs whose `TRUNCATE TABLE` **cascades by default**. A newly added JDBCDialect should additionally implement the corresponding function to support the `truncate` option.

Spark Dialects | `truncate` option support
---------------|---------------------------
MySQLDialect | O
PostgresDialect | X
DB2Dialect | O
MsSqlServerDialect | O
DerbyDialect | O
OracleDialect | O

**Before (TABLE with INDEX case)**: The Spark shell and MySQL CLI sessions are interleaved intentionally.

```scala
scala> val (url, prop) = ("jdbc:mysql://localhost:3306/temp?useSSL=false", new java.util.Properties)
scala> prop.setProperty("user", "root")
scala> val df = spark.range(10)
scala> df.write.mode("overwrite").jdbc(url, "table_with_index", prop)
scala> spark.range(10).write.mode("overwrite").jdbc(url, "table_with_index", prop)

mysql> DESC table_with_index;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id    | bigint(20) | NO   |     | NULL    |       |
+-------+------------+------+-----+---------+-------+

mysql> CREATE UNIQUE INDEX idx_id ON table_with_index(id);
mysql> DESC table_with_index;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id    | bigint(20) | NO   | PRI | NULL    |       |
+-------+------------+------+-----+---------+-------+

scala> spark.range(10).write.mode("overwrite").jdbc(url, "table_with_index", prop)

mysql> DESC table_with_index;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id    | bigint(20) | NO   |     | NULL    |       |
+-------+------------+------+-----+---------+-------+
```

The index is lost because Overwrite mode drops and recreates the table.

**After (TABLE with INDEX case)**: With `truncate` enabled, the table is truncated instead of dropped, so the index survives.

```scala
scala> spark.range(10).write.mode("overwrite").option("truncate", true).jdbc(url, "table_with_index", prop)

mysql> DESC table_with_index;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id    | bigint(20) | NO   | PRI | NULL    |       |
+-------+------------+------+-----+---------+-------+
```

**Error Handling**
- In case of exceptions, Spark does not retry. Users should turn off the `truncate` option.
- In case of a schema change:
  - If any column name changes, this raises an exception, as expected.
  - If only the column types differ, this behaves like Append mode.

## How was this patch tested?

Pass the Jenkins tests with an updated test case.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #14086 from dongjoon-hyun/SPARK-16410.
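For third-party dialect authors, the per-dialect change is small: a dialect declares whether its `TRUNCATE TABLE` cascades. Below is a minimal sketch of what such an opt-in could look like, assuming the hook introduced here is `isCascadingTruncateTable(): Option[Boolean]` (consistent with the two-line additions to each dialect in the changed files below); `MyCustomDialect` and the `jdbc:mydb` URL prefix are hypothetical.

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical dialect illustrating how truncate support could be declared.
// Some(false): TRUNCATE TABLE does not cascade, so the `truncate` option is safe.
// Some(true) or None: Spark falls back to the old DROP/CREATE behavior.
case object MyCustomDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mydb")
  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
}

// Custom dialects must be registered before writing through them.
JdbcDialects.registerDialect(MyCustomDialect)
```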
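The writer-side decision then reduces to "truncate if requested and safe, otherwise drop". The standalone helper below is a sketch of that logic under the same assumptions, not the actual DataFrameWriter code; `prepareOverwrite` and its parameter names are illustrative.

```scala
import java.sql.Connection

// Illustrative helper (not Spark's internal code): decide how to clear the
// target table in Overwrite mode. Returns true if the table still exists
// afterwards, i.e. the caller does not need to recreate it.
def prepareOverwrite(
    conn: Connection,
    table: String,
    truncateRequested: Boolean,
    truncateIsSafe: Boolean): Boolean = {
  val stmt = conn.createStatement()
  try {
    if (truncateRequested && truncateIsSafe) {
      // Keep the table definition, indexes, and constraints; delete only rows.
      stmt.executeUpdate(s"TRUNCATE TABLE $table")
      true
    } else {
      // Previous behavior: drop the table; the caller recreates it, losing
      // indexes and constraints in the process.
      stmt.executeUpdate(s"DROP TABLE $table")
      false
    }
  } finally {
    stmt.close()
  }
}
```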
Showing 9 changed files with 70 additions and 4 deletions
- sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala (16 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala (16 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala (2 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala (7 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala (2 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala (2 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala (2 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala (2 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala (21 additions, 2 deletions)