[SPARK-18538][SQL][BACKPORT-2.1] Fix Concurrent Table Fetching Using DataFrameReader JDBC APIs
### What changes were proposed in this pull request?

#### This PR backports https://github.com/apache/spark/pull/15975 to Branch 2.1

---

The following two `DataFrameReader` JDBC APIs ignore the user-specified degree of parallelism:

```Scala
  def jdbc(
      url: String,
      table: String,
      columnName: String,
      lowerBound: Long,
      upperBound: Long,
      numPartitions: Int,
      connectionProperties: Properties): DataFrame
```

```Scala
  def jdbc(
      url: String,
      table: String,
      predicates: Array[String],
      connectionProperties: Properties): DataFrame
```

This PR fixes these issues. To verify that the behavior is correct, we improve the plan output of the `EXPLAIN` command by adding `numPartitions` to the `JDBCRelation` node.

Before the fix,
```
== Physical Plan ==
*Scan JDBCRelation(TEST.PEOPLE) [NAME#1896,THEID#1897] ReadSchema: struct<NAME:string,THEID:int>
```

After the fix,
```
== Physical Plan ==
*Scan JDBCRelation(TEST.PEOPLE) [numPartitions=3] [NAME#1896,THEID#1897] ReadSchema: struct<NAME:string,THEID:int>
```

### How was this patch tested?

Added verification logic to all the test cases for JDBC concurrent fetching.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16111 from gatorsmile/jdbcFix2.1.
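To make the column-based parallelism parameters concrete, the following Scala sketch shows one way a read over a numeric column could be split into `numPartitions` WHERE clauses, each of which would run as a separate concurrent query. It is a simplified illustration under stated assumptions, not Spark's code: `JdbcPartitionSketch` and `columnPartitionClauses` are hypothetical names, and the exact clauses Spark's JDBC data source generates may differ. The column `THEID` and the bounds `0` to `300` are illustrative values only.

```scala
// Hypothetical, simplified sketch of range-partitioning a JDBC read.
// This is NOT Spark's implementation; names and clause format are assumptions.
object JdbcPartitionSketch {

  /** Returns one WHERE clause per partition for a range-partitioned read.
    * None means the partition reads the table without any range filter.
    */
  def columnPartitionClauses(
      column: String,
      lowerBound: Long,
      upperBound: Long,
      numPartitions: Int): Seq[Option[String]] = {
    require(numPartitions >= 1, "numPartitions must be at least 1")
    if (numPartitions == 1) {
      // A single partition needs no range predicate: one query reads everything.
      Seq(None)
    } else {
      // Dividing before subtracting avoids overflowing Long for very wide bounds.
      val stride = upperBound / numPartitions - lowerBound / numPartitions
      require(stride > 0, "bounds are too close together for this many partitions")
      (0 until numPartitions).map { i =>
        val start = lowerBound + i * stride
        val end = start + stride
        // The first partition has no lower limit and also picks up NULLs; the last
        // has no upper limit. So the bounds shape the stride but filter out no rows.
        val lower = if (i == 0) None else Some(s"$column >= $start")
        val upper = if (i == numPartitions - 1) None else Some(s"$column < $end")
        (lower, upper) match {
          case (Some(l), Some(u)) => Some(s"$l AND $u")
          case (None, Some(u))    => Some(s"$u OR $column IS NULL")
          case (Some(l), None)    => Some(l)
          case (None, None)       => None // not reachable when numPartitions > 1
        }
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Illustrative values: column THEID, bounds 0..300, three partitions.
    val clauses = columnPartitionClauses("THEID", 0L, 300L, 3)
    clauses.zipWithIndex.foreach { case (clause, i) =>
      println(s"partition $i: ${clause.map("WHERE " + _).getOrElse("(no WHERE clause)")}")
    }
  }
}
```

For the other overload, each element of `predicates` is used as the WHERE clause of its own partition, so the degree of parallelism equals `predicates.length`. On a real Spark session, either effect can be checked with `df.rdd.getNumPartitions` or with `df.explain()`, which after this fix prints `numPartitions` in the `JDBCRelation` node.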
Files changed:
- `sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala`: 19 additions, 18 deletions
- `sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala`: 2 additions, 1 deletion
- `sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala`: 48 additions, 19 deletions