-
- Downloads
[SPARK-17982][SQL] SQLBuilder should wrap the generated SQL with parenthesis for LIMIT
## What changes were proposed in this pull request? Currently, `SQLBuilder` handles `LIMIT` by always adding `LIMIT` at the end of the generated subSQL. It makes `RuntimeException`s like the following. This PR adds a parenthesis always except `SubqueryAlias` is used together with `LIMIT`. **Before** ``` scala scala> sql("CREATE TABLE tbl(id INT)") scala> sql("CREATE VIEW v1(id2) AS SELECT id FROM tbl LIMIT 2") java.lang.RuntimeException: Failed to analyze the canonicalized SQL: ... ``` **After** ``` scala scala> sql("CREATE TABLE tbl(id INT)") scala> sql("CREATE VIEW v1(id2) AS SELECT id FROM tbl LIMIT 2") scala> sql("SELECT id2 FROM v1") res4: org.apache.spark.sql.DataFrame = [id2: int] ``` **Fixed cases in this PR** The following two cases are the detail query plans having problematic SQL generations. 1. `SELECT * FROM (SELECT id FROM tbl LIMIT 2)` Please note that **FROM SELECT** part of the generated SQL in the below. When we don't use '()' for limit, this fails. ```scala # Original logical plan: Project [id#1] +- GlobalLimit 2 +- LocalLimit 2 +- Project [id#1] +- MetastoreRelation default, tbl # Canonicalized logical plan: Project [gen_attr_0#1 AS id#4] +- SubqueryAlias tbl +- Project [gen_attr_0#1] +- GlobalLimit 2 +- LocalLimit 2 +- Project [gen_attr_0#1] +- SubqueryAlias gen_subquery_0 +- Project [id#1 AS gen_attr_0#1] +- SQLTable default, tbl, [id#1] # Generated SQL: SELECT `gen_attr_0` AS `id` FROM (SELECT `gen_attr_0` FROM SELECT `gen_attr_0` FROM (SELECT `id` AS `gen_attr_0` FROM `default`.`tbl`) AS gen_subquery_0 LIMIT 2) AS tbl ``` 2. `SELECT * FROM (SELECT id FROM tbl TABLESAMPLE (2 ROWS))` Please note that **((~~~) AS gen_subquery_0 LIMIT 2)** in the below. When we use '()' for limit on `SubqueryAlias`, this fails. ```scala # Original logical plan: Project [id#1] +- Project [id#1] +- GlobalLimit 2 +- LocalLimit 2 +- MetastoreRelation default, tbl # Canonicalized logical plan: Project [gen_attr_0#1 AS id#4] +- SubqueryAlias tbl +- Project [gen_attr_0#1] +- GlobalLimit 2 +- LocalLimit 2 +- SubqueryAlias gen_subquery_0 +- Project [id#1 AS gen_attr_0#1] +- SQLTable default, tbl, [id#1] # Generated SQL: SELECT `gen_attr_0` AS `id` FROM (SELECT `gen_attr_0` FROM ((SELECT `id` AS `gen_attr_0` FROM `default`.`tbl`) AS gen_subquery_0 LIMIT 2)) AS tbl ``` ## How was this patch tested? Pass the Jenkins test with a newly added test case. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #15546 from dongjoon-hyun/SPARK-17982.
Showing
- sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala 6 additions, 1 deletion...main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala
- sql/hive/src/test/resources/sqlgen/generate_with_other_1.sql 1 addition, 1 deletionsql/hive/src/test/resources/sqlgen/generate_with_other_1.sql
- sql/hive/src/test/resources/sqlgen/generate_with_other_2.sql 1 addition, 1 deletionsql/hive/src/test/resources/sqlgen/generate_with_other_2.sql
- sql/hive/src/test/resources/sqlgen/limit.sql 4 additions, 0 deletionssql/hive/src/test/resources/sqlgen/limit.sql
- sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala 10 additions, 0 deletions...org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala
Loading
Please register or sign in to comment