[SPARK-18814][SQL] CheckAnalysis rejects TPCDS query 32
## What changes were proposed in this pull request?

Move the check of GROUP BY columns in correlated scalar subqueries from CheckAnalysis to Analysis to fix a regression caused by SPARK-18504.

The problem can now be reproduced with a simple script:

    Seq((1,1)).toDF("pk","pv").createOrReplaceTempView("p")
    Seq((1,1)).toDF("ck","cv").createOrReplaceTempView("c")
    sql("select * from p,c where p.pk=c.ck and c.cv = (select avg(c1.cv) from c c1 where c1.ck = p.pk)").show

The requirements are:

1. We need to reference the same table twice, in both the parent and the subquery. Here it is table c.
2. We need a correlated predicate, but to a different table. Here it goes from c (as c1) in the subquery to p in the parent.
3. We then "deduplicate" c1.ck in the subquery to `ck#<n1>#<n2>` at the `Project` above the `Aggregate` of `avg`. When we compare `ck#<n1>#<n2>` with the original group by column `ck#<n1>` by their canonicalized forms, the comparison fails because #<n2> != #<n1>. That is how we trigger the exception added in SPARK-18504.

## How was this patch tested?

SubquerySuite and a simplified version of TPCDS-Q32.

Author: Nattavut Sutyanyong <nsy.can@gmail.com>

Closes #16246 from nsyca/18814.
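The mismatch described in requirement 3 can be illustrated with a small toy model. This is only a sketch: `Attr`, `Alias`, and `DedupSketch` are hypothetical stand-ins invented for illustration, not Catalyst classes, and the alias-aware variant shows one possible way to avoid the false mismatch, not necessarily what this patch does.

```scala
// Toy model only: hypothetical stand-ins, not Spark/Catalyst types.
// An attribute is identified by its expression ID, like ck#<n1>.
final case class Attr(name: String, exprId: Long)

// Deduplication wraps the original attribute in an alias with a fresh ID,
// like ck#<n1> AS ck#<n2>.
final case class Alias(child: Attr, name: String, exprId: Long) {
  def toAttribute: Attr = Attr(name, exprId)
}

object DedupSketch {
  // Naive check: compares expression IDs directly, so a deduplicated
  // reference (#<n2>) never matches the original group by column (#<n1>).
  def naiveIsGrouped(ref: Attr, groupBy: Seq[Attr]): Boolean =
    groupBy.exists(_.exprId == ref.exprId)

  // Alias-aware check: first map the reference back through any alias
  // that introduced it, then compare against the group by columns.
  def aliasAwareIsGrouped(ref: Attr, groupBy: Seq[Attr], aliases: Seq[Alias]): Boolean = {
    val origin = aliases
      .collectFirst { case a if a.exprId == ref.exprId => a.child }
      .getOrElse(ref)
    groupBy.exists(_.exprId == origin.exprId)
  }
}
```

With `val ck = Attr("ck", 1L)` and `val dedup = Alias(ck, "ck", 2L)`, the naive check rejects `dedup.toAttribute` against `Seq(ck)`, while the alias-aware check accepts it once `Seq(dedup)` is supplied as the alias list.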
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (23 additions, 8 deletions)
- sql/core/src/test/resources/sql-tests/inputs/scalar-subquery.sql (20 additions, 0 deletions)
- sql/core/src/test/resources/sql-tests/results/scalar-subquery.sql.out (46 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala (1 addition, 1 deletion)