-
- Downloads
[SPARK-16174][SQL] Improve `OptimizeIn` optimizer to remove literal repetitions
## What changes were proposed in this pull request? This PR improves `OptimizeIn` optimizer to remove the literal repetitions from SQL `IN` predicates. This optimizer prevents user mistakes and also can optimize some queries like [TPCDS-36](https://github.com/apache/spark/blob/master/sql/core/src/test/resources/tpcds/q36.sql#L19). **Before** ```scala scala> sql("select state from (select explode(array('CA','TN')) state) where state in ('TN','TN','TN','TN','TN','TN','TN')").explain == Physical Plan == *Filter state#6 IN (TN,TN,TN,TN,TN,TN,TN) +- Generate explode([CA,TN]), false, false, [state#6] +- Scan OneRowRelation[] ``` **After** ```scala scala> sql("select state from (select explode(array('CA','TN')) state) where state in ('TN','TN','TN','TN','TN','TN','TN')").explain == Physical Plan == *Filter state#6 IN (TN) +- Generate explode([CA,TN]), false, false, [state#6] +- Scan OneRowRelation[] ``` ## How was this patch tested? Pass the Jenkins tests (including a new testcase). Author: Dongjoon Hyun <dongjoon@apache.org> Closes #13876 from dongjoon-hyun/SPARK-16174.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala 1 addition, 0 deletions...rg/apache/spark/sql/catalyst/expressions/predicates.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala 14 additions, 6 deletions...a/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeInSuite.scala 24 additions, 0 deletions...apache/spark/sql/catalyst/optimizer/OptimizeInSuite.scala
Loading
Please register or sign in to comment