-
- Downloads
[SPARK-17712][SQL] Fix invalid pushdown of data-independent filters beneath aggregates
## What changes were proposed in this pull request? This patch fixes a minor correctness issue impacting the pushdown of filters beneath aggregates. Specifically, if a filter condition references no grouping or aggregate columns (e.g. `WHERE false`) then it would be incorrectly pushed beneath an aggregate. Intuitively, the only case where you can push a filter beneath an aggregate is when that filter is deterministic and is defined over the grouping columns / expressions, since in that case the filter is acting to exclude entire groups from the query (like a `HAVING` clause). The existing code would only push deterministic filters beneath aggregates when all of the filter's references were grouping columns, but this logic missed the case where a filter has no references. For example, `WHERE false` is deterministic but is independent of the actual data. This patch fixes this minor bug by adding a new check to ensure that we don't push filters beneath aggregates when those filters don't reference any columns. ## How was this patch tested? New regression test in FilterPushdownSuite. Author: Josh Rosen <joshrosen@databricks.com> Closes #15289 from JoshRosen/SPARK-17712.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala 1 addition, 1 deletion...a/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala 17 additions, 0 deletions...he/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala
Loading
Please register or sign in to comment