-
- Downloads
[SPARK-4493][SQL] Don't pushdown Eq, NotEq, Lt, LtEq, Gt and GtEq predicates with nulls for Parquet
Predicates like `a = NULL` and `a < NULL` can't be pushed down since Parquet `Lt`, `LtEq`, `Gt`, `GtEq` doesn't accept null value. Note that `Eq` and `NotEq` can only be used with `null` to represent predicates like `a IS NULL` and `a IS NOT NULL`. However, normally this issue doesn't cause NPE because any value compared to `NULL` results `NULL`, and Spark SQL automatically optimizes out `NULL` predicate in the `SimplifyFilters` rule. Only testing code that intentionally disables the optimizer may trigger this issue. (That's why this issue is not marked as blocker and I do **NOT** think we need to backport this to branch-1.1 This PR restricts `Lt`, `LtEq`, `Gt` and `GtEq` to non-null values only, and only uses `Eq` with null value to pushdown `IsNull` and `IsNotNull`. Also, added support for Parquet `NotEq` filter for completeness and (tiny) performance gain, it's also used to pushdown `IsNotNull`. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/3367) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #3367 from liancheng/filters-with-null and squashes the following commits: cc41281 [Cheng Lian] Fixes several styling issues de7de28 [Cheng Lian] Adds stricter rules for Parquet filters with null
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala 9 additions, 0 deletions.../org/apache/spark/sql/catalyst/expressions/literals.scala
- sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala 54 additions, 14 deletions...n/scala/org/apache/spark/sql/parquet/ParquetFilters.scala
- sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala 120 additions, 9 deletions...cala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala
Loading
Please register or sign in to comment