Skip to content
Snippets Groups Projects
Commit ef770031 authored by Sameer Agarwal's avatar Sameer Agarwal Committed by Yin Huai
Browse files

[SPARK-13495][SQL] Add Null Filters in the query plan for Filters/Joins based...

[SPARK-13495][SQL] Add Null Filters in the query plan for Filters/Joins based on their data constraints

## What changes were proposed in this pull request?

This PR adds an optimizer rule to eliminate reading (unnecessary) NULL values if they are not required for correctness by inserting `isNotNull` filters is the query plan. These filters are currently inserted beneath existing `Filter` and `Join` operators and are inferred based on their data constraints.

Note: While this optimization is applicable to all types of join, it primarily benefits `Inner` and `LeftSemi` joins.

## How was this patch tested?

1. Added a new `NullFilteringSuite` that tests for `IsNotNull` filters in the query plan for joins and filters. Also, tests interaction with the `CombineFilters` optimizer rules.
2. Test generated ExpressionTrees via `OrcFilterSuite`
3. Test filter source pushdown logic via `SimpleTextHadoopFsRelationSuite`

cc yhuai nongli

Author: Sameer Agarwal <sameer@databricks.com>

Closes #11372 from sameeragarwal/gen-isnotnull.
parent 48964111
No related branches found
No related tags found
No related merge requests found
Showing with 182 additions and 20 deletions
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment