[SPARK-13495][SQL] Add Null Filters in the query plan for Filters/Joins based on their data constraints

## What changes were proposed in this pull request?

This PR adds an optimizer rule that avoids reading unnecessary NULL values, when they are not required for correctness, by inserting `isNotNull` filters in the query plan. These filters are currently inserted beneath existing `Filter` and `Join` operators and are inferred based on their data constraints.

Note: While this optimization is applicable to all types of join, it primarily benefits `Inner` and `LeftSemi` joins.

## How was this patch tested?

1. Added a new `NullFilteringSuite` that tests for `IsNotNull` filters in the query plan for joins and filters. It also tests the interaction with the `CombineFilters` optimizer rule.
2. Tested generated ExpressionTrees via `OrcFilterSuite`.
3. Tested filter source pushdown logic via `SimpleTextHadoopFsRelationSuite`.

cc yhuai nongli

Author: Sameer Agarwal <sameer@databricks.com>

Closes #11372 from sameeragarwal/gen-isnotnull.
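To illustrate the idea behind the rule, here is a minimal, self-contained sketch that does not use Spark's Catalyst classes. It assumes only that SQL comparisons are null-intolerant (a comparison against NULL never evaluates to true), so every attribute compared in a conjunct of a filter or join condition can be guarded by an `IsNotNull` check without changing the result. All names here (`Expr`, `Attr`, `NullFilterSketch`, `inferIsNotNull`) are hypothetical and chosen for illustration. The actual rule lives in `Optimizer.scala` and works from the plan's data constraints, which this toy model does not reproduce.

```scala
// Toy expression tree; hypothetical stand-ins, not Spark's Catalyst classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr
case class GreaterThan(left: Expr, right: Expr) extends Expr
case class IsNotNull(child: Expr) extends Expr
case class And(left: Expr, right: Expr) extends Expr

object NullFilterSketch {
  // Split a conjunctive predicate (a AND b AND ...) into its conjuncts.
  def conjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other     => Seq(other)
  }

  // Only bare attributes are considered; literals contribute nothing.
  private def attrsOf(e: Expr): Seq[Attr] = e match {
    case a: Attr => Seq(a)
    case _       => Seq.empty
  }

  // Infer IsNotNull predicates implied by null-intolerant comparisons,
  // skipping any that the condition already states explicitly.
  def inferIsNotNull(condition: Expr): Seq[IsNotNull] = {
    val cs = conjuncts(condition)
    val existing = cs.collect { case n: IsNotNull => n }.toSet
    val inferred = cs.flatMap {
      case EqualTo(l, r)     => attrsOf(l) ++ attrsOf(r)
      case GreaterThan(l, r) => attrsOf(l) ++ attrsOf(r)
      case _                 => Seq.empty
    }.distinct.map(IsNotNull(_))
    inferred.filterNot(existing.contains)
  }
}
```

For a condition such as `a = b AND a > 1`, this sketch yields `IsNotNull(a)` and `IsNotNull(b)`. For an inner join on `a = b`, rows with a NULL on either side can never match, which is why filtering them out below the join is safe.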
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: 49 additions, 0 deletions
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NullFilteringSuite.scala: 95 additions, 0 deletions
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala: 15 additions, 3 deletions
- sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala: 1 addition, 1 deletion
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala: 2 additions, 4 deletions
- sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcFilterSuite.scala: 9 additions, 7 deletions
- sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextHadoopFsRelationSuite.scala: 2 additions, 2 deletions
- sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala: 9 additions, 3 deletions