[SPARK-14962][SQL] Do not push down isnotnull/isnull on unsupported types in ORC
## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-14962

ORC filters were being pushed down for all types, for both `IsNull` and `IsNotNull`.

This appears to work because, in Hive 1.2.x, neither `IsNull` nor `IsNotNull` takes a type as an argument when filters (`SearchArgument`) are built on the Spark side. However, they do not filter correctly, because on the ORC side the stored statistics always produce `null` for unsupported types (e.g. `ArrayType`). As a result, `IsNull` always evaluates to `true`, which in turn makes `IsNotNull` always evaluate to `false`. (Please see [RecordReaderImpl.java#L296-L318](https://github.com/apache/hive/blob/branch-1.2/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java#L296-L318) and [RecordReaderImpl.java#L359-L365](https://github.com/apache/hive/blob/branch-1.2/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java#L359-L365) in Hive 1.2.)

Hive 1.3.x and later appear to prevent this by requiring a type ([`PredicateLeaf.Type`](https://github.com/apache/hive/blob/e085b7e9bd059d91aaf013df0db4d71dca90ec6f/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/PredicateLeaf.java#L50-L56)) when building a filter ([`SearchArgument`](https://github.com/apache/hive/blob/26b5c7b56a4f28ce3eabc0207566cce46b29b558/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java#L260)), but Hive 1.2.x does not seem to do this.

This PR prevents ORC filter creation for `IsNull` and `IsNotNull` on unsupported types. The new `OrcFilters` logic resembles `ParquetFilters`.

## How was this patch tested?

Unit tests in `OrcQuerySuite` and `OrcFilterSuite`, and `sbt scalastyle`.

Author: hyukjinkwon <gurwls223@gmail.com>
Author: Hyukjin Kwon <gurwls223@gmail.com>

Closes #12777 from HyukjinKwon/SPARK-14962.
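The fix described in the commit message amounts to a type check before an ORC filter is built for `IsNull` or `IsNotNull`. The sketch below illustrates that idea under stated assumptions: `DataType`, `Filter`, `isSearchableType`, `createFilter`, and the string form of the pushed-down predicate are hypothetical, simplified stand-ins, not Spark's actual classes or the code in this PR, and the set of types treated as searchable is chosen only for illustration. An attribute of an unsupported type (here, an array) yields `None`, meaning no ORC filter is created for it.

```scala
// Hypothetical, simplified stand-ins; NOT Spark's DataType or sources.Filter.
sealed trait DataType
case object IntegerType extends DataType
case object LongType extends DataType
case object StringType extends DataType
case object BooleanType extends DataType
final case class ArrayType(elementType: DataType) extends DataType

sealed trait Filter
final case class IsNull(attribute: String) extends Filter
final case class IsNotNull(attribute: String) extends Filter
final case class And(left: Filter, right: Filter) extends Filter

object OrcPushdownSketch {
  // Assumption: which types count as "searchable" is illustrative only.
  // Nested types such as arrays are treated as unsupported.
  def isSearchableType(dt: DataType): Boolean = dt match {
    case IntegerType | LongType | StringType | BooleanType => true
    case _ => false
  }

  // Returns Some(predicate description) if the filter can be pushed down,
  // or None if it must not be pushed down.
  def createFilter(schema: Map[String, DataType], filter: Filter): Option[String] =
    filter match {
      case IsNull(attr) if schema.get(attr).exists(isSearchableType) =>
        Some(s"isNull($attr)")
      case IsNotNull(attr) if schema.get(attr).exists(isSearchableType) =>
        Some(s"not(isNull($attr))")
      case And(left, right) =>
        // Conservative: push a conjunction down only if both sides convert.
        for {
          l <- createFilter(schema, left)
          r <- createFilter(schema, right)
        } yield s"and($l, $r)"
      case _ => None // unsupported type, unknown column, or unsupported filter
    }

  def main(args: Array[String]): Unit = {
    val schema: Map[String, DataType] =
      Map("id" -> IntegerType, "tags" -> ArrayType(StringType))
    println(createFilter(schema, IsNotNull("id")))   // prints Some(not(isNull(id)))
    println(createFilter(schema, IsNotNull("tags"))) // prints None
  }
}
```

Declining to push a predicate down is the conservative choice in this sketch: the predicate is then left for Spark to evaluate, so correctness no longer depends on ORC statistics that are always `null` for those types.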
Changed files:
- sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala (1 addition, 1 deletion)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala (34 additions, 29 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala (10 additions, 9 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcFilterSuite.scala (60 additions, 15 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala (14 additions, 0 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcSourceSuite.scala (7 additions, 2 deletions)