-
- Downloads
[SPARK-3968][SQL] Use parquet-mr filter2 api
The parquet-mr project has introduced a new filter api (https://github.com/apache/incubator-parquet-mr/pull/4), along with several fixes . It can also eliminate entire RowGroups depending on certain statistics like min/max We can leverage that to further improve performance of queries with filters. Also filter2 api introduces ability to create custom filters. We can create a custom filter for the optimized In clause (InSet) , so that elimination happens in the ParquetRecordReader itself Author: Yash Datta <Yash.Datta@guavus.com> Closes #2841 from saucam/master and squashes the following commits: 8282ba0 [Yash Datta] SPARK-3968: fix scala code style and add some more tests for filtering on optional columns 515df1c [Yash Datta] SPARK-3968: Add a test case for filter pushdown on optional column 5f4530e [Yash Datta] SPARK-3968: Fix scala code style f304667 [Yash Datta] SPARK-3968: Using task metadata strategy for row group filtering ec53e92 [Yash Datta] SPARK-3968: No push down should result in case we are unable to create a record filter 48163c3 [Yash Datta] SPARK-3968: Code cleanup cc7b596 [Yash Datta] SPARK-3968: 1. Fix RowGroupFiltering not working 2. Use the serialization/deserialization from Parquet library for filter pushdown caed851 [Yash Datta] Revert "SPARK-3968: Not pushing the filters in case of OPTIONAL columns" since filtering on optional columns is now supported in filter2 api 49703c9 [Yash Datta] SPARK-3968: Not pushing the filters in case of OPTIONAL columns 9d09741 [Yash Datta] SPARK-3968: Change parquet filter pushdown to use filter2 api of parquet-mr
Showing
- pom.xml 1 addition, 1 deletionpom.xml
- sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala 86 additions, 144 deletions...n/scala/org/apache/spark/sql/parquet/ParquetFilters.scala
- sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala 145 additions, 34 deletions...org/apache/spark/sql/parquet/ParquetTableOperations.scala
- sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTestData.scala 19 additions, 0 deletions.../scala/org/apache/spark/sql/parquet/ParquetTestData.scala
- sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala 57 additions, 0 deletions...cala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala
Loading
Please register or sign in to comment