[SPARK-7567] [SQL] Migrating Parquet data source to FSBasedRelation
This PR migrates the Parquet data source to the newly introduced `FSBasedRelation`. `FSBasedParquetRelation` is created to replace `ParquetRelation2`. Major differences are:

1. Partition discovery code has been factored out to `FSBasedRelation`
2. `AppendingParquetOutputFormat` is not used now. Instead, an anonymous subclass of `ParquetOutputFormat` is used to handle appending and writing dynamic partitions
3. When scanning partitioned tables, `FSBasedParquetRelation.buildScan` only builds an `RDD[Row]` for a single selected partition
4. `FSBasedParquetRelation` doesn't rely on Catalyst expressions for filter push down, thus it doesn't extend `CatalystScan` anymore

After migrating `JSONRelation` (which extends `CatalystScan`), we can remove `CatalystScan`.

Author: Cheng Lian <lian@databricks.com>

Closes #6090 from liancheng/parquet-migration and squashes the following commits:

6063f87 [Cheng Lian] Casts to OutputCommitter rather than FileOutputCommitter
bfd1cf0 [Cheng Lian] Fixes compilation error introduced while rebasing
f9ea56e [Cheng Lian] Adds ParquetRelation2 related classes to MiMa check whitelist
261d8c1 [Cheng Lian] Minor bug fix and more tests
db65660 [Cheng Lian] Migrates Parquet data source to FSBasedRelation
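The commit message notes that filter push down no longer relies on Catalyst expressions. As a rough illustration of that idea only, the sketch below converts simple source-level filters into record predicates and returns `None` for anything it cannot push down. Every name here (`SourceFilter`, `PushDownSketch`, `toPredicate`, the `Record` alias) is a hypothetical stand-in invented for this example; none of it is taken from `ParquetFilters.scala` or from Spark's or Parquet's actual APIs.

```scala
// Hypothetical stand-ins for data-source-level filters.
// These are NOT Spark's classes; they only mimic the general shape.
sealed trait SourceFilter
final case class EqualTo(attribute: String, value: Any) extends SourceFilter
final case class GreaterThan(attribute: String, value: Int) extends SourceFilter
final case class And(left: SourceFilter, right: SourceFilter) extends SourceFilter
final case class Unsupported(description: String) extends SourceFilter

object PushDownSketch {
  // A record is modelled as a plain column-name -> value map (an assumption).
  type Record = Map[String, Any]
  type Predicate = Record => Boolean

  // Returns Some(predicate) when the filter can be pushed down,
  // and None when the caller must evaluate it after the scan instead.
  def toPredicate(filter: SourceFilter): Option[Predicate] = filter match {
    case EqualTo(attr, value) =>
      Some((record: Record) => record.get(attr).exists(_ == value))

    case GreaterThan(attr, bound) =>
      Some((record: Record) => record.get(attr) match {
        case Some(v: Int) => v > bound
        case _            => false // missing or non-integer column never matches
      })

    case And(left, right) =>
      // In this sketch a conjunction is pushed down only if both sides are.
      for {
        lp <- toPredicate(left)
        rp <- toPredicate(right)
      } yield { (record: Record) => lp(record) && rp(record) }

    case Unsupported(_) => None
  }
}
```

A caller would apply the returned predicate while scanning and fall back to ordinary post-scan filtering whenever `toPredicate` yields `None`. Working from simple filter values rather than full expression trees keeps the conversion small and independent of the query planner's internals.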
Changed files:
- project/MimaExcludes.scala: 6 additions, 0 deletions
- sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala: 7 additions, 1 deletion
- sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala: 167 additions, 111 deletions
- sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala: 1 addition, 1 deletion
- sql/core/src/main/scala/org/apache/spark/sql/parquet/fsBasedParquet.scala: 565 additions, 0 deletions
- sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala: 16 additions, 3 deletions
- sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetFilterSuite.scala: 3 additions, 3 deletions
- sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetIOSuite.scala: 3 additions, 3 deletions
- sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetPartitionDiscoverySuite.scala: 3 additions, 7 deletions
- sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetSchemaSuite.scala: 6 additions, 6 deletions
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala: 14 additions, 11 deletions
- sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala: 6 additions, 9 deletions
- sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala: 7 additions, 9 deletions
- sql/hive/src/test/scala/org/apache/spark/sql/hive/parquetSuites.scala: 15 additions, 20 deletions
- sql/hive/src/test/scala/org/apache/spark/sql/sources/fsBasedRelationSuites.scala: 107 additions, 66 deletions