[SPARK-14482][SQL] Change default Parquet codec from gzip to snappy
## What changes were proposed in this pull request?

Based on our tests, gzip decompression is very slow (< 100 MB/s), making queries decompression-bound. Snappy can decompress at ~500 MB/s on a single core.

This patch changes the default compression codec for Parquet output from gzip to snappy, and also introduces a ParquetOptions class to be more consistent with other data sources (e.g. CSV, JSON).

## How was this patch tested?

Should be covered by existing unit tests.

Author: Reynold Xin <rxin@databricks.com>

Closes #12256 from rxin/SPARK-14482.
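To illustrate what this default means in practice, here is a minimal, hedged sketch of how a user could control the Parquet codec: session-wide through the `spark.sql.parquet.compression.codec` setting, or per write through the data source `compression` option. The object name, app name, and output path are illustrative, and a Spark 2.x `SparkSession` running locally is assumed.

```scala
import org.apache.spark.sql.SparkSession

object ParquetCodecExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-codec-example")   // illustrative name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Session-wide default; after this patch it is "snappy" unless overridden.
    spark.conf.set("spark.sql.parquet.compression.codec", "gzip")

    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // Per-write override via the data source option.
    df.write
      .mode("overwrite")
      .option("compression", "snappy")
      .parquet("/tmp/parquet-codec-example")  // illustrative path

    spark.stop()
  }
}
```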
Showing 4 changed files with 65 additions and 33 deletions:

- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala (1 addition, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala (59 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala (4 additions, 30 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (1 addition, 1 deletion)
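The ParquetOptions.scala file added above could look roughly like the sketch below. This is not the actual Spark source: the codec table, the hard-coded "snappy" fallback (which in the real patch would come from SQLConf), and the error message are assumptions made for illustration; parquet-hadoop is assumed to be on the classpath for `CompressionCodecName`.

```scala
import java.util.Locale

import org.apache.parquet.hadoop.metadata.CompressionCodecName

// Sketch of an options class that resolves the Parquet compression codec
// from per-source options, mirroring how CSV/JSON options classes work.
class ParquetOptions(parameters: Map[String, String]) {

  // Short user-facing names mapped to Parquet's codec enum (assumed table).
  private val shortParquetCompressionCodecNames = Map(
    "none" -> CompressionCodecName.UNCOMPRESSED,
    "uncompressed" -> CompressionCodecName.UNCOMPRESSED,
    "snappy" -> CompressionCodecName.SNAPPY,
    "gzip" -> CompressionCodecName.GZIP,
    "lzo" -> CompressionCodecName.LZO)

  // Resolve the "compression" option; the fallback here is hard-coded to
  // snappy for illustration, whereas the real patch reads the default from
  // the spark.sql.parquet.compression.codec setting.
  val compressionCodec: String = {
    val codecName =
      parameters.getOrElse("compression", "snappy").toLowerCase(Locale.ROOT)
    shortParquetCompressionCodecNames.get(codecName) match {
      case Some(codec) => codec.name()
      case None =>
        val available = shortParquetCompressionCodecNames.keys.mkString(", ")
        throw new IllegalArgumentException(
          s"Codec [$codecName] is not available. Available codecs are $available.")
    }
  }
}
```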