[SPARK-12644][SQL] Update parquet reader to be vectorized.
This inlines a few of the Parquet decoders and adds vectorized APIs to support decoding in batches. A few particulars of the Parquet encodings make batch decoding much more efficient. In particular, RLE encodings are very well suited to it, as are the Parquet 2.0 encodings.

This is a work in progress and does not affect the current execution path. Subsequent patches will add support for more encodings and types before this is enabled. Simple benchmarks indicate this can decode single ints more than 3x faster.

Author: Nong Li <nong@databricks.com>
Author: Nong <nongli@gmail.com>

Closes #10593 from nongli/spark-12644.
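To illustrate why run-length encoding lends itself to batch decoding, here is a minimal sketch. It is not the code in this patch and it does not implement Parquet's actual RLE/bit-packed hybrid format: the class `RleBatchDecodeSketch`, the method `decodeBatch`, and the flat `[length, value, length, value, ...]` run layout are all assumptions made for illustration. The point is only that a whole run can be written to the output with one bulk fill rather than one call per value.

```java
import java.util.Arrays;

// Hypothetical sketch; the names and the run layout are illustrative assumptions,
// not Spark's VectorizedRleValuesReader and not the Parquet wire format.
public class RleBatchDecodeSketch {

    /**
     * Decodes runs laid out as [length0, value0, length1, value1, ...] into out.
     * Each run is written with a single bulk fill instead of a per-value loop.
     * Returns the number of values written.
     */
    public static int decodeBatch(int[] runs, int[] out) {
        int pos = 0;
        for (int i = 0; i + 1 < runs.length; i += 2) {
            int length = runs[i];
            int value = runs[i + 1];
            if (length < 0 || length > out.length - pos) {
                throw new IllegalArgumentException("run does not fit in output buffer");
            }
            // One bulk write per run: this is where batch decoding saves work.
            Arrays.fill(out, pos, pos + length, value);
            pos += length;
        }
        return pos;
    }

    public static void main(String[] args) {
        int[] runs = {3, 7, 2, 0, 4, 9}; // three 7s, two 0s, four 9s
        int[] out = new int[9];
        int written = decodeBatch(runs, out);
        System.out.println(written + " values: " + Arrays.toString(out));
    }
}
```

A value-at-a-time reader would pay a method call and a run-boundary check for each of the nine values above, whereas the batch version does that bookkeeping once per run.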
Changed files:
- core/src/main/scala/org/apache/spark/util/Benchmark.scala (3 additions, 3 deletions)
- sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java (134 additions, 12 deletions)
- sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java (66 additions, 0 deletions)
- sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java (274 additions, 0 deletions)
- sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedValuesReader.java (37 additions, 0 deletions)
- sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java (5 additions, 4 deletions)
- sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java (7 additions, 6 deletions)
- sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java (2 additions, 0 deletions)
- sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java (1 addition, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadBenchmark.scala (78 additions, 15 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchBenchmark.scala (4 additions, 3 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala (14 additions, 13 deletions)