Commit 9039333c authored by Nong Li, committed by Reynold Xin

[SPARK-12644][SQL] Update parquet reader to be vectorized.

This inlines a few of the Parquet decoders and adds vectorized APIs to support decoding in batches.
Several properties of the Parquet encodings make batch decoding much more efficient. In
particular, RLE encodings are very well suited to batch decoding, as are the Parquet 2.0
encodings.
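
To illustrate why run-length encoding lends itself to batch decoding, here is a minimal Python sketch. It is not the code in this patch, and it does not implement Parquet's actual hybrid RLE/bit-packing format: it assumes a simplified stream of (run_length, value) pairs, and the names `RunLengthDecoder`, `read_int`, and `read_ints` are hypothetical. The point is only that a batch API can emit a whole run per loop iteration, while a value-at-a-time API pays the bookkeeping cost for every value.

```python
from typing import List, Tuple


class RunLengthDecoder:
    """Decodes a simplified run-length stream of (run_length, value) pairs.

    Illustrative assumption: real Parquet RLE is a hybrid with bit-packed
    runs and a byte-level header; none of that is modeled here.
    """

    def __init__(self, runs: List[Tuple[int, int]]) -> None:
        # Drop empty runs so the current run always has values left.
        self._runs = [(n, v) for n, v in runs if n > 0]
        self._index = 0
        self._remaining = self._runs[0][0] if self._runs else 0

    def _consume(self, count: int) -> None:
        # Account for `count` values taken from the current run and
        # step to the next run once the current one is exhausted.
        self._remaining -= count
        if self._remaining == 0:
            self._index += 1
            if self._index < len(self._runs):
                self._remaining = self._runs[self._index][0]

    def read_int(self) -> int:
        """Value-at-a-time API: run bookkeeping happens for every value."""
        if self._index >= len(self._runs):
            raise EOFError("no more values")
        value = self._runs[self._index][1]
        self._consume(1)
        return value

    def read_ints(self, total: int) -> List[int]:
        """Batch API: run bookkeeping happens once per run, not per value."""
        out: List[int] = []
        while total > 0:
            if self._index >= len(self._runs):
                raise EOFError("no more values")
            value = self._runs[self._index][1]
            n = min(total, self._remaining)
            out.extend([value] * n)  # fill the whole slice of the run at once
            total -= n
            self._consume(n)
        return out


runs = [(3, 7), (2, 1), (4, 9)]

batch = RunLengthDecoder(runs)
first = batch.read_ints(5)   # spans the first two runs
second = batch.read_ints(4)  # the final run

single = RunLengthDecoder(runs)
one_by_one = [single.read_int() for _ in range(9)]
```

Both paths produce the same values; the batch path simply does less work per value when runs are long, which is the property the description above relies on.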

This is a work in progress and does not affect the current execution. In subsequent patches, we will
support more encodings and types before enabling this.

Simple benchmarks indicate this can decode single ints more than 3x faster.

Author: Nong Li <nong@databricks.com>
Author: Nong <nongli@gmail.com>

Closes #10593 from nongli/spark-12644.
parent 3b5ccb12
625 additions and 56 deletions