Commit c27ba0d5 authored by Davies Liu, committed by Davies Liu

[SPARK-13582] [SQL] defer dictionary decoding in parquet reader

## What changes were proposed in this pull request?

This PR defers resolving a dictionary ID to its value until the column is actually accessed (inside getInt/getLong). This is very useful for columns and rows that are filtered out, because their values are never decoded. It also helps binary types, since we no longer need to copy all the byte arrays up front.
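A minimal sketch of the idea, assuming a column that keeps the raw dictionary IDs read from the data page plus a reference to the already-decoded dictionary. `LazyIntColumn`, `DeferredDecodeDemo`, and `sumWhere` are hypothetical names for illustration, not Spark's actual `ColumnVector` API. The lookup `dictionary[ids[rowId]]` runs only inside `getInt`, so rows rejected by a filter on another column never pay for decoding.

```java
import java.util.function.IntPredicate;

// Hypothetical column that stores dictionary IDs and decodes lazily.
final class LazyIntColumn {
    private final int[] dictionary; // distinct values from the dictionary page
    private final int[] ids;        // per-row dictionary IDs from the data page
    private int decodeCount = 0;    // number of dictionary lookups performed

    LazyIntColumn(int[] dictionary, int[] ids) {
        this.dictionary = dictionary;
        this.ids = ids;
    }

    // Decoding happens here, only when a row's value is actually requested.
    int getInt(int rowId) {
        decodeCount++;
        return dictionary[ids[rowId]];
    }

    int numRows() {
        return ids.length;
    }

    int decodeCount() {
        return decodeCount;
    }
}

public class DeferredDecodeDemo {
    // Sums valueCol over rows where filterCol passes `keep`.
    // valueCol is decoded only for the rows that survive the filter.
    static long sumWhere(LazyIntColumn filterCol, IntPredicate keep, LazyIntColumn valueCol) {
        long sum = 0;
        for (int row = 0; row < filterCol.numRows(); row++) {
            if (keep.test(filterCol.getInt(row))) {
                sum += valueCol.getInt(row);
            }
        }
        return sum;
    }
}
```

The decode counter exists only to make the saving observable: with a selective filter, the value column performs one lookup per surviving row rather than one per row in the batch.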

This PR also changes the underlying type for small decimals that fit within an Int, so that getInt() can be used to look up the value from an IntDictionary.
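A sketch of why this works, assuming a decimal is stored as its unscaled integer plus the column's fixed scale. Any decimal with precision of at most 9 digits has an unscaled value of at most 999,999,999, which is below `Integer.MAX_VALUE` (2,147,483,647), so it fits in an `int` and can share the integer dictionary path. `SmallDecimal` and its methods are hypothetical helpers for illustration, not Spark's internal API.

```java
import java.math.BigDecimal;

// Hypothetical helpers for decimals whose unscaled value fits in an int.
public final class SmallDecimal {
    // 9 digits: max unscaled value 999,999,999 < Integer.MAX_VALUE.
    static final int MAX_INT_DIGITS = 9;

    static boolean fitsInInt(int precision) {
        return precision <= MAX_INT_DIGITS;
    }

    // Encode: shift to the column's scale and keep the unscaled int.
    // Throws ArithmeticException if rounding or int overflow would occur.
    static int toUnscaledInt(BigDecimal value, int scale) {
        return value.setScale(scale).unscaledValue().intValueExact();
    }

    // Decode: rebuild the decimal from the unscaled int and the scale.
    static BigDecimal fromUnscaledInt(int unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    // Deferred lookup: fetch the unscaled int through the dictionary ID,
    // then attach the scale, only when this row is read.
    static BigDecimal getDecimal(int[] intDictionary, int[] ids, int rowId, int scale) {
        return fromUnscaledInt(intDictionary[ids[rowId]], scale);
    }
}
```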

## How was this patch tested?

Manually tested TPC-DS Q7 with scale factor 10 and saw about a 30% improvement (after PR #11274).

Author: Davies Liu <davies@databricks.com>

Closes #11437 from davies/decode_dict.
parent c37bbb3a
221 additions and 203 deletions