Commit 5989c85b authored by Nong Li, committed by Davies Liu

[SPARK-14217] [SQL] Fix bug if parquet data has columns that use dictionary encoding for some of the data

## What changes were proposed in this pull request?

This PR is based on #12017

Currently, when a Parquet column uses dictionary encoding for only some of its data,
a batch can contain some values that are dictionary encoded and some that are not.
The non-dictionary-encoded values cause us to remove the dictionary from the batch,
so the first values return garbage.

This patch fixes the issue by first decoding the values that are already dictionary
encoded before switching to non-dictionary values. The reverse case, where the
initial values are not dictionary encoded, is handled similarly.
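
The idea can be illustrated with a minimal toy sketch. This is not the code in this patch or Spark's actual reader: `ToyColumnBatch`, `append_page`, and `get` are hypothetical names, and a "page" here is just a Python list. It assumes a batch that either holds dictionary ids (decoded lazily) or plain values, and shows decoding the existing ids before the dictionary is dropped, plus eager decoding in the reverse case.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyColumnBatch:
    """Hypothetical stand-in for a column batch (not a Spark class)."""
    values: List[Optional[int]] = field(default_factory=list)  # plain values
    ids: List[Optional[int]] = field(default_factory=list)     # dictionary ids
    dictionary: Optional[List[int]] = None                      # lazy-decode table

    def append_page(self, page: List[int],
                    dictionary: Optional[List[int]] = None) -> None:
        start = len(self.values)
        if dictionary is not None:
            if self.dictionary is None and start > 0:
                # Reverse case: earlier rows are plain, so decode this
                # dictionary-encoded page eagerly instead of storing ids.
                self.values.extend(dictionary[i] for i in page)
                self.ids.extend([None] * len(page))
                return
            # Batch is empty or already dictionary encoded: keep ids lazily.
            self.dictionary = dictionary
            self.ids.extend(page)
            self.values.extend([None] * len(page))
        else:
            if self.dictionary is not None:
                # Decode rows that hold ids BEFORE dropping the dictionary;
                # skipping this step is what would leave garbage behind.
                for row in range(start):
                    self.values[row] = self.dictionary[self.ids[row]]
                    self.ids[row] = None
                self.dictionary = None
            self.values.extend(page)
            self.ids.extend([None] * len(page))

    def get(self, row: int) -> int:
        if self.dictionary is not None:
            return self.dictionary[self.ids[row]]
        return self.values[row]


# Dictionary-encoded page followed by a plain page in the same batch.
batch = ToyColumnBatch()
batch.append_page([1, 0, 1], dictionary=[10, 20])
batch.append_page([7, 8])
print([batch.get(i) for i in range(5)])  # [20, 10, 20, 7, 8]
```

In this sketch the batch never mixes lazy ids and plain values at read time, which is the property the description above relies on.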

## How was this patch tested?

This is difficult to test, but the bug was reproduced on a test cluster using a large TPC-DS data set.

Author: Nong Li <nong@databricks.com>
Author: Davies Liu <davies@databricks.com>

Closes #12279 from davies/fix_dict.
parent 5cb5edaf