Skip to content
Snippets Groups Projects
Commit a2c9acb0 authored by Sameer Agarwal's avatar Sameer Agarwal Committed by Davies Liu
Browse files

[SPARK-16334] Reusing same dictionary column for decoding consecutive row...

[SPARK-16334] Reusing same dictionary column for decoding consecutive row groups shouldn't throw an error

## What changes were proposed in this pull request?

This patch fixes a bug in the vectorized parquet reader that's caused by re-using the same dictionary column vector while reading consecutive row groups. Specifically, this issue manifests for a certain distribution of dictionary/plain encoded data while we read/populate the underlying bit packed dictionary data into a column-vector based data structure.

## How was this patch tested?

Manually tested on datasets provided by the community. Thanks to Chris Perluss and Keith Kraus for their invaluable help in tracking down this issue!

Author: Sameer Agarwal <sameerag@cs.berkeley.edu>

Closes #14941 from sameeragarwal/parquet-exception-2.
parent ed9c884d
No related branches found
No related tags found
No related merge requests found
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment