Commit 72999616 authored by Sameer Agarwal, committed by Yin Huai

[SPARK-14016][SQL] Support high-precision decimals in vectorized parquet reader

## What changes were proposed in this pull request?

This patch adds support for reading `DecimalType` values with high (> 18) precision in `VectorizedColumnReader`.
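To illustrate why precisions above 18 need a byte-array representation, here is a minimal sketch, assuming the decimal is stored as the big-endian two's-complement bytes of its unscaled value (the Parquet convention for binary-backed decimals). The class and helper names are hypothetical and not part of this patch; `MAX_LONG_DIGITS` mirrors the value of Spark's `Decimal.MAX_LONG_DIGITS()`.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration only; not code from this patch.
final class HighPrecisionDecimalSketch {
  // Assumption: same value as Spark's Decimal.MAX_LONG_DIGITS(). Every unscaled
  // value with at most 18 digits fits in a signed 64-bit long.
  static final int MAX_LONG_DIGITS = 18;

  // Interprets big-endian two's-complement bytes as the unscaled value of a decimal.
  static BigDecimal fromUnscaledBytes(byte[] bytes, int scale) {
    return new BigDecimal(new BigInteger(bytes), scale);
  }

  public static void main(String[] args) {
    // A value that fits DecimalType(25, 5): 25 significant digits, 5 after the point.
    BigDecimal original = new BigDecimal("12345678901234567890.12345");
    byte[] stored = original.unscaledValue().toByteArray();

    BigDecimal decoded = fromUnscaledBytes(stored, 5);
    System.out.println("round-trips: " + decoded.equals(original));
    System.out.println("needs byte array: " + (original.precision() > MAX_LONG_DIGITS));
  }
}
```

The unscaled value here has 25 digits, which exceeds what a `long` can hold, so the bytes have to be carried through as-is and interpreted later.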

## How was this patch tested?

1. `VectorizedParquetRecordReader` previously had a gating condition on `primitiveType.getDecimalMetadata().getPrecision() > Decimal.MAX_LONG_DIGITS()` that made us fall back on parquet-mr for handling high-precision decimals. This condition is now removed.
2. In particular, `ParquetHadoopFsRelationSuite` (which tests all supported Hive types, including `DecimalType(25, 5)`) failed when the gating condition was removed without this change (https://github.com/apache/spark/pull/11808) and should now pass.
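The removed gate compared the column's precision against `Decimal.MAX_LONG_DIGITS()`. As a rough sketch of how precision maps to a storage class, the snippet below picks a representation from the precision alone. The enum, class, and method names are hypothetical, and the thresholds (9 and 18) are assumed to match Spark's `Decimal.MAX_INT_DIGITS()` and `Decimal.MAX_LONG_DIGITS()`.

```java
// Hypothetical illustration only; not code from this patch.
final class DecimalStorageSketch {
  enum Storage { INT, LONG, BYTE_ARRAY }

  // Assumed to match Spark's Decimal.MAX_INT_DIGITS() and Decimal.MAX_LONG_DIGITS().
  static final int MAX_INT_DIGITS = 9;
  static final int MAX_LONG_DIGITS = 18;

  // Chooses the narrowest representation that can hold any value of this precision.
  static Storage storageFor(int precision) {
    if (precision <= 0) {
      throw new IllegalArgumentException("precision must be positive: " + precision);
    }
    if (precision <= MAX_INT_DIGITS) return Storage.INT;
    if (precision <= MAX_LONG_DIGITS) return Storage.LONG;
    return Storage.BYTE_ARRAY; // the case this patch adds to the vectorized path
  }

  public static void main(String[] args) {
    System.out.println(storageFor(25)); // the DecimalType(25, 5) case from the test suite
  }
}
```

With the gate in place, any column in the last category caused a fallback to parquet-mr; after this change that category is handled by the vectorized reader.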

Author: Sameer Agarwal <sameer@databricks.com>

Closes #11869 from sameeragarwal/bigdecimal-parquet.
parent 43ef1e52
@@ -262,6 +262,11 @@ public class VectorizedColumnReader {
       Binary v = dictionary.decodeToBinary(dictionaryIds.getInt(i));
       column.putLong(i, CatalystRowConverter.binaryToUnscaledLong(v));
     }
+  } else if (DecimalType.isByteArrayDecimalType(column.dataType())) {
+    for (int i = rowId; i < rowId + num; ++i) {
+      Binary v = dictionary.decodeToBinary(dictionaryIds.getInt(i));
+      column.putByteArray(i, v.getBytes());
+    }
   } else {
     throw new NotImplementedException();
   }
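The dictionary-encoded branch above can be pictured with plain arrays. This is a minimal sketch, assuming the dictionary is a `byte[][]` and the column vector is one `byte[]` slot per row; `DictionaryDecodeSketch` and `decode` are hypothetical names, not Spark or Parquet APIs.

```java
import java.util.Arrays;

// Hypothetical illustration only; plain arrays stand in for Dictionary and ColumnVector.
final class DictionaryDecodeSketch {
  // For each row in [rowId, rowId + num), look up the dictionary entry by id
  // and copy its bytes into the corresponding column slot.
  static void decode(byte[][] dictionary, int[] dictionaryIds, byte[][] column,
                     int rowId, int num) {
    for (int i = rowId; i < rowId + num; ++i) {
      byte[] v = dictionary[dictionaryIds[i]];
      column[i] = Arrays.copyOf(v, v.length);
    }
  }

  public static void main(String[] args) {
    byte[][] dictionary = { {0x01}, {0x02, 0x03} };
    int[] ids = {1, 0, 1};
    byte[][] column = new byte[3][];
    decode(dictionary, ids, column, 0, 3);
    System.out.println(Arrays.deepToString(column));
  }
}
```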
@@ -368,6 +373,14 @@ public class VectorizedColumnReader {
         column.putNull(rowId + i);
       }
     }
+  } else if (DecimalType.isByteArrayDecimalType(column.dataType())) {
+    for (int i = 0; i < num; i++) {
+      if (defColumn.readInteger() == maxDefLevel) {
+        column.putByteArray(rowId + i, data.readBinary(arrayLen).getBytes());
+      } else {
+        column.putNull(rowId + i);
+      }
+    }
   } else {
     throw new NotImplementedException("Unimplemented type: " + column.dataType());
   }
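The non-dictionary branch reads one definition level per row and consumes `arrayLen` bytes only for rows that are actually present. Here is a minimal sketch of that loop, assuming a `ByteBuffer` for the value stream and an `int[]` for the definition levels; the class and method names are hypothetical and a `null` array slot stands in for `putNull`.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration only; plain arrays and a ByteBuffer stand in for the readers.
final class DefinitionLevelReadSketch {
  // Returns one fixed-length byte[] per defined row and null for each missing row.
  // Null rows consume no bytes from the value stream.
  static byte[][] read(int[] defLevels, int maxDefLevel, ByteBuffer data, int arrayLen) {
    byte[][] out = new byte[defLevels.length][];
    for (int i = 0; i < defLevels.length; i++) {
      if (defLevels[i] == maxDefLevel) {
        byte[] value = new byte[arrayLen];
        data.get(value); // advances the buffer by arrayLen bytes
        out[i] = value;
      }
      // else: leave out[i] as null, mirroring column.putNull(rowId + i)
    }
    return out;
  }

  public static void main(String[] args) {
    ByteBuffer data = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
    byte[][] rows = read(new int[] {1, 0, 1}, 1, data, 2);
    System.out.println(Arrays.deepToString(rows));
  }
}
```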
@@ -220,10 +220,6 @@ public class VectorizedParquetRecordReader extends SpecificParquetRecordReaderBa
       originalTypes[i] != OriginalType.INT_8 && originalTypes[i] != OriginalType.INT_16) {
     throw new IOException("Unsupported type: " + t);
   }
-  if (originalTypes[i] == OriginalType.DECIMAL &&
-      primitiveType.getDecimalMetadata().getPrecision() > Decimal.MAX_LONG_DIGITS()) {
-    throw new IOException("Decimal with high precision is not supported.");
-  }
   if (primitiveType.getPrimitiveTypeName() == PrimitiveType.PrimitiveTypeName.INT96) {
     throw new IOException("Int96 not supported.");
   }