Skip to content
Snippets Groups Projects
Commit 3814d0bc authored by Liang-Chi Hsieh's avatar Liang-Chi Hsieh Committed by Reynold Xin
Browse files

[SPARK-13530][SQL] Add ShortType support to UnsafeRowParquetRecordReader

JIRA: https://issues.apache.org/jira/browse/SPARK-13530

## What changes were proposed in this pull request?

By enabling vectorized parquet scanner by default, the unit test `ParquetHadoopFsRelationSuite` based on `HadoopFsRelationTest` will be failed due to the lack of short type support in `UnsafeRowParquetRecordReader`. We should fix it.

The error exception:

    [info] ParquetHadoopFsRelationSuite:
    [info] - test all data types - StringType (499 milliseconds)
    [info] - test all data types - BinaryType (447 milliseconds)
    [info] - test all data types - BooleanType (520 milliseconds)
    [info] - test all data types - ByteType (418 milliseconds)
    00:22:58.920 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 124.0 (TID 1949)
    org.apache.commons.lang.NotImplementedException: Unimplemented type: ShortType
	at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader$ColumnReader.readIntBatch(UnsafeRowParquetRecordReader.java:769)
	at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader$ColumnReader.readBatch(UnsafeRowParquetRecordReader.java:640)
	at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader$ColumnReader.access$000(UnsafeRowParquetRecordReader.java:461)
	at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader.nextBatch(UnsafeRowParquetRecordReader.java:224)
## How was this patch tested?

The unit test `ParquetHadoopFsRelationSuite` based on `HadoopFsRelationTest` will be [failed](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52110/consoleFull) due to the lack of short type support in UnsafeRowParquetRecordReader. By adding this support, the test can be passed.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #11412 from viirya/add-shorttype-support.
parent ec0cc75e
No related branches found
No related tags found
No related merge requests found
......@@ -765,6 +765,9 @@ public class UnsafeRowParquetRecordReader extends SpecificParquetRecordReaderBas
} else if (DecimalType.is64BitDecimalType(column.dataType())) {
defColumn.readIntsAsLongs(
num, column, rowId, maxDefLevel, (VectorizedValuesReader) dataColumn);
} else if (column.dataType() == DataTypes.ShortType) {
defColumn.readShorts(
num, column, rowId, maxDefLevel, (VectorizedValuesReader) dataColumn);
} else {
throw new NotImplementedException("Unimplemented type: " + column.dataType());
}
......
......@@ -301,6 +301,38 @@ public final class VectorizedRleValuesReader extends ValuesReader
}
}
public void readShorts(int total, ColumnVector c,
int rowId, int level, VectorizedValuesReader data) {
int left = total;
while (left > 0) {
if (this.currentCount == 0) this.readNextGroup();
int n = Math.min(left, this.currentCount);
switch (mode) {
case RLE:
if (currentValue == level) {
for (int i = 0; i < n; i++) {
c.putShort(rowId + i, (short)data.readInteger());
}
} else {
c.putNulls(rowId, n);
}
break;
case PACKED:
for (int i = 0; i < n; ++i) {
if (currentBuffer[currentBufferIdx++] == level) {
c.putShort(rowId + i, (short)data.readInteger());
} else {
c.putNull(rowId + i);
}
}
break;
}
rowId += n;
left -= n;
currentCount -= n;
}
}
public void readLongs(int total, ColumnVector c, int rowId, int level,
VectorizedValuesReader data) {
int left = total;
......@@ -611,4 +643,4 @@ public final class VectorizedRleValuesReader extends ValuesReader
throw new ParquetDecodingException("not a valid mode " + this.mode);
}
}
}
\ No newline at end of file
}
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment