Commit 3fd39b87, authored by Sameer Agarwal, committed by Reynold Xin

[SPARK-16764][SQL] Recommend disabling vectorized parquet reader on OutOfMemoryError

## What changes were proposed in this pull request?

We currently do not bound or manage the size of the data arrays used by column vectors in the vectorized reader (they are bounded only by the maximum int value), which can lead to OOMs while reading data. As a short-term fix, this patch intercepts the OutOfMemoryError and suggests that the user disable the vectorized Parquet reader.
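The interception pattern described here can be sketched in isolation, without Spark on the classpath. The block below is a minimal illustration and not the patch itself: `ReserveSketch`, its injectable `allocator`, the `unsupported` helper, and the `MAX_CAPACITY` value are hypothetical names and values chosen for this example. The configuration key string is assumed to be what `SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key()` resolves to.

```java
import java.util.function.IntFunction;

// Hypothetical stand-in for a growable column buffer; not Spark's ColumnVector.
class ReserveSketch {
    // Assumed cap for this sketch only; Spark defines its own MAX_CAPACITY.
    static final int MAX_CAPACITY = Integer.MAX_VALUE - 8;
    // Assumed to match SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key().
    static final String CONF_KEY = "spark.sql.parquet.enableVectorizedReader";

    // The allocator is injectable so an OutOfMemoryError can be simulated.
    private final IntFunction<int[]> allocator;
    private int[] data = new int[0];

    ReserveSketch(IntFunction<int[]> allocator) {
        this.allocator = allocator;
    }

    int capacity() {
        return data.length;
    }

    // Grows the backing array to hold at least requiredCapacity elements.
    void reserve(int requiredCapacity) {
        if (requiredCapacity <= data.length) {
            return;
        }
        // Double the request, computed in long arithmetic to avoid int overflow.
        int newCapacity = (int) Math.min(MAX_CAPACITY, requiredCapacity * 2L);
        if (requiredCapacity > newCapacity) {
            // The request exceeds the cap: fail with guidance and no cause.
            throw unsupported(newCapacity, requiredCapacity, null);
        }
        try {
            int[] grown = allocator.apply(newCapacity);
            System.arraycopy(data, 0, grown, 0, data.length);
            data = grown;
        } catch (OutOfMemoryError oom) {
            // Intercept the OOM and rethrow it with the suggested workaround.
            throw unsupported(newCapacity, requiredCapacity, oom);
        }
    }

    // Builds the exception; a null cause is accepted by RuntimeException.
    private static RuntimeException unsupported(
            int newCapacity, int requiredCapacity, Throwable cause) {
        String message = "Cannot reserve " + newCapacity
            + " elements (requested = " + requiredCapacity + "). As a workaround,"
            + " you can disable the vectorized reader by setting "
            + CONF_KEY + " to false.";
        return new RuntimeException(message, cause);
    }

    public static void main(String[] args) {
        // Simulate an allocation failure instead of exhausting the heap.
        ReserveSketch failing = new ReserveSketch(n -> {
            throw new OutOfMemoryError("simulated");
        });
        try {
            failing.reserve(10);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Unlike the patch, this sketch returns the exception from the helper and throws it at the call site, which lets the compiler see that both failure paths terminate. It also passes a possibly null cause straight to the `RuntimeException(String, Throwable)` constructor rather than branching on it.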

## How was this patch tested?

Existing Tests

Author: Sameer Agarwal <sameerag@cs.berkeley.edu>

Closes #14387 from sameeragarwal/oom.
parent 1178d61e
@@ -282,16 +282,30 @@ public abstract class ColumnVector implements AutoCloseable {
     if (requiredCapacity > capacity) {
       int newCapacity = (int) Math.min(MAX_CAPACITY, requiredCapacity * 2L);
       if (requiredCapacity <= newCapacity) {
-        reserveInternal(newCapacity);
+        try {
+          reserveInternal(newCapacity);
+        } catch (OutOfMemoryError outOfMemoryError) {
+          throwUnsupportedException(newCapacity, requiredCapacity, outOfMemoryError);
+        }
       } else {
-        throw new RuntimeException("Cannot reserve more than " + newCapacity +
-            " bytes in the vectorized reader (requested = " + requiredCapacity + " bytes). As a " +
-            "workaround, you can disable the vectorized reader by setting "
-            + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + " to false.");
+        throwUnsupportedException(newCapacity, requiredCapacity, null);
       }
     }
   }
+  private void throwUnsupportedException(int newCapacity, int requiredCapacity, Throwable cause) {
+    String message = "Cannot reserve more than " + newCapacity +
+        " bytes in the vectorized reader (requested = " + requiredCapacity + " bytes). As a" +
+        " workaround, you can disable the vectorized reader by setting "
+        + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + " to false.";
+    if (cause != null) {
+      throw new RuntimeException(message, cause);
+    } else {
+      throw new RuntimeException(message);
+    }
+  }
   /**
    * Ensures that there is enough storage to store capcity elements. That is, the put() APIs
    * must work for all rowIds < capcity.
......