-
- Downloads
[SPARK-16928] [SQL] Recursive call of ColumnVector::getInt() breaks JIT inlining
## What changes were proposed in this pull request? In both `OnHeapColumnVector` and `OffHeapColumnVector`, we implemented `getInt()` with the following code pattern: ``` public int getInt(int rowId) { if (dictionary == null) { return intData[rowId]; } else { return dictionary.decodeToInt(dictionaryIds.getInt(rowId)); } } ``` As `dictionaryIds` is also a `ColumnVector`, this results in a recursive call of `getInt()` and breaks JIT inlining. As a result, `getInt()` will not get inlined. We fix this by adding a separate method `getDictId()` specific for `dictionaryIds` to use. ## How was this patch tested? We tested the difference with the following aggregate query on a TPCDS dataset (with scale factor = 5): ``` select max(ss_sold_date_sk) as max_ss_sold_date_sk, from store_sales ``` The query runtime is improved, from 202ms (before) to 159ms (after). Author: Qifan Pu <qifan.pu@gmail.com> Closes #14513 from ooq/SPARK-16928.
Showing
- sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java 11 additions, 11 deletions...execution/datasources/parquet/VectorizedColumnReader.java
- sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java 9 additions, 2 deletions...g/apache/spark/sql/execution/vectorized/ColumnVector.java
- sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java 17 additions, 6 deletions...e/spark/sql/execution/vectorized/OffHeapColumnVector.java
- sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java 17 additions, 6 deletions...he/spark/sql/execution/vectorized/OnHeapColumnVector.java
Loading
Please register or sign in to comment