[SPARK-21781][SQL] Modify DataSourceScanExec to use concrete ColumnVector type.
## What changes were proposed in this pull request?

As mentioned at https://github.com/apache/spark/pull/18680#issuecomment-316820409, as more `ColumnVector` implementations are added, performance might (or might not) suffer significantly, because the extra classes could disable inlining or force virtual dispatch.

On the read path, one of the major code paths is the one generated by `ColumnarBatchScan`. It currently refers to `ColumnVector`, so the penalty grows as more classes are added. However, the concrete type is known from its usage; for example, the vectorized Parquet reader uses `OnHeapColumnVector`. We can use the concrete type directly in the generated code to avoid the penalty.

## How was this patch tested?

Existing tests.

Author: Takuya UESHIN <ueshin@databricks.com>

Closes #18989 from ueshin/issues/SPARK-21781.
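To illustrate the idea behind the change, here is a minimal, self-contained sketch in Java (the language of Spark's generated code). It is not the code this patch generates: the classes below are simplified, hypothetical stand-ins that only borrow the names `ColumnVector` and `OnHeapColumnVector`, and the method names `sumViaAbstractType` and `sumViaConcreteType` are invented for the example. The first loop calls through the abstract type; the second casts once to the concrete, final class so that every call site in the loop has a single known target.

```java
public class ConcreteVectorSketch {
    /** Hypothetical stand-in for Spark's ColumnVector base class. */
    abstract static class ColumnVector {
        abstract int getInt(int rowId);
    }

    /** Hypothetical stand-in for OnHeapColumnVector: final and array-backed. */
    static final class OnHeapColumnVector extends ColumnVector {
        private final int[] data;

        OnHeapColumnVector(int[] data) {
            this.data = data;
        }

        @Override
        int getInt(int rowId) {
            return data[rowId];
        }
    }

    /** Column is referenced through the abstract type, so each getInt is a virtual call. */
    static long sumViaAbstractType(ColumnVector[] columns, int ordinal, int numRows) {
        ColumnVector col = columns[ordinal];
        long sum = 0;
        for (int i = 0; i < numRows; i++) {
            sum += col.getInt(i);
        }
        return sum;
    }

    /** Column is cast once to the concrete final class, so the call target is statically known. */
    static long sumViaConcreteType(ColumnVector[] columns, int ordinal, int numRows) {
        OnHeapColumnVector col = (OnHeapColumnVector) columns[ordinal];
        long sum = 0;
        for (int i = 0; i < numRows; i++) {
            sum += col.getInt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        ColumnVector[] batch = { new OnHeapColumnVector(new int[] {1, 2, 3}) };
        System.out.println(sumViaAbstractType(batch, 0, 3));
        System.out.println(sumViaConcreteType(batch, 0, 3));
    }
}
```

Both methods return the same result; the difference is only in what the JIT compiler can assume about the call target. Whether that produces a measurable speedup depends on how many `ColumnVector` subclasses are loaded and on the JVM, which matches the "might (or might not)" caveat above.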
Changed files:
- sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala (9 additions, 5 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala (5 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala (10 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala (8 additions, 0 deletions)