Skip to content
Snippets Groups Projects
  • Nong Li's avatar
    5a7af9e7
    [SPARK-13250] [SQL] Update PhysicallRDD to convert to UnsafeRow if using the vectorized scanner. · 5a7af9e7
    Nong Li authored
    Some parts of the engine rely on UnsafeRow which the vectorized parquet scanner does not want
    to produce. This add a conversion in Physical RDD. In the case where codegen is used (and the
    scan is the start of the pipeline), there is no requirement to use UnsafeRow. This patch adds
    update PhysicallRDD to support codegen, which eliminates the need for the UnsafeRow conversion
    in all cases.
    
    The result of these changes for TPCDS-Q19 at the 10gb sf reduces the query time from 9.5 seconds
    to 6.5 seconds.
    
    Author: Nong Li <nong@databricks.com>
    
    Closes #11141 from nongli/spark-13250.
    5a7af9e7
    History
    [SPARK-13250] [SQL] Update PhysicallRDD to convert to UnsafeRow if using the vectorized scanner.
    Nong Li authored
    Some parts of the engine rely on UnsafeRow which the vectorized parquet scanner does not want
    to produce. This add a conversion in Physical RDD. In the case where codegen is used (and the
    scan is the start of the pipeline), there is no requirement to use UnsafeRow. This patch adds
    update PhysicallRDD to support codegen, which eliminates the need for the UnsafeRow conversion
    in all cases.
    
    The result of these changes for TPCDS-Q19 at the 10gb sf reduces the query time from 9.5 seconds
    to 6.5 seconds.
    
    Author: Nong Li <nong@databricks.com>
    
    Closes #11141 from nongli/spark-13250.
dataframe.py 51.89 KiB