[SPARK-13250] [SQL] Update PhysicalRDD to convert to UnsafeRow if using the vectorized scanner.
Some parts of the engine rely on UnsafeRow, which the vectorized Parquet scanner does not want to produce. This patch adds a conversion in PhysicalRDD. When codegen is used (and the scan is the start of the pipeline), there is no requirement to use UnsafeRow. This patch also updates PhysicalRDD to support codegen, which eliminates the need for the UnsafeRow conversion in all cases.

For TPCDS-Q19 at the 10 GB scale factor, these changes reduce the query time from 9.5 seconds to 6.5 seconds.

Author: Nong Li <nong@databricks.com>

Closes #11141 from nongli/spark-13250.
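The idea of converting rows only when the consumer needs a particular format can be illustrated with a small toy model. This is a sketch under stated assumptions, not Spark code: `Row`, `PackedRow`, `to_packed`, and `scan` are hypothetical names, a tuple stands in for a row from the vectorized scanner, and a flat byte string stands in for UnsafeRow.

```python
import struct
from typing import Iterable, Iterator, Tuple, Union

# Hypothetical stand-ins, not Spark types:
Row = Tuple[int, int]   # a row as the vectorized scanner might hand it over
PackedRow = bytes       # a flat binary row, playing the role of UnsafeRow

_FORMAT = "<qq"         # two little-endian 64-bit integers


def to_packed(row: Row) -> PackedRow:
    """Convert a scanner row into the flat binary format."""
    return struct.pack(_FORMAT, *row)


def scan(rows: Iterable[Row], consumer_is_codegen: bool) -> Iterator[Union[Row, PackedRow]]:
    """Yield rows, converting only when the consumer requires the binary format.

    When the scan feeds a code-generated pipeline (assumed here to accept
    scanner rows directly), the conversion is skipped.
    """
    for row in rows:
        yield row if consumer_is_codegen else to_packed(row)
```

In this model, `list(scan(rows, consumer_is_codegen=True))` passes the tuples through unchanged, while `consumer_is_codegen=False` pays the per-row packing cost that the patch aims to avoid.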
Showing 7 changed files
- python/pyspark/sql/dataframe.py: 2 additions, 1 deletion
- sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala: 41 additions, 4 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala: 1 addition, 0 deletions
- sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala: 4 additions, 2 deletions
- sql/core/src/test/scala/org/apache/spark/sql/sources/FilteredScanSuite.scala: 31 additions, 23 deletions
- sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala: 26 additions, 19 deletions
- sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala: 24 additions, 22 deletions