[SPARK-13376] [SPARK-13476] [SQL] improve column pruning
## What changes were proposed in this pull request?

This PR mostly rewrites the ColumnPruning rule to support most SQL logical plans (except those for Dataset). It also fixes a bug in Generate, which should always output UnsafeRow, and adds a regression test for that.

## How was this patch tested?

This was tested by unit tests and manually with TPCDS Q78, where all unused columns were pruned successfully, improving performance by 78% (from 22s to 12s).

Author: Davies Liu <davies@databricks.com>

Closes #11354 from davies/fix_column_pruning.
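To illustrate what column pruning means here, the following is a minimal toy sketch in Python. It is not the Catalyst `ColumnPruning` rule and does not use any Spark API; the `Scan`, `Filter`, and `Project` classes and the `prune` function are hypothetical names invented for this example. The idea it shows is general: walk the plan top-down, track which columns the parent actually needs, and drop everything else as early as possible.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Union


# Hypothetical, simplified plan nodes (not Spark classes).
@dataclass
class Scan:
    table: str
    columns: List[str]  # columns read from the table


@dataclass
class Filter:
    column: str  # the single column referenced by the predicate
    child: "Plan"


@dataclass
class Project:
    columns: List[str]  # columns kept by the projection
    child: "Plan"


Plan = Union[Scan, Filter, Project]


def prune(plan: Plan, required: Optional[Set[str]] = None) -> Plan:
    """Return a copy of `plan` that only carries the columns in `required`.

    `required=None` means the parent needs every column the node outputs.
    """
    if isinstance(plan, Project):
        kept = [c for c in plan.columns if required is None or c in required]
        # The child only has to produce what this projection keeps.
        return Project(kept, prune(plan.child, set(kept)))
    if isinstance(plan, Filter):
        # The child must also produce the column the predicate reads.
        child_required = None if required is None else required | {plan.column}
        return Filter(plan.column, prune(plan.child, child_required))
    if isinstance(plan, Scan):
        kept = [c for c in plan.columns if required is None or c in required]
        return Scan(plan.table, kept)
    raise TypeError(f"unsupported plan node: {plan!r}")


# SELECT a FROM t WHERE b ..., where t has columns a, b, c, d.
query = Project(["a"], Filter("b", Scan("t", ["a", "b", "c", "d"])))
pruned = prune(query)
print(pruned)
```

In this sketch the scan of `t` ends up reading only `a` and `b`: `a` because the projection keeps it, and `b` because the filter needs it. Columns `c` and `d` are never read.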
Changed files:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (58 additions, 70 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala (126 additions, 2 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala (0 additions, 80 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala (19 additions, 9 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarTableScan.scala (3 additions, 4 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala (8 additions, 0 deletions)