[SPARK-13609] [SQL] Support Column Pruning for MapPartitions
#### What changes were proposed in this pull request?

This PR prunes unnecessary columns when the operator is `MapPartitions`. The solution is to add an extra `Project` on top of the child node so that only the columns `MapPartitions` actually references are passed to it.

For the other two operators, `AppendColumns` and `MapGroups`, this looks doable, but more discussion is required. The main reason is that the current implementation of the `inputPlan` of `groupBy` is based on the child of `AppendColumns`, which might be a bug. Those operators will therefore be handled in a separate PR.

#### How was this patch tested?

Added a test case in `ColumnPruningSuite` to verify the rule. Added another test case in `DatasetSuite.scala` to verify the data.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #11460 from gatorsmile/datasetPruningNew.
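The pruning idea described above can be illustrated with a small, self-contained Scala sketch. It uses a hypothetical toy plan model: `Plan`, `Relation`, `Project`, `MapPartitions`, and `ColumnPruningSketch.prune` are illustrative stand-ins, not Catalyst's actual classes and not the code in this patch. When a `MapPartitions` node's child produces columns the node never references, a `Project` that keeps only the referenced columns is inserted between them.

```scala
// Hypothetical toy model of a logical plan; NOT Spark's Catalyst classes.
sealed trait Plan {
  def output: Seq[String] // columns this node produces
}

// A leaf relation producing a fixed set of columns.
final case class Relation(output: Seq[String]) extends Plan

// Keeps only the listed columns of its child.
final case class Project(columns: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = columns
}

// Stand-in for MapPartitions: reads only `references` from its child
// and produces `output` columns.
final case class MapPartitions(references: Set[String], output: Seq[String], child: Plan)
    extends Plan

object ColumnPruningSketch {
  // Insert a Project below `child` only when it produces unused columns,
  // so re-applying the rule never stacks redundant projections.
  private def prunedChild(child: Plan, used: Set[String]): Plan =
    if (child.output.forall(used.contains)) child
    else Project(child.output.filter(used.contains), child)

  // Recursively apply the hypothetical pruning rule to the whole plan.
  def prune(plan: Plan): Plan = plan match {
    case MapPartitions(refs, out, child) =>
      MapPartitions(refs, out, prunedChild(prune(child), refs))
    case Project(cols, child) => Project(cols, prune(child))
    case r: Relation          => r
  }
}
```

For example, a `MapPartitions` that references only column `a` of a relation producing `a`, `b`, and `c` gets a `Project(Seq("a"), ...)` child, while one that references all three columns is left unchanged. The "only if something is unused" check matters in a fixed-point optimizer: without it, each pass would wrap the child in yet another `Project`.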
- `sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala`: 4 additions, 1 deletion
- `sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala`: 14 additions, 0 deletions
- `sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala`: 10 additions, 1 deletion