Commit 59250fe5 authored by scwf, committed by Michael Armbrust

[SPARK-7303] [SQL] push down project if possible when the child is sort

Optimize the case of `project(_, sort)`. An example:

`select key from (select * from testData order by key) t`

before this PR:
```
== Parsed Logical Plan ==
'Project ['key]
 'Subquery t
  'Sort ['key ASC], true
   'Project [*]
    'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Project [key#0]
 Subquery t
  Sort [key#0 ASC], true
   Project [key#0,value#1]
    Subquery testData
     LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Project [key#0]
 Sort [key#0 ASC], true
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Physical Plan ==
Project [key#0]
 Sort [key#0 ASC], true
  Exchange (RangePartitioning [key#0 ASC], 5), []
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```

after this PR:
```
== Parsed Logical Plan ==
'Project ['key]
 'Subquery t
  'Sort ['key ASC], true
   'Project [*]
    'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Project [key#0]
 Subquery t
  Sort [key#0 ASC], true
   Project [key#0,value#1]
    Subquery testData
     LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Sort [key#0 ASC], true
 Project [key#0]
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Physical Plan ==
Sort [key#0 ASC], true
 Exchange (RangePartitioning [key#0 ASC], 5), []
  Project [key#0]
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```

With this rule we first do column pruning on the table and then sort, so the sort (and the exchange it requires) only has to handle the projected columns.
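
For reference, the before/after plans above can be reproduced with an extended explain. A minimal sketch, assuming a Spark 1.x `SQLContext` named `sqlContext` and a `(key, value)` temp table registered as `testData` (both are assumptions, not part of the patch):

```scala
// Hypothetical reproduction of the plans above (not part of the patch).
// Assumes `sqlContext` is a SQLContext and `testData` is a registered temp
// table with columns (key, value).
val df = sqlContext.sql(
  "select key from (select * from testData order by key) t")

// explain(true) prints the parsed, analyzed, optimized and physical plans.
df.explain(true)
```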

Author: scwf <wangfei1@huawei.com>

This patch had conflicts when merged, resolved by
Committer: Michael Armbrust <michael@databricks.com>

Closes #5838 from scwf/pruning and squashes the following commits:

b00d833 [scwf] address michael's comment
e230155 [scwf] fix tests failure
b09b895 [scwf] improve column pruning
parent df2fb130
```
@@ -156,6 +156,11 @@ object ColumnPruning extends Rule[LogicalPlan] {
     case Project(projectList, Limit(exp, child)) =>
       Limit(exp, Project(projectList, child))
 
+    // push down project if possible when the child is sort
+    case p @ Project(projectList, s @ Sort(_, _, grandChild))
+        if s.references.subsetOf(p.outputSet) =>
+      s.copy(child = Project(projectList, grandChild))
+
     // Eliminate no-op Projects
     case Project(projectList, child) if child.output == projectList => child
   }
```
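
The guard `s.references.subsetOf(p.outputSet)` is what keeps the rewrite safe: the Project may only move below the Sort when every attribute the Sort orders by survives the Project. A minimal sketch of exercising the rule directly, assuming Catalyst's internal test DSL is on the classpath (the relation and column names here are illustrative, not from the patch):

```scala
// Hypothetical standalone check, not part of the patch; it leans on Catalyst's
// internal test DSL, so the exact imports may differ between Spark versions.
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.expressions.{Ascending, SortOrder}
import org.apache.spark.sql.catalyst.optimizer.ColumnPruning
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation

val relation = LocalRelation('key.int, 'value.string)

// select key from (select * from relation order by key)
val plan = relation
  .sortBy(SortOrder('key, Ascending))
  .select('key)
  .analyze

// The Sort only references `key`, which the Project above it keeps, so the new
// case fires and the Project moves below the Sort:
//   Sort [key ASC] -> Project [key] -> LocalRelation
val pushedDown = ColumnPruning(plan)
```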
```
@@ -19,7 +19,7 @@ package org.apache.spark.sql.catalyst.optimizer
 
 import org.apache.spark.sql.catalyst.analysis
 import org.apache.spark.sql.catalyst.analysis.EliminateSubQueries
-import org.apache.spark.sql.catalyst.expressions.{Count, Explode}
+import org.apache.spark.sql.catalyst.expressions.{SortOrder, Ascending, Count, Explode}
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.plans.{LeftSemi, PlanTest, LeftOuter, RightOuter}
 import org.apache.spark.sql.catalyst.rules._
```
```
@@ -542,4 +542,38 @@ class FilterPushdownSuite extends PlanTest {
 
     comparePlans(optimized, originalQuery)
   }
+
+  test("push down project past sort") {
+    val x = testRelation.subquery('x)
+
+    // push down valid
+    val originalQuery = {
+      x.select('a, 'b)
+        .sortBy(SortOrder('a, Ascending))
+        .select('a)
+    }
+
+    val optimized = Optimize.execute(originalQuery.analyze)
+    val correctAnswer =
+      x.select('a)
+        .sortBy(SortOrder('a, Ascending)).analyze
+
+    comparePlans(optimized, analysis.EliminateSubQueries(correctAnswer))
+
+    // push down invalid
+    val originalQuery1 = {
+      x.select('a, 'b)
+        .sortBy(SortOrder('a, Ascending))
+        .select('b)
+    }
+
+    val optimized1 = Optimize.execute(originalQuery1.analyze)
+    val correctAnswer1 =
+      x.select('a, 'b)
+        .sortBy(SortOrder('a, Ascending))
+        .select('b).analyze
+
+    comparePlans(optimized1, analysis.EliminateSubQueries(correctAnswer1))
+  }
 }
```