Commit 360ed832 authored by Yanbo Liang, committed by Michael Armbrust

[SPARK-11303][SQL] filter should not be pushed down into sample

When sampling and then filtering a DataFrame, the SQL optimizer pushes the filter down into the sample and produces a wrong result. This is because the sampler is computed over the original input rather than over the rows remaining after the filter.

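For illustration, a minimal sketch of the failure mode, assuming a local Spark 1.5-era setup (SparkContext plus SQLContext); the object name SampleFilterSketch is hypothetical and the logic mirrors the regression test added below.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical driver illustrating the symptom; not part of this patch.
object SampleFilterSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("spark-11303-sketch").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)

    val df = sqlContext.range(100)
    // Sample once, then split the sampled rows with two complementary filters.
    val sampled = df.sample(false, 0.1, 1)  // withReplacement = false, fraction = 0.1, seed = 1
    val odd = sampled.filter("id % 2 != 0")
    val even = sampled.filter("id % 2 = 0")

    // Before this fix, each Filter was pushed below the Sample, so `odd` and `even`
    // were sampled independently from the filtered input and their counts could
    // disagree with `sampled.count()`. After the fix the split is exact.
    println(sampled.count() == odd.count() + even.count())

    sc.stop()
  }
}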
Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9294 from yanboliang/spark-11303.
parent 958a0ec8
@@ -74,10 +74,6 @@ object DefaultOptimizer extends Optimizer {
 object SamplePushDown extends Rule[LogicalPlan] {

   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-    // Push down filter into sample
-    case Filter(condition, s @ Sample(lb, up, replace, seed, child)) =>
-      Sample(lb, up, replace, seed,
-        Filter(condition, child))
     // Push down projection into sample
     case Project(projectList, s @ Sample(lb, up, replace, seed, child)) =>
       Sample(lb, up, replace, seed,
@@ -1860,4 +1860,14 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
         Row(1))
     }
   }
+
+  test("SPARK-11303: filter should not be pushed down into sample") {
+    val df = sqlContext.range(100)
+    List(true, false).foreach { withReplacement =>
+      val sampled = df.sample(withReplacement, 0.1, 1)
+      val sampledOdd = sampled.filter("id % 2 != 0")
+      val sampledEven = sampled.filter("id % 2 = 0")
+      assert(sampled.count() == sampledOdd.count() + sampledEven.count())
+    }
+  }
 }