Commit 06c155c9 authored by Xiao Li

[SPARK-20908][SQL] Cache Manager: Hint should be ignored in plan matching

### What changes were proposed in this pull request?

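In the Cache Manager, plan matching should ignore `Hint` nodes. For example, a query that differs from a cached query only by a broadcast hint should still reuse the cached data: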
```Scala
      val df1 = spark.range(10).join(broadcast(spark.range(10)))
      df1.cache()
      spark.range(10).join(spark.range(10)).explain()
```
The output plan of the above query shows that the second query is not using the cached data of the first query:
```
BroadcastNestedLoopJoin BuildRight, Inner
:- *Range (0, 10, step=1, splits=2)
+- BroadcastExchange IdentityBroadcastMode
   +- *Range (0, 10, step=1, splits=2)
```

After the fix, the plan becomes:
```
InMemoryTableScan [id#20L, id#23L]
   +- InMemoryRelation [id#20L, id#23L], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
         +- BroadcastNestedLoopJoin BuildRight, Inner
            :- *Range (0, 10, step=1, splits=2)
            +- BroadcastExchange IdentityBroadcastMode
               +- *Range (0, 10, step=1, splits=2)
```
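
Why delegating canonicalization to the child is enough can be illustrated with a toy model. This is only a sketch: `Plan`, `RangeScan`, `Join`, `Hint`, and `HintMatchingSketch` are hypothetical stand-ins invented for illustration, not Spark's classes, and the real cache lookup involves more than this comparison.

```scala
// Toy model of plan canonicalization; none of these types are Spark's own classes.
sealed trait Plan {
  // Normalized form used for comparison; by default a node is its own canonical form.
  def canonicalized: Plan = this

  // Two plans are treated as equivalent when their canonical forms are equal.
  def sameResult(other: Plan): Boolean = canonicalized == other.canonicalized
}

final case class RangeScan(start: Long, end: Long) extends Plan

final case class Join(left: Plan, right: Plan) extends Plan {
  // Canonicalize the children first so that wrappers below this node are normalized too.
  override def canonicalized: Plan = Join(left.canonicalized, right.canonicalized)
}

final case class Hint(child: Plan, broadcast: Boolean = true) extends Plan {
  // A hint does not change the result, so it canonicalizes to its child.
  override def canonicalized: Plan = child.canonicalized
}

object HintMatchingSketch {
  // Stand-in for the cached plan, which carries a broadcast hint on one side.
  val cached: Plan = Join(RangeScan(0, 10), Hint(RangeScan(0, 10)))
  // Stand-in for the later query, which is the same join without the hint.
  val query: Plan = Join(RangeScan(0, 10), RangeScan(0, 10))

  def main(args: Array[String]): Unit = {
    println(cached == query)          // false: the trees differ structurally
    println(cached.sameResult(query)) // true: the canonical forms match
  }
}
```

In this model the hint stays in the tree, so anything that reads it (such as join strategy selection) still sees it; only the comparison used for matching looks through it.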

### How was this patch tested?
Added a test.

Author: Xiao Li <gatorsmile@gmail.com>

Closes #18131 from gatorsmile/HintCache.
parent 3969a807
@@ -40,6 +40,8 @@ case class ResolvedHint(child: LogicalPlan, hints: HintInfo = HintInfo())
   override def output: Seq[Attribute] = child.output
+  override lazy val canonicalized: LogicalPlan = child.canonicalized
   override def computeStats(conf: SQLConf): Statistics = {
     val stats = child.stats(conf)
     stats.copy(hints = hints)
@@ -20,7 +20,7 @@ package org.apache.spark.sql.catalyst.plans
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
-import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan, Union}
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan, ResolvedHint, Union}
 import org.apache.spark.sql.catalyst.util._
 /**
@@ -66,4 +66,10 @@ class SameResultSuite extends SparkFunSuite {
     assertSameResult(Union(Seq(testRelation, testRelation2)),
       Union(Seq(testRelation2, testRelation)))
   }
+  test("hint") {
+    val df1 = testRelation.join(ResolvedHint(testRelation))
+    val df2 = testRelation.join(testRelation)
+    assertSameResult(df1, df2)
+  }
 }