Commit f7a51dee authored by xin Wu, committed by Yin Huai

[SPARK-11246] [SQL] Table cache for Parquet broken in 1.5

The root cause is that when spark.sql.hive.convertMetastoreParquet=true (the default), the cached InMemoryRelation for the ParquetRelation cannot be looked up in CacheManager's cachedData: the key comparison fails even though the key is the same LogicalPlan, a Subquery wrapping the ParquetRelation.
The solution in this PR is to override the LogicalPlan.sameResult function in the Subquery case class to eliminate the Subquery node first, then compare the child (ParquetRelation) directly; this lets the lookup find the key to the cached InMemoryRelation.
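
For readers outside the Spark codebase, the toy sketch below (hypothetical stand-in classes, not Spark's real ones) illustrates the failure mode: the analyzer re-creates the relation's output attributes with fresh expression IDs on each lookup, so a field-by-field plan comparison misses the cache entry even though the underlying relation is identical.

    // Toy model of the SPARK-11246 cache miss. All names here are
    // hypothetical stand-ins for Spark's LogicalRelation/ParquetRelation.
    object CacheMissSketch {
      case class Attribute(name: String, exprId: Long) // exprId is freshly assigned per analysis
      case class Relation(path: String)                // stands in for ParquetRelation

      case class LogicalRelation(relation: Relation, output: Seq[Attribute]) {
        // Field-by-field comparison: fails when only the exprIds differ (the 1.5 bug).
        def naiveSameResult(other: LogicalRelation): Boolean = this == other
        // Comparison restricted to the "clean" args, i.e. just the relation
        // (what overriding cleanArgs = Seq(relation) achieves in this patch).
        def cleanedSameResult(other: LogicalRelation): Boolean =
          relation == other.relation
      }

      def main(args: Array[String]): Unit = {
        val parquet = Relation("/tmp/t.parquet")
        val cachedKey = LogicalRelation(parquet, Seq(Attribute("c", exprId = 1L)))
        val lookupKey = LogicalRelation(parquet, Seq(Attribute("c", exprId = 2L)))

        assert(!cachedKey.naiveSameResult(lookupKey))  // cache miss: full Parquet scan
        assert(cachedKey.cleanedSameResult(lookupKey)) // cache hit: InMemoryRelation reused
      }
    }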

Author: xin Wu <xinwu@us.ibm.com>

Closes #9326 from xwu0226/spark-11246-commit.
parent 3bb2a8d7
@@ -62,6 +62,11 @@ case class LogicalRelation(
     case _ => false
   }
 
+  // When comparing two LogicalRelations from within LogicalPlan.sameResult, we only need
+  // LogicalRelation.cleanArgs to return Seq(relation), since expectedOutputAttribute's
+  // expId can be different but the relation is still the same.
+  override lazy val cleanArgs: Seq[Any] = Seq(relation)
+
   @transient override lazy val statistics: Statistics = Statistics(
     sizeInBytes = BigInt(relation.sizeInBytes)
   )
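For context on where this override takes effect, the cache lookup in CacheManager is roughly a linear scan of the cached entries, matched with sameResult (a paraphrase of the 1.5 source, not a verbatim quote):

    // Sketch of CacheManager's lookup (paraphrased). cachedData holds the
    // logical plans registered by cacheTable together with their
    // InMemoryRelations; a query hits the cache only if sameResult matches.
    def lookupCachedData(plan: LogicalPlan): Option[CachedData] =
      cachedData.find(cd => plan.sameResult(cd.plan))

With cleanArgs returning Seq(relation), two LogicalRelations over the same ParquetRelation now satisfy sameResult even when their output attributes carry different expression IDs, so the find succeeds.
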
@@ -20,6 +20,7 @@ package org.apache.spark.sql.hive
 import java.io.File
 
 import org.apache.spark.sql.columnar.InMemoryColumnarTableScan
+import org.apache.spark.sql.execution.datasources.parquet.ParquetRelation
 import org.apache.spark.sql.hive.test.TestHiveSingleton
 import org.apache.spark.sql.{AnalysisException, QueryTest, SaveMode}
 import org.apache.spark.storage.RDDBlockId
@@ -203,4 +204,14 @@ class CachedTableSuite extends QueryTest with TestHiveSingleton {
     sql("DROP TABLE refreshTable")
     Utils.deleteRecursively(tempPath)
   }
+
+  test("SPARK-11246 cache parquet table") {
+    sql("CREATE TABLE cachedTable STORED AS PARQUET AS SELECT 1")
+
+    cacheTable("cachedTable")
+    val sparkPlan = sql("SELECT * FROM cachedTable").queryExecution.sparkPlan
+    assert(sparkPlan.collect { case e: InMemoryColumnarTableScan => e }.size === 1)
+
+    sql("DROP TABLE cachedTable")
+  }
 }
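
The regression test above can also be checked by hand in a 1.5.x spark-shell with a Hive-enabled sqlContext (exact plan output varies by version); something like the following should show an InMemoryColumnarTableScan once the fix is in, where the broken build planned a fresh Parquet scan:

    // Assumes spark-shell with a HiveContext bound to `sqlContext`.
    sqlContext.sql("CREATE TABLE cachedTable STORED AS PARQUET AS SELECT 1")
    sqlContext.cacheTable("cachedTable")
    // With the fix, the plan contains an InMemoryColumnarTableScan node;
    // before it, the cached data was ignored and Parquet was re-scanned.
    sqlContext.sql("SELECT * FROM cachedTable").explain()
    sqlContext.sql("DROP TABLE cachedTable")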