[SPARK-3212][SQL] Use logical plan matching instead of temporary tables for table caching
_Also addresses: SPARK-1671, SPARK-1379 and SPARK-3641_

This PR introduces a new trait, `CacheManager`, which replaces the previous temporary-table-based caching system. Instead of creating a temporary table that shadows an existing table with an equivalent cached representation, the cache manager maintains a separate list of logical plans and their cached data. After optimization, this list is searched for any matching plan fragments. When a matching plan fragment is found, it is replaced with the cached data.

There are several advantages to this approach:
- Calling `.cache()` on a SchemaRDD now works as you would expect, and uses the more efficient columnar representation.
- It's now possible to provide a list of temporary tables without having to decide whether a given table is actually just a cached persistent table. (To be done in a follow-up PR.)
- In some cases cached data will be used even if a cached table was not explicitly requested, because we now look at the logical structure instead of the table name.
- We now correctly invalidate cached data when data is inserted into a Hive table.

Author: Michael Armbrust <michael@databricks.com>

Closes #2501 from marmbrus/caching and squashes the following commits:

63fbc2c [Michael Armbrust] Merge remote-tracking branch 'origin/master' into caching.
0ea889e [Michael Armbrust] Address comments.
1e23287 [Michael Armbrust] Add support for cache invalidation for hive inserts.
65ed04a [Michael Armbrust] fix tests.
bdf9a3f [Michael Armbrust] Merge remote-tracking branch 'origin/master' into caching
b4b77f2 [Michael Armbrust] Address comments
6923c9d [Michael Armbrust] More comments / tests
80f26ac [Michael Armbrust] First draft of improved semantics for Spark SQL caching.
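The plan-matching idea in the description above can be illustrated with a small, self-contained sketch. This is a toy model, not the code in this PR: the `Plan` types, `ToyCacheManager`, and its method names are hypothetical, and matching here uses plain case-class equality, which is an assumption made for brevity and is likely stricter than whatever comparison the real implementation performs.

```scala
// Toy model of logical-plan-based caching. None of these types are Spark classes.
sealed trait Plan
case class Table(name: String) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class Project(columns: Seq[String], child: Plan) extends Plan
// Stands in for a scan over an already-materialized cached representation.
case class CachedScan(id: Int) extends Plan

class ToyCacheManager {
  // A separate list of (logical plan, cached data) pairs, instead of shadowing tables.
  private var cached: List[(Plan, CachedScan)] = Nil
  private var nextId = 0

  // Register a plan for caching; registering the same plan twice is a no-op.
  def cache(plan: Plan): Unit =
    if (!cached.exists(_._1 == plan)) {
      cached = (plan -> CachedScan(nextId)) :: cached
      nextId += 1
    }

  // Drop every entry whose plan reads from the given table (e.g. after an insert).
  def invalidate(table: String): Unit =
    cached = cached.filterNot { case (p, _) => references(p, table) }

  // Walk the plan top-down and replace any fragment that matches a cached plan.
  def useCachedData(plan: Plan): Plan =
    cached.collectFirst { case (p, scan) if p == plan => scan }.getOrElse {
      plan match {
        case Filter(c, child)     => Filter(c, useCachedData(child))
        case Project(cols, child) => Project(cols, useCachedData(child))
        case leaf                 => leaf
      }
    }

  private def references(plan: Plan, table: String): Boolean = plan match {
    case Table(n)          => n == table
    case Filter(_, child)  => references(child, table)
    case Project(_, child) => references(child, table)
    case CachedScan(_)     => false
  }
}
```

In this sketch, caching `Filter("age >= 18", Table("people"))` and then rewriting `Project(Seq("name"), Filter("age >= 18", Table("people")))` substitutes the cached scan for the filter fragment even though the query never names a cached table. Calling `invalidate("people")` afterwards makes the same query fall back to the uncached plan.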
Changed files:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (3 additions, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala (2 additions, 2 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala (42 additions, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/TestRelation.scala (6 additions, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala (2 additions, 2 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/SameResultSuite.scala (62 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/CacheManager.scala (139 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala (8 additions, 43 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala (20 additions, 3 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/SchemaRDDLike.scala (2 additions, 3 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/api/java/JavaSQLContext.scala (5 additions, 5 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala (25 additions, 3 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala (119 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala (0 additions, 33 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala (5 additions, 4 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala (0 additions, 39 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala (55 additions, 48 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/columnar/InMemoryColumnarQuerySuite.scala (4 additions, 3 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala (1 addition, 6 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala (1 addition, 5 deletions)