[SPARK-17994][SQL] Add back a file status cache for catalog tables
## What changes were proposed in this pull request?

In SPARK-16980, we removed the full in-memory cache of table partitions in favor of loading only needed partitions from the metastore. This greatly improves the initial latency of queries that only read a small fraction of table partitions.

However, since the metastore does not store file statistics, we need to discover those from remote storage. With the loss of the in-memory file status cache this has to happen on each query, increasing the latency of repeated queries over the same partitions.

The proposal is to add back a per-table cache of partition contents, i.e. `Map[Path, Array[FileStatus]]`. This cache would be retained per-table, and can be invalidated through `refreshTable()` and `refreshByPath()`. Unlike the prior cache, it can be incrementally updated as new partitions are read.

## How was this patch tested?

Existing tests and new tests in `HiveTablePerfStatsSuite`.

cc mallman

Author: Eric Liang <ekl@databricks.com>
Author: Michael Allman <michael@videoamp.com>
Author: Eric Liang <ekhliang@gmail.com>

Closes #15539 from ericl/meta-cache.
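The caching idea described above can be illustrated with a minimal, dependency-free sketch. It is not the code from this patch: `FileInfo` is a hypothetical stand-in for Hadoop's `FileStatus`, paths are plain strings, and `PartitionFileCache`, `getOrList`, and `invalidateAll` are illustrative names only. It shows a map from partition path to file listing that fills in lazily as partitions are read and can be cleared on refresh.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for Hadoop's FileStatus, so the sketch needs no dependencies.
final case class FileInfo(path: String, length: Long)

// Sketch of a per-table cache keyed by partition directory.
// `listFiles` represents the expensive listing call against remote storage.
final class PartitionFileCache(listFiles: String => Array[FileInfo]) {
  private val entries = TrieMap.empty[String, Array[FileInfo]]

  // Return the cached listing, or list the partition and remember the result.
  // Only partitions that are actually read get an entry (incremental update).
  def getOrList(partitionPath: String): Array[FileInfo] =
    entries.getOrElseUpdate(partitionPath, listFiles(partitionPath))

  // Drop every entry, as a table refresh would need to do.
  def invalidateAll(): Unit = entries.clear()

  // Partition paths that currently have a cached listing.
  def cachedPartitions: Set[String] = entries.keySet.toSet
}

object PartitionFileCacheDemo {
  def main(args: Array[String]): Unit = {
    var listings = 0 // counts how often "remote storage" is hit
    val cache = new PartitionFileCache({ p =>
      listings += 1
      Array(FileInfo(s"$p/part-00000", 128L))
    })

    cache.getOrList("/t/ds=1") // lists
    cache.getOrList("/t/ds=1") // served from the cache
    cache.getOrList("/t/ds=2") // lists a new partition
    println(listings)          // prints 2

    cache.invalidateAll()
    cache.getOrList("/t/ds=1") // lists again after invalidation
    println(listings)          // prints 3
  }
}
```

Under concurrent access, two threads may both list the same partition before one result is stored; for a sketch like this that only costs a redundant listing.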
Changed files
- core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala (7 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala (149 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala (10 additions, 3 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileCatalog.scala (68 additions, 47 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TableFileCatalog.scala (10 additions, 26 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (14 additions, 2 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala (1 addition, 1 deletion)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveDDLCommandSuite.scala (15 additions, 1 deletion)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveTablePerfStatsSuite.scala (111 additions, 16 deletions)