[SPARK-19120] Refresh Metadata Cache After Loading Hive Tables
### What changes were proposed in this pull request?

```Scala
sql("CREATE TABLE tab (a STRING) STORED AS PARQUET")

// This table fetch is to fill the cache with zero leaf files
spark.table("tab").show()

sql(
  s"""
     |LOAD DATA LOCAL INPATH '$newPartitionDir' OVERWRITE
     |INTO TABLE tab
   """.stripMargin)

spark.table("tab").show()
```

In the above example, the result returned after loading the table is empty. The metadata cache can become outdated after new data is loaded into the table, because loading/inserting does not update the cache.

So far, the metadata cache is only used for data source tables. Thus, among Hive serde tables, only the `parquet` and `orc` formats face this issue, because Hive serde tables in those formats can be converted to data source tables when `spark.sql.hive.convertMetastoreParquet`/`spark.sql.hive.convertMetastoreOrc` is on. This PR refreshes the metadata cache after processing the `LOAD DATA` command.

In addition, Spark SQL does not convert **partitioned** Hive tables (orc/parquet) to data source tables on the write path, but the read path uses the metadata cache for both **partitioned** and non-partitioned Hive tables (orc/parquet). That means writing a partitioned parquet/orc table still goes through `InsertIntoHiveTable` instead of `InsertIntoHadoopFsRelationCommand`. To avoid reading from an outdated cache, `InsertIntoHiveTable` needs to refresh the metadata cache for partitioned tables. Note that it does not need to refresh the cache for non-partitioned parquet/orc tables, because their writes do not go through `InsertIntoHiveTable` at all.

Based on the review comments, this PR keeps the existing logic unchanged: the table is always refreshed, whether or not it is partitioned.

### How was this patch tested?

Added test cases in parquetSuites.scala

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16500 from gatorsmile/refreshInsertIntoHiveTable.
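For users on Spark builds that predate this fix, a commonly suggested workaround is to invalidate the table's cached metadata manually after `LOAD DATA`. Below is a minimal sketch, assuming a Hive-enabled session; the table name `tab` and the directory in `dataDir` are illustrative, not from this PR:

```Scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("refresh-after-load")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE TABLE IF NOT EXISTS tab (a STRING) STORED AS PARQUET")
spark.table("tab").show()  // fills the metadata cache (zero leaf files)

// Hypothetical local directory containing parquet files to load
val dataDir = "/tmp/new_data"
spark.sql(s"LOAD DATA LOCAL INPATH '$dataDir' OVERWRITE INTO TABLE tab")

// Without the fix, the cached (empty) file listing would be reused here.
// Invalidating the cache forces Spark to re-list the table's files.
spark.catalog.refreshTable("tab")
spark.table("tab").show()  // now reflects the newly loaded data
```

The SQL statement `REFRESH TABLE tab` has the same effect as the `spark.catalog.refreshTable` call.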
Showing 3 changed files:
- sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (4 additions, 0 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala (5 additions, 5 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/parquetSuites.scala (66 additions, 9 deletions)