[SPARK-19748][SQL] refresh function has a wrong order to do cache invalidate and regenerate the inmemory var for InMemoryFileIndex with FileStatusCache

## What changes were proposed in this pull request?

If we refresh an InMemoryFileIndex that has a FileStatusCache, it first uses the FileStatusCache to regenerate cachedLeafFiles etc., and only then calls FileStatusCache.invalidateAll. These two actions are in the wrong order, so the refresh does not take effect.

```
override def refresh(): Unit = {
  refresh0()
  fileStatusCache.invalidateAll()
}

private def refresh0(): Unit = {
  val files = listLeafFiles(rootPaths)
  cachedLeafFiles =
    new mutable.LinkedHashMap[Path, FileStatus]() ++= files.map(f => f.getPath -> f)
  cachedLeafDirToChildrenFiles = files.toArray.groupBy(_.getPath.getParent)
  cachedPartitionSpec = null
}
```

## How was this patch tested?

Unit test added.

Author: windpiger <songjun@outlook.com>

Closes #17079 from windpiger/fixInMemoryFileIndexRefresh.

(cherry picked from commit a350bc16)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
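The ordering problem described in the commit message can be reproduced without Spark using a toy model. The sketch below is only an illustration: `ToyStatusCache`, `ToyFileIndex`, `RefreshOrderDemo`, and the `invalidateFirst` flag are hypothetical names, a mutable map stands in for the file system, and it assumes the fix is simply to invalidate the cache before re-listing.

```scala
import scala.collection.mutable

// Hypothetical stand-in for FileStatusCache: remembers the listing of each directory.
class ToyStatusCache {
  private val entries = mutable.Map.empty[String, Seq[String]]

  // Return the cached listing, or compute and store it on a miss.
  def getOrLoad(dir: String)(load: => Seq[String]): Seq[String] =
    entries.getOrElseUpdate(dir, load)

  def invalidateAll(): Unit = entries.clear()
}

// Hypothetical stand-in for InMemoryFileIndex; `fs` plays the role of the file system.
class ToyFileIndex(
    dir: String,
    fs: mutable.Map[String, Seq[String]],
    cache: ToyStatusCache,
    invalidateFirst: Boolean) {

  private var cachedLeafFiles: Seq[String] = Seq.empty
  refresh0() // initial listing, which also populates the cache

  def leafFiles: Seq[String] = cachedLeafFiles

  def refresh(): Unit = {
    if (invalidateFirst) {
      // Assumed fixed order: drop stale entries, then re-list.
      cache.invalidateAll()
      refresh0()
    } else {
      // Order from the commit message: re-list through the still-populated cache.
      refresh0()
      cache.invalidateAll()
    }
  }

  private def refresh0(): Unit = {
    cachedLeafFiles = cache.getOrLoad(dir)(fs.getOrElse(dir, Seq.empty))
  }
}

object RefreshOrderDemo {
  // Build an index, add a file behind its back, refresh, and report what it sees.
  def run(invalidateFirst: Boolean): Seq[String] = {
    val fs = mutable.Map("/t" -> Seq("a.parquet"))
    val index = new ToyFileIndex("/t", fs, new ToyStatusCache, invalidateFirst)
    fs("/t") = Seq("a.parquet", "b.parquet")
    index.refresh()
    index.leafFiles
  }

  def main(args: Array[String]): Unit = {
    println(run(invalidateFirst = false)) // stale: only a.parquet is listed
    println(run(invalidateFirst = true))  // fresh: a.parquet and b.parquet
  }
}
```

With the original order, `refresh0()` is served from the cache that `invalidateAll()` has not yet cleared, so the new file stays invisible until a second refresh. Invalidating first forces the listing to go back to the underlying storage.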
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala: 1 addition, 1 deletion
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala: 26 additions, 0 deletions