[SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS
## What changes were proposed in this pull request?

HDFSBackedStateStoreProvider fails to rename files on HDFS but not on the local filesystem. According to the [implementation notes](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html) of `rename()`, the behavior of the local filesystem and HDFS varies:

> Destination exists and is a file
>
> Renaming a file atop an existing file is specified as failing, raising an exception.
>
> - Local FileSystem : the rename succeeds; the destination file is replaced by the source file.
> - HDFS : The rename fails, no exception is raised. Instead the method call simply returns false.

This patch ensures that `rename()` isn't called if the destination file already exists. It's still semantically correct because Structured Streaming requires that rerunning a batch should generate the same output.

## How was this patch tested?

This patch was tested by running `StateStoreSuite`.

Author: Roberto Agostino Vitillo <ra.vitillo@gmail.com>

Closes #17012 from vitillo/fix_rename.

(cherry picked from commit 9734a928)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
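
The check-before-rename idea described in this commit message can be sketched roughly as follows. This is not the code from the patch. `SimpleFs`, `HdfsLikeFs`, and `DeltaCommit.commit` are hypothetical names introduced only for illustration, and the in-memory filesystem merely models the HDFS behavior quoted above, where renaming onto an existing file returns false without raising. Cleanup of the leftover temporary file is deliberately left out.

```scala
import java.io.IOException
import scala.collection.mutable

// Hypothetical stand-in for the two filesystem calls this sketch needs.
trait SimpleFs {
  def exists(path: String): Boolean
  def rename(src: String, dst: String): Boolean
}

// In-memory model of the HDFS semantics quoted above (an assumption for
// illustration): renaming onto an existing file returns false, no exception.
final class HdfsLikeFs extends SimpleFs {
  val files: mutable.Map[String, String] = mutable.Map.empty

  def exists(path: String): Boolean = files.contains(path)

  def rename(src: String, dst: String): Boolean =
    if (!files.contains(src) || files.contains(dst)) {
      false
    } else {
      files(dst) = files.remove(src).get
      true
    }
}

object DeltaCommit {
  // Commit tempFile as finalFile, but never call rename() when finalFile
  // already exists. A rerun batch is required to produce the same output,
  // so keeping the existing file is assumed to be safe.
  def commit(fs: SimpleFs, tempFile: String, finalFile: String): Unit = {
    if (fs.exists(finalFile)) {
      () // destination already committed; skip the rename
    } else if (!fs.rename(tempFile, finalFile)) {
      throw new IOException(s"Failed to rename $tempFile to $finalFile")
    }
  }
}
```

With this shape, a failed `rename()` is only treated as an error when the destination did not exist beforehand, which is the case where a `false` return value signals a real problem.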
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala (10 additions, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala (24 additions, 7 deletions)