-
- Downloads
[SPARK-21596][SS] Ensure places calling HDFSMetadataLog.get check the return value
## What changes were proposed in this pull request? When I was investigating a flaky test, I realized that many places don't check the return value of `HDFSMetadataLog.get(batchId: Long): Option[T]`. When a batch is supposed to be there, the caller just ignores None rather than throwing an error. If some bug causes a query doesn't generate a batch metadata file, this behavior will hide it and allow the query continuing to run and finally delete metadata logs and make it hard to debug. This PR ensures that places calling HDFSMetadataLog.get always check the return value. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixiong@databricks.com> Closes #18799 from zsxwing/SPARK-21596.
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala 14 additions, 2 deletions...rk/sql/execution/streaming/CompactibleFileStreamLog.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSourceLog.scala 4 additions, 1 deletion...e/spark/sql/execution/streaming/FileStreamSourceLog.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala 54 additions, 3 deletions...pache/spark/sql/execution/streaming/HDFSMetadataLog.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala 12 additions, 5 deletions...pache/spark/sql/execution/streaming/StreamExecution.scala
- sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala 17 additions, 0 deletions.../spark/sql/execution/streaming/HDFSMetadataLogSuite.scala
- sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala 1 addition, 0 deletions...rg/apache/spark/sql/streaming/FileStreamSourceSuite.scala
Loading
Please register or sign in to comment