-
- Downloads
[SPARK-18928] Check TaskContext.isInterrupted() in FileScanRDD, JDBCRDD & UnsafeSorter
## What changes were proposed in this pull request? In order to respond to task cancellation, Spark tasks must periodically check `TaskContext.isInterrupted()`, but this check is missing on a few critical read paths used in Spark SQL, including `FileScanRDD`, `JDBCRDD`, and UnsafeSorter-based sorts. This can cause interrupted / cancelled tasks to continue running and become zombies (as also described in #16189). This patch aims to fix this problem by adding `TaskContext.isInterrupted()` checks to these paths. Note that I could have used `InterruptibleIterator` to simply wrap a bunch of iterators but in some cases this would have an adverse performance penalty or might not be effective due to certain special uses of Iterators in Spark SQL. Instead, I inlined `InterruptibleIterator`-style logic into existing iterator subclasses. ## How was this patch tested? Tested manually in `spark-shell` with two different reproductions of non-cancellable tasks, one involving scans of huge files and another involving sort-merge joins that spill to disk. Both causes of zombie tasks are fixed by the changes added here. Author: Josh Rosen <joshrosen@databricks.com> Closes #16340 from JoshRosen/sql-task-interruption. (cherry picked from commit 5857b9ac) Signed-off-by:Herman van Hovell <hvanhovell@databricks.com>
Showing
- core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java 11 additions, 0 deletions...ark/util/collection/unsafe/sort/UnsafeInMemorySorter.java
- core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java 11 additions, 0 deletions.../util/collection/unsafe/sort/UnsafeSorterSpillReader.java
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala 10 additions, 2 deletions.../apache/spark/sql/execution/datasources/FileScanRDD.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala 3 additions, 2 deletions...apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
Loading
Please register or sign in to comment