-
- Downloads
[SPARK-15822] [SQL] Prevent byte array backed classes from referencing freed memory
## What changes were proposed in this pull request? `UTF8String` and all `Unsafe*` classes are backed by either on-heap or off-heap byte arrays. The code generated version `SortMergeJoin` buffers the left hand side join keys during iteration. This was actually problematic in off-heap mode when one of the keys is a `UTF8String` (or any other 'Unsafe*` object) and the left hand side iterator was exhausted (and released its memory); the buffered keys would reference freed memory. This causes Seg-faults and all kinds of other undefined behavior when we would use one these buffered keys. This PR fixes this problem by creating copies of the buffered variables. I have added a general method to the `CodeGenerator` for this. I have checked all places in which this could happen, and only `SortMergeJoin` had this problem. This PR is largely based on the work of robbinspg and he should be credited for this. closes https://github.com/apache/spark/pull/13707 ## How was this patch tested? Manually tested on problematic workloads. Author: Pete Robbins <robbinspg@gmail.com> Author: Herman van Hovell <hvanhovell@databricks.com> Closes #13723 from hvanhovell/SPARK-15822-2.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala 16 additions, 0 deletions...park/sql/catalyst/expressions/codegen/CodeGenerator.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala 1 addition, 7 deletions.../apache/spark/sql/execution/joins/SortMergeJoinExec.scala
Loading
Please register or sign in to comment