[SPARK-17113] [SHUFFLE] Job failure due to Executor OOM in offheap mode
## What changes were proposed in this pull request?

This PR fixes an executor OOM in off-heap mode caused by a bug in cooperative memory management for UnsafeExternalSorter. UnsafeExternalSorter checked whether a memory page was still in use by upstream by comparing the base object address of the current page with the base object address of upstream. With off-heap memory allocation, however, the base object addresses are always null, so the comparison always matched, no spilling happened, and the operator eventually ran out of memory.

This issue produced the following stack trace:

java.lang.OutOfMemoryError: Unable to acquire 1220 bytes of memory, got 0
    at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:341)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:362)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:93)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:170)

## How was this patch tested?

Tested by running the failing job.

Author: Sital Kedia <skedia@fb.com>

Closes #14693 from sitalkedia/fix_offheap_oom.
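To make the failure mode concrete, here is a minimal, self-contained sketch of why a base-object comparison cannot distinguish off-heap pages. It is not the code from this patch: the `Page` class, the `allocate` helper, and both `freeable...` methods are hypothetical stand-ins, and comparing page numbers is shown only as one plausible way to identify the page that upstream is still reading.

```java
import java.util.ArrayList;
import java.util.List;

public class OffHeapPageCheckSketch {

    /** Hypothetical stand-in for a memory page; not Spark's actual page type. */
    static final class Page {
        final Object baseObject; // a backing array on-heap, always null off-heap
        final int pageNumber;    // unique per page in this sketch

        Page(Object baseObject, int pageNumber) {
            this.baseObject = baseObject;
            this.pageNumber = pageNumber;
        }
    }

    /** Allocates n pages; off-heap pages carry a null base object. */
    static List<Page> allocate(int n, boolean offHeap) {
        List<Page> pages = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Object base = offHeap ? null : new long[8];
            pages.add(new Page(base, i));
        }
        return pages;
    }

    /** Flawed check: a page is freeable only if its base object differs from upstream's. */
    static int freeableByBaseObject(List<Page> pages, Page upstreamPage) {
        int freeable = 0;
        for (Page p : pages) {
            if (p.baseObject != upstreamPage.baseObject) {
                freeable++;
            }
        }
        return freeable;
    }

    /** Assumed alternative: identify the in-use page by its page number instead. */
    static int freeableByPageNumber(List<Page> pages, Page upstreamPage) {
        int freeable = 0;
        for (Page p : pages) {
            if (p.pageNumber != upstreamPage.pageNumber) {
                freeable++;
            }
        }
        return freeable;
    }

    public static void main(String[] args) {
        List<Page> onHeap = allocate(4, false);
        List<Page> offHeap = allocate(4, true);

        // On-heap: every page has a distinct base object, so both checks agree.
        System.out.println("on-heap, base object:  " + freeableByBaseObject(onHeap, onHeap.get(0)));
        System.out.println("on-heap, page number:  " + freeableByPageNumber(onHeap, onHeap.get(0)));

        // Off-heap: null == null for every page, so nothing looks freeable.
        System.out.println("off-heap, base object: " + freeableByBaseObject(offHeap, offHeap.get(0)));
        System.out.println("off-heap, page number: " + freeableByPageNumber(offHeap, offHeap.get(0)));
    }
}
```

With four pages, the base-object check reports three freeable pages on-heap but zero off-heap, which matches the symptom described above: no memory is released, and the next page allocation fails.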
Showing 2 changed files:
- core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java (1 addition, 1 deletion)
- core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java (7 additions, 0 deletions)