From 07fcbea516cda66498b9346467a34733f14e8605 Mon Sep 17 00:00:00 2001
From: Liang-Chi Hsieh <viirya@gmail.com>
Date: Sat, 24 Dec 2016 12:05:49 +0000
Subject: [PATCH] [SPARK-18800][SQL] Correct the assert in UnsafeKVExternalSorter which ensures array size

## What changes were proposed in this pull request?

`UnsafeKVExternalSorter` uses `UnsafeInMemorySorter` to sort the records of `BytesToBytesMap` if it is given a map.

Currently we use the number of keys in `BytesToBytesMap` to determine whether the array used for sorting is large enough. We have an assert that ensures the array is large enough: `map.numKeys() <= map.getArray().size() / 2`.

However, each record in the map takes two entries in the array: one is the record pointer, the other is the key prefix. Because the in-place sort needs half of the array to be empty, the `map.numKeys() * 2` entries used by the records must fit in the other half. So the correct assert should be `map.numKeys() * 2 <= map.getArray().size() / 2`.

## How was this patch tested?

N/A

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #16232 from viirya/SPARK-18800-fix-UnsafeKVExternalSorter.
---
 .../apache/spark/sql/execution/UnsafeKVExternalSorter.java | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java b/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java
index 0d51dc9ff8..ee5bcfd02c 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java
@@ -97,7 +97,9 @@ public final class UnsafeKVExternalSorter {
         canUseRadixSort);
     } else {
       // The array will be used to do in-place sort, which require half of the space to be empty.
-      assert(map.numKeys() <= map.getArray().size() / 2);
+      // Note: each record in the map takes two entries in the array, one is record pointer,
+      // another is the key prefix.
+      assert(map.numKeys() * 2 <= map.getArray().size() / 2);
       // During spilling, the array in map will not be used, so we can borrow that and use it
       // as the underlying array for in-memory sorter (it's always large enough).
       // Since we will not grow the array, it's fine to pass `null` as consumer.
-- 
GitLab
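To make the arithmetic behind the corrected assert concrete, here is a minimal Java sketch. It is a simplified model, not Spark code: the class `SortArrayCapacityCheck`, the constant `SLOTS_PER_RECORD`, and the methods `oldCheck`/`newCheck` are hypothetical names. It assumes only what the commit message states: each record occupies two array entries (record pointer and key prefix), and the in-place sort needs half of the array to be empty.

```java
public class SortArrayCapacityCheck {

    // Hypothetical model (not Spark's API): each record takes two array entries,
    // one for the record pointer and one for the key prefix.
    static final int SLOTS_PER_RECORD = 2;

    // The check before SPARK-18800: compares the number of keys, not the number
    // of entries they occupy, against half of the array.
    static boolean oldCheck(long numKeys, long arraySize) {
        return numKeys <= arraySize / 2;
    }

    // The corrected check: the entries used by all records must fit in half of
    // the array, leaving the other half empty for the in-place sort.
    static boolean newCheck(long numKeys, long arraySize) {
        return numKeys * SLOTS_PER_RECORD <= arraySize / 2;
    }

    public static void main(String[] args) {
        long arraySize = 16; // 8 entries for record data, 8 kept empty for sorting
        for (long numKeys : new long[] {4, 6, 8}) {
            // Total entries required = data entries plus an equal-sized empty half.
            long entriesNeeded = numKeys * SLOTS_PER_RECORD * 2;
            System.out.printf("numKeys=%d entriesNeeded=%d old=%b new=%b%n",
                numKeys, entriesNeeded,
                oldCheck(numKeys, arraySize), newCheck(numKeys, arraySize));
        }
    }
}
```

With a 16-entry array, 6 keys pass the old check (6 <= 8) even though they need 12 entries of data plus 12 empty entries, i.e. 24 in total. The corrected check rejects them (12 > 8) and accepts at most 4 keys.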