From 07fcbea516cda66498b9346467a34733f14e8605 Mon Sep 17 00:00:00 2001
From: Liang-Chi Hsieh <viirya@gmail.com>
Date: Sat, 24 Dec 2016 12:05:49 +0000
Subject: [PATCH] [SPARK-18800][SQL] Correct the assert in
 UnsafeKVExternalSorter which ensures array size

## What changes were proposed in this pull request?

`UnsafeKVExternalSorter` uses `UnsafeInMemorySorter` to sort the records of `BytesToBytesMap` if it is given a map.

Currently we use the number of keys in `BytesToBytesMap` to determine if the array used for sort is enough or not. We has an assert that ensures the size of the array is enough: `map.numKeys() <= map.getArray().size() / 2`.

However, each record in the map takes two entries in the array, one is record pointer, another is key prefix. So the correct assert should be `map.numKeys() * 2 <= map.getArray().size() / 2`.

## How was this patch tested?

N/A

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #16232 from viirya/SPARK-18800-fix-UnsafeKVExternalSorter.
---
 .../apache/spark/sql/execution/UnsafeKVExternalSorter.java    | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java b/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java
index 0d51dc9ff8..ee5bcfd02c 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java
@@ -97,7 +97,9 @@ public final class UnsafeKVExternalSorter {
         canUseRadixSort);
     } else {
       // The array will be used to do in-place sort, which require half of the space to be empty.
-      assert(map.numKeys() <= map.getArray().size() / 2);
+      // Note: each record in the map takes two entries in the array, one is record pointer,
+      // another is the key prefix.
+      assert(map.numKeys() * 2 <= map.getArray().size() / 2);
       // During spilling, the array in map will not be used, so we can borrow that and use it
       // as the underlying array for in-memory sorter (it's always large enough).
       // Since we will not grow the array, it's fine to pass `null` as consumer.
-- 
GitLab