Skip to content
Snippets Groups Projects
Commit 2f7d0b68 authored by Wenchen Fan's avatar Wenchen Fan Committed by Davies Liu
Browse files

[SPARK-12856] [SQL] speed up hashCode of unsafe array

We iterate the bytes to calculate hashCode before, but now we have `Murmur3_x86_32.hashUnsafeBytes` that don't require the bytes to be word algned, we should use that instead.

A simple benchmark shows it's about 3 X faster, benchmark code: https://gist.github.com/cloud-fan/fa77713ccebf0823b2ab#file-arrayhashbenchmark-scala

Author: Wenchen Fan <wenchen@databricks.com>

Closes #10784 from cloud-fan/array-hashcode.
parent 242efb75
No related branches found
No related tags found
No related merge requests found
......@@ -25,6 +25,7 @@ import org.apache.spark.sql.catalyst.util.ArrayData;
import org.apache.spark.sql.types.*;
import org.apache.spark.unsafe.Platform;
import org.apache.spark.unsafe.array.ByteArrayMethods;
import org.apache.spark.unsafe.hash.Murmur3_x86_32;
import org.apache.spark.unsafe.types.CalendarInterval;
import org.apache.spark.unsafe.types.UTF8String;
......@@ -299,11 +300,7 @@ public class UnsafeArrayData extends ArrayData {
@Override
public int hashCode() {
int result = 37;
for (int i = 0; i < sizeInBytes; i++) {
result = 37 * result + Platform.getByte(baseObject, baseOffset + i);
}
return result;
return Murmur3_x86_32.hashUnsafeBytes(baseObject, baseOffset, sizeInBytes, 42);
}
@Override
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment