Commit 2ccf3b66 authored 12 years ago by Josh Rosen

Fix PySpark hash partitioning bug.

A Java array's hashCode is based on its object
identity, not its elements, so serialized keys
were being hashed incorrectly.

This commit adds a PySpark-specific workaround
and adds more tests.
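
The failure mode described above can be imitated in plain Python. This is a minimal sketch, not the workaround this commit adds: `IdentityHashedBytes` and `content_partition` are hypothetical names, and `zlib.crc32` is only a stand-in for any hash computed from the bytes themselves. A Python object without `__eq__`/`__hash__` overrides compares and hashes by identity, much as a Java array does.

```python
import pickle
import zlib


class IdentityHashedBytes:
    """Hypothetical stand-in for a Java byte[] holding a serialized key.

    No __eq__/__hash__ overrides, so equality and hashing fall back to
    object identity, as they do for Java arrays.
    """

    def __init__(self, payload: bytes):
        self.payload = payload


def content_partition(payload: bytes, num_partitions: int) -> int:
    """Choose a partition from the bytes themselves (assumed approach)."""
    return zlib.crc32(payload) % num_partitions


# Two separately serialized copies of the same logical key.
key_a = IdentityHashedBytes(pickle.dumps("spark"))
key_b = IdentityHashedBytes(pickle.dumps("spark"))

# The payloads are byte-for-byte equal...
same_payload = key_a.payload == key_b.payload
# ...but the wrappers are distinct objects, so identity-based
# equality (and therefore hashing) treats them as different keys.
wrappers_equal = key_a == key_b
# Hashing the contents sends both copies to the same partition.
same_partition = (
    content_partition(key_a.payload, 4) == content_partition(key_b.payload, 4)
)
```

Partitioning on anything derived from the wrapper's identity would scatter equal keys across partitions, while a content-based hash keeps them together.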

parent 7859879a

Showing with 54 additions and 9 deletions