Fix PySpark hash partitioning bug.
A Java array's hashCode is based on its object identity, not its elements, so serialized keys were hashed incorrectly: two equal keys serialized into separate byte arrays could receive different hash codes. This commit adds a PySpark-specific workaround and more tests.
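A minimal Java illustration of the problem described above (the class and method names are hypothetical, not from the commit). Two byte arrays with identical contents stand in for the same key serialized twice. They are compared by the identity-based hashCode() that arrays inherit from Object, and by the content-based java.util.Arrays.hashCode.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ArrayHashDemo {
    // Identity-based: what a raw byte[] key gets from Object.hashCode()
    static int identityHash(byte[] key) {
        return key.hashCode();
    }

    // Content-based: depends only on the array's elements
    static int contentHash(byte[] key) {
        return Arrays.hashCode(key);
    }

    public static void main(String[] args) {
        // Two distinct array objects holding the same bytes
        byte[] a = "key".getBytes(StandardCharsets.UTF_8);
        byte[] b = "key".getBytes(StandardCharsets.UTF_8);

        System.out.println("same contents:      " + Arrays.equals(a, b));
        System.out.println("same identity hash: " + (identityHash(a) == identityHash(b)));
        System.out.println("same content hash:  " + (contentHash(a) == contentHash(b)));
    }
}
```

The identity comparison will almost always print false, because it depends on which array object is hashed rather than on its contents. The content comparison prints true.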
- core/src/main/scala/spark/api/python/PythonPartitioner.scala: 41 additions, 0 deletions
- core/src/main/scala/spark/api/python/PythonRDD.scala: 4 additions, 6 deletions
- pyspark/pyspark/rdd.py: 9 additions, 3 deletions
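The diff for PythonPartitioner.scala is not reproduced here. The following is a hypothetical sketch of the kind of workaround the commit message describes, not the commit's actual code. ContentHashPartitioner is an illustrative standalone Java class, not Spark's Partitioner API. It hashes byte[] keys by content, falls back to hashCode() for other keys, and maps the result to a non-negative partition index.

```java
import java.util.Arrays;

public class ContentHashPartitioner {
    private final int numPartitions;

    public ContentHashPartitioner(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    public int getPartition(Object key) {
        if (key == null) {
            return 0; // send null keys to the first partition
        }
        // Hash serialized (byte[]) keys by content; other keys use their own hashCode()
        int h = (key instanceof byte[]) ? Arrays.hashCode((byte[]) key) : key.hashCode();
        // Java's % keeps the sign of h, so shift negative results into [0, numPartitions)
        int mod = h % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    public static void main(String[] args) {
        ContentHashPartitioner p = new ContentHashPartitioner(4);
        byte[] k1 = {1, 2, 3};
        byte[] k2 = {1, 2, 3};
        // Distinct arrays with equal contents land in the same partition
        System.out.println(p.getPartition(k1) == p.getPartition(k2)); // prints "true"
    }
}
```

The explicit sign correction matters because hash codes can be negative, and a negative partition index would be invalid.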