[SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey()
Use external sort to support sorting large datasets in the reduce stage.

Author: Davies Liu <davies.liu@gmail.com>

Closes #1978 from davies/sort and squashes the following commits:

bbcd9ba [Davies Liu] check spilled bytes in tests
b125d2f [Davies Liu] add test for external sort in rdd
eae0176 [Davies Liu] choose different disks from different processes and instances
1f075ed [Davies Liu] Merge branch 'master' into sort
eb53ca6 [Davies Liu] Merge branch 'master' into sort
644abaf [Davies Liu] add license in LICENSE
19f7873 [Davies Liu] improve tests
55602ee [Davies Liu] use external sort in sortBy() and sortByKey()
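For readers unfamiliar with the technique, the general idea of an external sort can be sketched as below. This is a minimal illustration, not the code added to python/pyspark/shuffle.py by this commit: the names `external_sorted`, `_read_run`, `chunk_size`, and `tmp_dir` are hypothetical, and the sketch spills after a fixed number of items rather than by tracking memory usage. It relies on the `key` argument of the standard library's `heapq.merge`, which requires Python 3.5 or later.

```python
import heapq
import itertools
import os
import pickle
import tempfile


def _read_run(path):
    """Lazily yield the pickled items stored in one sorted run file."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def external_sorted(iterable, key=None, chunk_size=10000, tmp_dir=None):
    """Yield items in sorted order, holding at most one chunk in memory.

    Hypothetical helper: sorts fixed-size chunks in memory, spills each
    sorted chunk ("run") to a temporary file, then merges the runs.
    """
    iterator = iter(iterable)
    paths = []
    runs = []
    try:
        while True:
            # Pull the next chunk; an empty chunk means the input is exhausted.
            chunk = list(itertools.islice(iterator, chunk_size))
            if not chunk:
                break
            chunk.sort(key=key)
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
            with os.fdopen(fd, "wb") as f:
                for item in chunk:
                    pickle.dump(item, f)
            paths.append(path)
        # Merge the sorted runs lazily; only one item per run is in memory.
        runs = [_read_run(p) for p in paths]
        for item in heapq.merge(*runs, key=key):
            yield item
    finally:
        # Close any run readers still open, then delete the spill files.
        for run in runs:
            run.close()
        for path in paths:
            if os.path.exists(path):
                os.remove(path)
```

For example, `list(external_sorted(pairs, key=lambda kv: kv[0], chunk_size=1000))` sorts key-value pairs by key while keeping no more than 1000 pairs in memory during the chunking phase.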
Showing 7 changed files:
- .rat-excludes: 1 addition, 0 deletions
- LICENSE: 283 additions, 0 deletions
- python/pyspark/heapq3.py: 890 additions, 0 deletions
- python/pyspark/rdd.py: 7 additions, 2 deletions
- python/pyspark/shuffle.py: 83 additions, 8 deletions
- python/pyspark/tests.py: 41 additions, 1 deletion
- tox.ini: 1 addition, 1 deletion