[SPARK-8428][SPARK-13850] Fix integer overflows in TimSort
## What changes were proposed in this pull request?

This patch fixes a few integer overflows in `UnsafeSortDataFormat.copyRange()` and `ShuffleSortDataFormat.copyRange()` that seem to be the most likely cause of a number of `TimSort` contract violation errors seen in Spark 2.0 and Spark 1.6 while sorting large datasets.

## How was this patch tested?

Added a test in `ExternalSorterSuite` that instantiates a large array of the form [150000000, 150000001, 150000002, ..., 300000000, 0, 1, 2, ..., 149999999], which triggers a `copyRange` in `TimSort.mergeLo` or `TimSort.mergeHi`. Note that the input dataset must contain at least 268.43 million rows with a certain data distribution for an overflow to occur.

Author: Sameer Agarwal <sameer@databricks.com>

Closes #13336 from sameeragarwal/timsort-bug.
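The class of bug described above can be sketched in plain Java. This is a hypothetical illustration, not the Spark source: the class and method names are invented, and an 8-byte record width is assumed. When a record index is multiplied by a record width using `int` arithmetic, the product wraps before it is widened to `long`; using a `long` literal (`8L`) promotes the multiplication to 64 bits. With 8 bytes per record, 2^31 / 8 = 268,435,456, which is consistent with the 268.43 million row threshold mentioned above.

```java
// Hypothetical sketch of the overflow pattern; not the actual Spark implementation.
public class CopyRangeOverflow {

    // Buggy form: pos * 8 is computed in 32-bit int arithmetic and only then
    // widened to long, so it wraps once the product exceeds Integer.MAX_VALUE.
    static long byteOffsetInt(long baseOffset, int pos) {
        return baseOffset + pos * 8;
    }

    // Fixed form: the long literal 8L promotes the multiplication to 64 bits,
    // so the byte offset is computed without wrapping.
    static long byteOffsetLong(long baseOffset, int pos) {
        return baseOffset + pos * 8L;
    }

    public static void main(String[] args) {
        int lastSafe = 268_435_455; // 268_435_455 * 8 = 2_147_483_640, still fits in an int
        int firstBad = 268_435_456; // 2^31 / 8: the int product wraps to Integer.MIN_VALUE

        System.out.println(byteOffsetInt(0L, lastSafe));  // 2147483640
        System.out.println(byteOffsetInt(0L, firstBad));  // -2147483648 (wrapped)
        System.out.println(byteOffsetLong(0L, firstBad)); // 2147483648
    }
}
```

A negative or wrapped offset passed to a raw memory copy would move the wrong bytes, which is one plausible way such a bug could surface as an inconsistent ordering that `TimSort` reports as a contract violation.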
- core/src/main/java/org/apache/spark/shuffle/sort/ShuffleSortDataFormat.java (3 additions, 3 deletions)
- core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSortDataFormat.java (3 additions, 3 deletions)
- core/src/test/scala/org/apache/spark/util/collection/ExternalSorterSuite.scala (24 additions, 0 deletions)