[SPARK-21315][SQL] Skip some spill files when generateIterator(startIndex) in ExternalAppendOnlyUnsafeRowArray.

## What changes were proposed in this pull request?

In the current code, `UnboundedFollowingWindowFunctionFrame` is expensive to use because every call to its `write` method iterates from the start to the lower bound. When traversing the iterator, it is possible to skip some spilled files and thus save time.

## How was this patch tested?

Added a unit test.

Ran a small benchmark: put 2000200 rows into `UnsafeExternalSorter`, which yields 2 spill files (each containing 1000000 rows) plus an inMemSorter containing 200 rows, then moved the iterator forward to index=2000001.

*With this change*: `getIterator(2000001)` costs almost 0ms~1ms;
*Without this change*: `for(int i=0; i<2000001; i++)getIterator().loadNext()` costs 300ms.

Author: jinxing <jinxing6042@126.com>

Closes #18541 from jinxing64/SPARK-21315.
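The idea of skipping whole spill files can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: `SpillSkipSketch`, `Spill`, `recordsRead`, and `readFrom` are hypothetical names, and plain `int` arrays stand in for spill files on disk. The sketch assumes only that each spill knows how many records it holds, so a spill lying entirely before `startIndex` can be skipped without reading any of its records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SpillSkipSketch {

    /** Hypothetical stand-in for one spill file; recordsRead counts simulated reads. */
    static final class Spill {
        final int[] records;
        int recordsRead = 0;

        Spill(int[] records) {
            this.records = records;
        }

        int numRecords() {
            return records.length;
        }
    }

    /**
     * Returns every record at global position >= startIndex, taking the spills
     * in order and then the in-memory records. A spill that lies entirely
     * before startIndex is skipped using only its record count.
     */
    static List<Integer> readFrom(List<Spill> spills, int[] inMemory, int startIndex) {
        if (startIndex < 0) {
            throw new IllegalArgumentException("startIndex must be non-negative");
        }
        List<Integer> out = new ArrayList<>();
        int toSkip = startIndex; // records still to pass over before emitting
        for (Spill spill : spills) {
            if (toSkip >= spill.numRecords()) {
                // Whole spill precedes startIndex: skip it without reading.
                toSkip -= spill.numRecords();
                continue;
            }
            // This spill contains startIndex (or comes after it): read it,
            // discarding the first toSkip records.
            for (int i = 0; i < spill.numRecords(); i++) {
                spill.recordsRead++;
                if (i >= toSkip) {
                    out.add(spill.records[i]);
                }
            }
            toSkip = 0;
        }
        // Any remaining offset falls inside the in-memory records.
        for (int i = toSkip; i < inMemory.length; i++) {
            out.add(inMemory[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        Spill first = new Spill(new int[] {0, 1, 2});
        Spill second = new Spill(new int[] {3, 4, 5});
        List<Integer> tail = readFrom(Arrays.asList(first, second), new int[] {6, 7}, 4);
        System.out.println(tail + ", records read from first spill: " + first.recordsRead);
    }
}
```

In this sketch the cost of seeking depends on the number of spill files plus the offset inside a single spill rather than on `startIndex` itself, which is consistent with the gap between the two benchmark timings reported above.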
Showing 5 changed files:
- core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java (30 additions, 5 deletions)
- core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillWriter.java (4 additions, 0 deletions)
- core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java (33 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArray.scala (2 additions, 20 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala (1 addition, 1 deletion)