[SPARK-17529][CORE] Implement BitSet.clearUntil and use it during merge joins
## What changes were proposed in this pull request?

Add a clearUntil() method on BitSet (adapted from the pre-existing setUntil() method). Use this method to clear only the subset of the BitSet that is used during merge joins.

## How was this patch tested?

dev/run-tests, as well as performance tests on skewed data as described in the JIRA.

I expect a small local performance hit from using BitSet.clearUntil rather than BitSet.clear for normally shaped (unskewed) joins (an additional read on the last long). This is expected to be de minimis and was not specifically tested.

Author: David Navas <davidn@clearstorydata.com>

Closes #15084 from davidnavas/bitSet.
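To illustrate the idea behind clearUntil(), here is a minimal sketch of a word-based bit set that clears only the bits below a given index. It is not the patch itself: `SimpleBitSet` and its members are hypothetical names chosen for this example, and the layout (one `Long` per 64 bits) is an assumption modeled on the description above.

```scala
import java.util.Arrays

// Hypothetical stand-alone bit set for illustration only; this is not
// org.apache.spark.util.collection.BitSet.
final class SimpleBitSet(numBits: Int) {
  // One 64-bit word per 64 bits, rounded up.
  private val words = new Array[Long]((numBits + 63) >> 6)

  def set(index: Int): Unit = {
    words(index >> 6) |= (1L << (index & 0x3f))
  }

  def get(index: Int): Boolean =
    (words(index >> 6) & (1L << (index & 0x3f))) != 0L

  // Clears every bit in [0, bitIndex) and leaves higher bits untouched.
  def clearUntil(bitIndex: Int): Unit = {
    val wordIndex = bitIndex >> 6 // index of the word that holds bitIndex
    // Zero all words that lie entirely below bitIndex.
    Arrays.fill(words, 0, math.min(wordIndex, words.length), 0L)
    if (wordIndex < words.length) {
      // Keep only the bits at or above bitIndex in the boundary word.
      val mask = -1L << (bitIndex & 0x3f)
      words(wordIndex) &= mask
    }
  }

  def cardinality: Int = words.map(w => java.lang.Long.bitCount(w)).sum
}
```

Under these assumptions, a caller that knows only the first n bits were ever set can call `clearUntil(n)` and touch roughly n / 64 words instead of the whole backing array. The read-modify-write on the boundary word corresponds to the "additional read on the last long" mentioned above.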
Changed files:
- core/src/main/scala/org/apache/spark/util/collection/BitSet.scala (18 additions, 10 deletions)
- core/src/test/scala/org/apache/spark/util/collection/BitSetSuite.scala (32 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala (2 additions, 2 deletions)