Commit 1bfd9347 authored by ihainan, committed by Sean Owen

[SPARK-10184] [CORE] Optimization for bounds determination in RangePartitioner

JIRA Issue: https://issues.apache.org/jira/browse/SPARK-10184

Change `cumWeight > target` to `cumWeight >= target` in `RangePartitioner.determineBounds` method to make the output partitions more balanced.
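
To illustrate why the comparison matters, here is a minimal, simplified sketch of the bound-selection loop. It is not the actual Spark source. The `inclusive` flag, the `BoundsSketch` object, and the `partitionSizes` helper are hypothetical names introduced for this example. The step computation, the bookkeeping after `bounds += key`, and the rule that keys less than or equal to a bound fall into that bound's partition are assumptions that this commit's diff does not show.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified sketch; not the actual RangePartitioner code.
object BoundsSketch {

  // Picks up to (partitions - 1) upper bounds from weighted candidate keys.
  // `inclusive` toggles between the new `>=` and the old `>` comparison.
  def determineBounds(
      candidates: Seq[(Int, Double)],
      partitions: Int,
      inclusive: Boolean): Seq[Int] = {
    val ordered = candidates.sortBy(_._1)
    // Assumption: each partition should receive an equal share of the weight.
    val step = ordered.map(_._2).sum / partitions
    var cumWeight = 0.0
    var target = step
    val bounds = ArrayBuffer.empty[Int]
    var previousBound = Option.empty[Int]
    var i = 0
    var j = 0
    while (i < ordered.size && j < partitions - 1) {
      val (key, weight) = ordered(i)
      cumWeight += weight
      val reached = if (inclusive) cumWeight >= target else cumWeight > target
      if (reached) {
        // Skip duplicate values.
        if (previousBound.forall(key > _)) {
          bounds += key
          // Assumed bookkeeping (not visible in the diff hunk).
          target += step
          j += 1
          previousBound = Some(key)
        }
      }
      i += 1
    }
    bounds.toList
  }

  // Counts keys per partition, assuming a key belongs to the first
  // partition whose upper bound is greater than or equal to the key.
  def partitionSizes(keys: Seq[Int], bounds: Seq[Int]): Seq[Int] = {
    val counts = Array.fill(bounds.size + 1)(0)
    keys.foreach(k => counts(bounds.count(_ < k)) += 1)
    counts.toList
  }
}
```

With six equally weighted keys (1 to 6) and three partitions, the strict comparison in this sketch only emits a bound once the cumulative weight has passed the target, so the bounds are 3 and 5 and the partition sizes are 3, 2, 1. The inclusive comparison emits the bound as soon as the target is reached, giving bounds 2 and 4 and sizes 2, 2, 2.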

Author: ihainan <ihainan72@gmail.com>

Closes #8397 from ihainan/opt_for_rangepartitioner.
parent ca69fc8e
...
@@ -291,7 +291,7 @@ private[spark] object RangePartitioner {
     while ((i < numCandidates) && (j < partitions - 1)) {
       val (key, weight) = ordered(i)
       cumWeight += weight
-      if (cumWeight > target) {
+      if (cumWeight >= target) {
         // Skip duplicate values.
         if (previousBound.isEmpty || ordering.gt(key, previousBound.get)) {
           bounds += key
...