core · 77d046ec47a9bfa6323aa014869844c28e18e049 · cs525-sp18-g07 / spark

[SPARK-21782][CORE] Repartition creates skews when numPartitions is a power of 2

Sergey Serebryakov authored 7 years ago

## Problem
When an RDD (particularly one with a low item-per-partition ratio) is repartitioned to a `numPartitions` that is a power of 2, the resulting partitions are very unevenly sized. This happens because the PRNG is initialized with a fixed seed and is then used only once. See details in https://issues.apache.org/jira/browse/SPARK-21782

## What changes were proposed in this pull request?
Instead of directly using `0, 1, 2,...` seeds to initialize `Random`, hash them with `scala.util.hashing.byteswap32()`.
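
As a rough illustration of the idea (not the actual patch), the sketch below compares seeding `java.util.Random` with the raw partition index against seeding it with `byteswap32(index)`, and draws a single `nextInt(numPartitions)` as the starting output position. The object `RepartitionSeedSketch` and the helpers `startPosition` and `startHistogram` are hypothetical names introduced here; only `scala.util.hashing.byteswap32` and `Random` come from the description above.

```scala
import java.util.Random
import scala.util.hashing

// Hypothetical sketch, not the code from this commit.
object RepartitionSeedSketch {

  // Starting output position for input partition `index`.
  // hashSeed = false: seed the PRNG with the raw index (0, 1, 2, ...).
  // hashSeed = true:  hash the index with byteswap32 before seeding.
  def startPosition(index: Int, numPartitions: Int, hashSeed: Boolean): Int = {
    val seed = if (hashSeed) hashing.byteswap32(index) else index
    // The PRNG is used only once per input partition, so the quality of
    // its first output matters.
    new Random(seed).nextInt(numPartitions)
  }

  // Counts how many input partitions start at each output position.
  def startHistogram(numInputs: Int, numPartitions: Int, hashSeed: Boolean): Map[Int, Int] =
    (0 until numInputs)
      .map(i => startPosition(i, numPartitions, hashSeed))
      .groupBy(identity)
      .map { case (position, hits) => position -> hits.size }
}
```

Comparing `RepartitionSeedSketch.startHistogram(1000, 64, hashSeed = false)` with the `hashSeed = true` variant is one way to inspect how evenly the starting positions are spread for a power-of-2 `numPartitions`.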

## How was this patch tested?
`build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.rdd.RDDSuite test`

Author: Sergey Serebryakov <sserebryakov@tesla.com>

Closes #18990 from megaserg/repartition-skew.

Name	Last commit	Last update
..
src
pom.xml