[SPARK-2251] fix concurrency issues in random sampler
The following code is very likely to throw an exception:

~~~
// Sample roughly 10% of 111 elements spread over 10 partitions, without replacement
val rdd = sc.parallelize(0 until 111, 10).sample(false, 0.1)
// zip requires both sides to have the same number of elements in each partition
rdd.zip(rdd).count()
~~~

because the same random number generator instance is shared when computing partitions.

Author: Xiangrui Meng <meng@databricks.com>

Closes #1229 from mengxr/fix-sample and squashes the following commits:

f1ee3d7 [Xiangrui Meng] fix concurrency issues in random sampler
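The failure described above can be reproduced without Spark. The sketch below is hypothetical: `SketchSampler`, `sharedClone`, `freshClone` and `SamplerDemo` are illustrative names, not Spark's `RandomSampler` API, and this is not the actual diff in this commit. It assumes, as the commit message suggests, that the per-partition copies of a sampler end up sharing one `scala.util.Random`. Reading the same "partition" twice while interleaving the two iterators, roughly as `zip` does, makes both copies draw from a single random stream. Giving each copy its own generator makes the two reads identical for the same seed.

~~~scala
import scala.util.Random

// Hypothetical minimal Bernoulli sampler for illustration only;
// it is NOT Spark's BernoulliSampler.
class SketchSampler(fraction: Double, private val rng: Random) {
  def setSeed(seed: Long): Unit = rng.setSeed(seed)

  // Keep each item with probability `fraction`; lazy, like a partition iterator.
  def sample[T](items: Iterator[T]): Iterator[T] =
    items.filter(_ => rng.nextDouble() < fraction)

  // Buggy copy (assumed failure mode): shares this sampler's Random instance.
  def sharedClone(): SketchSampler = new SketchSampler(fraction, rng)

  // Fixed copy: each copy owns a fresh Random instance.
  def freshClone(): SketchSampler = new SketchSampler(fraction, new Random)
}

object SamplerDemo {
  // Simulates computing the same partition twice with the same seed and
  // consuming both results in lockstep, roughly as zip does within one task.
  def sampleTwice(clone: () => SketchSampler, seed: Long): (List[Int], List[Int]) = {
    val a = clone(); a.setSeed(seed)
    val b = clone(); b.setSeed(seed)
    val itA = a.sample((0 until 111).iterator)
    val itB = b.sample((0 until 111).iterator)
    val outA = List.newBuilder[Int]
    val outB = List.newBuilder[Int]
    // Interleave consumption: one element from each side per step.
    while (itA.hasNext || itB.hasNext) {
      if (itA.hasNext) outA += itA.next()
      if (itB.hasNext) outB += itB.next()
    }
    (outA.result(), outB.result())
  }

  def main(args: Array[String]): Unit = {
    val proto = new SketchSampler(0.1, new Random)
    val (s1, s2) = sampleTwice(() => proto.sharedClone(), 42L)
    val (f1, f2) = sampleTwice(() => proto.freshClone(), 42L)
    println(s"shared RNG: sizes ${s1.size} vs ${s2.size}, equal = ${s1 == s2}")
    println(s"fresh RNG:  sizes ${f1.size} vs ${f2.size}, equal = ${f1 == f2}")
  }
}
~~~

With the shared generator, the two reads generally differ in contents and size, which is what makes `zip` fail. With one generator per copy, both reads replay the same seeded stream and match.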
- core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala: 12 additions, 12 deletions
- core/src/test/scala/org/apache/spark/rdd/PartitionwiseSampledRDDSuite.scala: 14 additions, 4 deletions
- core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala: 12 additions, 6 deletions