Several fixes to sampling issues pointed out by Henry Milner:
- takeSample was biased towards earlier partitions
- There were some range errors in takeSample
- SampledRDDs with replacement didn't produce appropriate counts across partitions (we took exactly frac of each one)
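
To illustrate the last point: when sampling with replacement, taking exactly frac of every partition fixes each partition's output size, whereas in a true sample those sizes vary. A minimal sketch of one common remedy follows, which emits each element a Poisson(frac)-distributed number of times. This is an assumption for illustration and not necessarily the change made in this commit; `PoissonSampling`, `poisson`, and `samplePartition` are hypothetical names, and the partition is modelled as a plain `Iterator`.

```scala
import scala.util.Random

// Hypothetical helper, not part of Spark: illustrates per-element Poisson counts.
object PoissonSampling {
  // Knuth's method for drawing a Poisson(lambda) count.
  // Suitable for small lambda, such as a sampling fraction.
  def poisson(lambda: Double, rng: Random): Int = {
    val limit = math.exp(-lambda)
    var k = 0
    var p = rng.nextDouble()
    while (p > limit) {
      k += 1
      p *= rng.nextDouble()
    }
    k
  }

  // Sample one partition with replacement: each element is emitted
  // Poisson(frac) times, so the output size is random with mean frac * size
  // instead of being pinned to exactly frac * size.
  def samplePartition[T](data: Iterator[T], frac: Double, rng: Random): Iterator[T] =
    data.flatMap(x => Iterator.fill(poisson(frac, rng))(x))
}
```

For example, `PoissonSampling.samplePartition((1 to 1000).iterator, 0.1, new Random(42))` yields roughly 100 elements, some possibly repeated, with the exact count varying by seed.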
Showing
- core/src/main/scala/spark/RDD.scala: 6 additions, 7 deletions
- core/src/main/scala/spark/SampledRDD.scala: 14 additions, 10 deletions
- core/src/main/scala/spark/Utils.scala: 15 additions, 11 deletions