[SPARK-3250] Implement Gap Sampling optimization for random sampling
More efficient sampling, based on Gap Sampling optimization:
http://erikerlandson.github.io/blog/2014/09/11/faster-random-samples-with-gap-sampling/

Author: Erik Erlandson <eerlands@redhat.com>

Closes #2455 from erikerlandson/spark-3250-pr and squashes the following commits:

72496bc [Erik Erlandson] [SPARK-3250] Implement Gap Sampling optimization for random sampling
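For readers unfamiliar with the technique named in the title, here is a minimal sketch of gap sampling for Bernoulli sampling (each element kept independently with probability `p`). It is an illustration under stated assumptions, not the code from this commit: `GapSamplingSketch` and `gapSample` are hypothetical names, and the sketch assumes `0 < p < 1`. Instead of drawing one random number per element, it draws the number of elements to skip before the next accepted one from a geometric distribution, so the number of random draws scales with the sample size rather than the input size.

```scala
import scala.util.Random

// Hypothetical illustration only; not the implementation in RandomSampler.scala.
object GapSamplingSketch {

  /** Lazily yields each element of `data` independently with probability `p`. */
  def gapSample[T](data: Iterator[T], p: Double, rng: Random): Iterator[T] = {
    require(p > 0.0 && p < 1.0, "p must be in (0, 1)")
    val lnQ = math.log1p(-p) // log(1 - p), always negative here

    new Iterator[T] {
      // Skip the first gap so that `data` is positioned on the first accepted element.
      advance()

      // Draw a gap k with P(k) = (1 - p)^k * p and drop k elements from the input.
      private def advance(): Unit = {
        // nextDouble() lies in [0, 1); 1 - u lies in (0, 1], so log is finite.
        val u = 1.0 - rng.nextDouble()
        var k = (math.log(u) / lnQ).toInt
        while (k > 0 && data.hasNext) { data.next(); k -= 1 }
      }

      def hasNext: Boolean = data.hasNext

      def next(): T = {
        val accepted = data.next()
        advance() // position on the next accepted element, if any
        accepted
      }
    }
  }
}
```

A call such as `GapSamplingSketch.gapSample((0 until 100000).iterator, 0.1, new Random(42))` would return roughly one tenth of the input, in its original order. The benefit shrinks as `p` grows, because fewer elements are skipped per random draw.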
Changed files:
- core/src/main/scala/org/apache/spark/rdd/RDD.scala (4 additions, 2 deletions)
- core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala (264 additions, 22 deletions)
- core/src/test/java/org/apache/spark/JavaAPISuite.java (4 additions, 5 deletions)
- core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala (516 additions, 90 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala (2 additions, 2 deletions)