Commit 51754d6d authored by Sean Owen

[SPARK-18678][ML] Skewed reservoir sampling in SamplingUtils


## What changes were proposed in this pull request?

Fix reservoir sampling bias for small k. An off-by-one error made the probability of replacement slightly too high: k/(l-1) after l elements instead of k/l. The difference matters for small k.
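
For illustration only, here is a minimal Python sketch of reservoir sampling with the corrected replacement probability. It is not the Spark `SamplingUtils` code; the function name `reservoir_sample`, its parameters, and the use of Python's `random.Random` are assumptions made for this example. The point it shows is the ordering: the element counter is incremented before the replacement index is drawn, so the l-th element replaces a reservoir slot with probability k/l rather than k/(l-1).

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> Tuple[List[T], int]:
    """Hypothetical helper: return (sample of at most k items, number of items seen)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    seen = 0
    for item in items:
        # Count the current item first, so `seen` already includes it
        # when the replacement index is drawn below.
        seen += 1
        if len(reservoir) < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Draw an index in [0, seen); it falls in [0, k) with probability k/seen.
            j = int(rng.random() * seen)
            if j < k:
                reservoir[j] = item
    return reservoir, seen
```

Drawing the index before incrementing `seen` would reproduce the k/(l-1) bias described above. With k = 1 and three input items, for example, the third item would then be kept half the time instead of one third of the time.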

## How was this patch tested?

Existing test plus new test case.

Author: Sean Owen <sowen@cloudera.com>

Closes #16129 from srowen/SPARK-18678.

(cherry picked from commit 79f5f281)
Signed-off-by: Sean Owen <sowen@cloudera.com>
parent 99c293ee