Commit 27e88faa authored by Abou Haydar Elias, committed by Sean Owen

[SPARK-13646][MLLIB] QuantileDiscretizer counts dataset twice in get…

## What changes were proposed in this pull request?

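Avoid counting the DataFrame twice: the sampling fraction now reuses the already computed `totalSamples` instead of calling `dataset.count()` a second time.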

Author: Abou Haydar Elias <abouhaydar.elias@gmail.com>
Author: Elie A <abouhaydar.elias@gmail.com>

Closes #11491 from eliasah/quantile-discretizer-patch.
parent dd83c209
@@ -118,7 +118,7 @@ object QuantileDiscretizer extends DefaultParamsReadable[QuantileDiscretizer] wi
     require(totalSamples > 0,
       "QuantileDiscretizer requires non-empty input dataset but was given an empty input.")
     val requiredSamples = math.max(numBins * numBins, minSamplesRequired)
-    val fraction = math.min(requiredSamples.toDouble / dataset.count(), 1.0)
+    val fraction = math.min(requiredSamples.toDouble / totalSamples, 1.0)
     dataset.sample(withReplacement = false, fraction, new XORShiftRandom(seed).nextInt()).collect()
   }
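
For reviewers who want to check the arithmetic without a Spark session, the fraction computation in the hunk above can be sketched in plain Scala. The object name `SampleFractionSketch`, the helper `sampleFraction`, and the example arguments are hypothetical and not part of the patch; the sketch only mirrors the three lines shown in the diff and assumes the row count has already been computed once by the caller.

```scala
object SampleFractionSketch {
  // Hypothetical helper (not Spark API): mirrors the patched lines.
  // totalSamples is assumed to be the result of a single count() done by the caller.
  def sampleFraction(numBins: Int, minSamplesRequired: Int, totalSamples: Long): Double = {
    require(totalSamples > 0, "requires non-empty input")
    // Sample at least numBins^2 rows, or the configured minimum if that is larger.
    val requiredSamples = math.max(numBins * numBins, minSamplesRequired)
    // Cap at 1.0 so small datasets are used in full.
    math.min(requiredSamples.toDouble / totalSamples, 1.0)
  }
}
```

Under these assumptions, `SampleFractionSketch.sampleFraction(10, 10000, 1000000L)` gives a fraction of 0.01, and any dataset with fewer rows than `requiredSamples` gives 1.0. The point of the patch is that the count feeding `totalSamples` is an action over the whole dataset, so reusing it saves one full pass.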