[SPARK-17086][ML] Fix InvalidArgumentException issue in QuantileDiscretizer when some quantiles are duplicated

## What changes were proposed in this pull request?

When QuantileDiscretizer is called on a numeric array with duplicated elements, we now take the unique elements of the splits generated by approxQuantile as the input for Bucketizer.

## How was this patch tested?

A unit test is added in QuantileDiscretizerSuite.

Without this change, QuantileDiscretizer.fit throws an IllegalArgumentException when setSplits is called with a list of splits that contains duplicated elements. Bucketizer.setSplits should only accept a numeric vector of two or more unique cut points, although this may produce fewer buckets than requested.

Signed-off-by: VinceShieh <vincent.xie@intel.com>
Author: VinceShieh <vincent.xie@intel.com>

Closes #14747 from VinceShieh/SPARK-17086.
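The core idea of the fix is to deduplicate the split points before they are handed to Bucketizer. Below is a minimal standalone sketch of that idea in plain Scala, with no Spark dependency. The object `SplitDedup` and its methods are hypothetical names for illustration, not Spark's actual code. The sketch assumes the quantiles arrive in non-decreasing order (quantiles for increasing probabilities), and widening the outer edges to ±infinity is an assumption of this sketch as well.

```scala
// Hypothetical helper illustrating the dedup idea; not Spark's actual source.
object SplitDedup {

  /** Turns raw quantiles into splits with no duplicated cut points.
    * Assumes `quantiles` is non-decreasing, as quantiles for increasing probabilities are.
    */
  def toValidSplits(quantiles: Array[Double]): Array[Double] = {
    require(quantiles.length >= 2, "need at least two quantiles")
    val splits = quantiles.clone()
    // Assumption of this sketch: open up the outer edges so values outside
    // the observed range still fall into the first or last bucket.
    splits(0) = Double.NegativeInfinity
    splits(splits.length - 1) = Double.PositiveInfinity
    // On a non-decreasing array, `distinct` keeps the first occurrence of each
    // value in order, so the result is strictly increasing.
    splits.distinct
  }

  /** True when every cut point is strictly less than the next one. */
  def isStrictlyIncreasing(splits: Array[Double]): Boolean =
    splits.zip(splits.tail).forall { case (a, b) => a < b }

  /** Number of buckets defined by a split array. */
  def numBuckets(splits: Array[Double]): Int = splits.length - 1
}
```

For example, quantiles `Array(1.0, 1.0, 1.0, 2.0, 3.0)` (four requested buckets) become `Array(-Infinity, 1.0, 2.0, +Infinity)`, which defines three buckets. This is the "fewer buckets than requested" trade-off described above: the splits become valid instead of being rejected.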
- mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala: 6 additions, 1 deletion
- mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala: 19 additions, 0 deletions