[SPARK-5893] [ML] Add bucketizer
JIRA issue [here](https://issues.apache.org/jira/browse/SPARK-5893).

One point to clarify: the `buckets` parameter, an array of `Double`, acts as a list of split points. For example,

```scala
buckets = Array(-0.5, 0.0, 0.5)
```

splits the real line into 4 ranges, (-inf, -0.5], (-0.5, 0.0], (0.0, 0.5], (0.5, +inf), which are encoded as 0, 1, 2, 3.

Author: Xusen Yin <yinxusen@gmail.com>
Author: Joseph K. Bradley <joseph@databricks.com>

Closes #5980 from yinxusen/SPARK-5893 and squashes the following commits:

- dc8c843 [Xusen Yin] Merge pull request #4 from jkbradley/yinxusen-SPARK-5893
- 1ca973a [Joseph K. Bradley] one more bucketizer test
- 34f124a [Joseph K. Bradley] Removed lowerInclusive, upperInclusive params from Bucketizer, and used splits instead.
- eacfcfa [Xusen Yin] change ML attribute from splits into buckets
- c3cc770 [Xusen Yin] add more unit test for binary search
- 3a16cc2 [Xusen Yin] refine comments and names
- ac77859 [Xusen Yin] fix style error
- fb30d79 [Xusen Yin] fix and test binary search
- 2466322 [Xusen Yin] refactor Bucketizer
- 11fb00a [Xusen Yin] change it into an Estimator
- 998bc87 [Xusen Yin] check buckets
- 4024cf1 [Xusen Yin] add test suite
- 5fe190e [Xusen Yin] add bucketizer
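The encoding described in the commit message can be illustrated with a minimal, standalone sketch. It is not the code added by this commit: `BucketSketch` and `bucketIndex` are hypothetical names, and the sketch follows only the right-inclusive ranges stated in the message. The squashed commits show that the parameters were reworked during review, so the merged `Bucketizer` may use different names and boundary conventions.

```scala
object BucketSketch {
  /** Hypothetical helper, not part of Spark: returns the index of the range
    * that contains `x`, using the convention from the commit message:
    * (-inf, s0], (s0, s1], ..., (s_{n-1}, +inf).
    */
  def bucketIndex(splits: Array[Double], x: Double): Int = {
    require(
      splits.sliding(2).forall(p => p.length < 2 || p(0) < p(1)),
      "split points must be strictly increasing")
    // Binary search for the first split point that is >= x.
    // Its position is the bucket index; if none exists, x falls in the last range.
    var lo = 0
    var hi = splits.length
    while (lo < hi) {
      val mid = lo + (hi - lo) / 2
      if (splits(mid) < x) lo = mid + 1 else hi = mid
    }
    lo
  }
}
```

With `Array(-0.5, 0.0, 0.5)` as the split points, `BucketSketch.bucketIndex` maps -1.0 to 0, -0.2 to 1, 0.5 to 2, and 0.7 to 3. A value equal to a split point lands in the lower range, because each range in this convention is closed on the right.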
Changed files:
- mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala (131 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/ml/util/SchemaUtils.scala (11 additions, 0 deletions)
- mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala (148 additions, 0 deletions)