[SPARK-9918] [MLLIB] remove runs from k-means and rename epsilon to tol
This requires some discussion. I'm not sure whether `runs` is a useful parameter. It certainly complicates the implementation. We might want to optimize the k-means implementation with block matrix operations. In this case, having `runs` may not be worth the trade-off. Also it increases the communication cost in a single job, which might cause other issues.

This PR also renames `epsilon` to `tol` to have consistent naming among algorithms. The Python constructor is updated to include all parameters.

jkbradley yu-iskw

Author: Xiangrui Meng <meng@databricks.com>

Closes #8148 from mengxr/SPARK-9918 and squashes the following commits:

149b9e5 [Xiangrui Meng] fix constructor in Python and rename epsilon to tol
3cc15b3 [Xiangrui Meng] fix test and change initStep to initSteps in python
a0a0274 [Xiangrui Meng] remove runs from k-means in the pipeline API
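
For readers unfamiliar with the two parameters discussed above, here is a minimal pure-Python sketch of a single k-means run (Lloyd's algorithm) that stops once no center moves farther than `tol`. It is not the Spark implementation: the function name `kmeans_single_run`, its signature, and the sample data are hypothetical and exist only for illustration. A `runs` parameter would amount to repeating this whole procedure from several initializations and keeping the best result, which is the extra complexity the message argues against.

```python
import math


def kmeans_single_run(points, initial_centers, max_iter=20, tol=1e-4):
    """One run of Lloyd's algorithm (hypothetical helper, not Spark's code).

    Stops when no center moves farther than `tol` or after `max_iter` passes.
    Returns the final centers and the number of iterations performed.
    """
    centers = [tuple(c) for c in initial_centers]
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: group each point with its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)

        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                mean = tuple(sum(xs) / len(members) for xs in zip(*members))
                new_centers.append(mean)
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(center)

        # Convergence check: largest distance any center moved in this pass.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, iteration


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, iterations = kmeans_single_run(points, [(0.0, 0.0), (10.0, 10.0)])
print(centers, iterations)
```

With this toy data the centers settle after the first update, and the second pass only confirms that nothing moved, so the loop exits on the `tol` check rather than on `max_iter`.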
Showing 3 changed files:
- mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala (10 additions, 41 deletions)
- mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala (3 additions, 9 deletions)
- python/pyspark/ml/clustering.py (13 additions, 50 deletions)