Commit e1e77b22 authored by Yuhao Yang, committed by Joseph K. Bradley

[SPARK-11029] [ML] Add computeCost to KMeansModel in spark.ml

jira: https://issues.apache.org/jira/browse/SPARK-11029

We should add a method analogous to spark.mllib.clustering.KMeansModel.computeCost to spark.ml.clustering.KMeansModel.
This is a temporary fix until proper evaluators are defined for clustering.

Author: Yuhao Yang <hhbyyh@gmail.com>
Author: yuhaoyang <yuhao@zhanglipings-iMac.local>

Closes #9073 from hhbyyh/computeCost.
parent 8ac71d62
@@ -117,6 +117,18 @@ class KMeansModel private[ml] (
  @Since("1.5.0")
  def clusterCenters: Array[Vector] = parentModel.clusterCenters

  /**
   * Return the K-means cost (sum of squared distances of points to their nearest center) for this
   * model on the given data.
   */
  // TODO: Replace the temp fix when we have proper evaluators defined for clustering.
  @Since("1.6.0")
  def computeCost(dataset: DataFrame): Double = {
    SchemaUtils.checkColumnType(dataset.schema, $(featuresCol), new VectorUDT)
    val data = dataset.select(col($(featuresCol))).map { case Row(point: Vector) => point }
    parentModel.computeCost(data)
  }
}

/**
......
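The doc comment on `computeCost` defines the cost as the sum of squared distances of points to their nearest center. Written out, for data points $x_1, \dots, x_n$ and cluster centers $c_1, \dots, c_k$ (symbol names chosen here for illustration), the quantity is:

```latex
\mathrm{cost} = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - c_j \rVert_2^2
```

A lower value means points sit closer to their assigned centers; it is 0 only when every point coincides with some center.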
@@ -104,5 +104,6 @@ class KMeansSuite extends SparkFunSuite with MLlibTestSparkContext {
    val clusters = transformed.select(predictionColName).map(_.getInt(0)).distinct().collect().toSet
    assert(clusters.size === k)
    assert(clusters === Set(0, 1, 2, 3, 4))
    assert(model.computeCost(dataset) < 0.1)
  }
}
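The new assertion only checks that the fitted model's cost is small on the suite's dataset. To make the quantity concrete, here is a minimal plain-Scala sketch of the same cost, assuming points and centers are given as `Array[Double]` rather than Spark `Vector`s. `KMeansCostSketch`, `squaredDistance`, and `computeCost` here are hypothetical names for illustration; this is not Spark's implementation, which works over an RDD.

```scala
// Hypothetical plain-Scala sketch of the K-means cost; not Spark's implementation.
object KMeansCostSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Point, b: Point): Double = {
    require(a.length == b.length, "dimension mismatch")
    a.indices.map { i =>
      val d = a(i) - b(i)
      d * d
    }.sum
  }

  // For each point, take the squared distance to its nearest center, then sum.
  def computeCost(points: Seq[Point], centers: Seq[Point]): Double = {
    require(centers.nonEmpty, "at least one center is required")
    points.map(p => centers.map(c => squaredDistance(p, c)).min).sum
  }
}
```

For example, with points (0, 0), (0, 1), (10, 10), (10, 11) and centers (0, 0.5), (10, 10.5), each point is 0.5 away from its nearest center, so the cost is 4 × 0.25 = 1.0. When the points are exactly the centers, the cost is 0, which is why a well-separated test dataset can be expected to give a small value.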