Skip to content
Snippets Groups Projects
Commit 95eb6516 authored by Yanbo Liang's avatar Yanbo Liang Committed by Joseph K. Bradley
Browse files

[SPARK-11945][ML][PYSPARK] Add computeCost to KMeansModel for PySpark spark.ml

Add ```computeCost``` to ```KMeansModel``` as evaluator for PySpark spark.ml.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9931 from yanboliang/SPARK-11945.
parent 007da1a9
No related branches found
No related tags found
No related merge requests found
......@@ -36,6 +36,14 @@ class KMeansModel(JavaModel):
"""Get the cluster centers, represented as a list of NumPy arrays."""
return [c.toArray() for c in self._call_java("clusterCenters")]
@since("2.0.0")
def computeCost(self, dataset):
"""
Return the K-means cost (sum of squared distances of points to their nearest center)
for this model on the given data.
"""
return self._call_java("computeCost", dataset)
@inherit_doc
class KMeans(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIter, HasTol, HasSeed):
......@@ -53,6 +61,8 @@ class KMeans(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIter, HasTol
>>> centers = model.clusterCenters()
>>> len(centers)
2
>>> model.computeCost(df)
2.000...
>>> transformed = model.transform(df).select("features", "prediction")
>>> rows = transformed.collect()
>>> rows[0].prediction == rows[1].prediction
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment