[SPARK-11945][ML][PYSPARK] Add computeCost to KMeansModel for PySpark spark.ml

Add ```computeCost``` to ```KMeansModel``` as an evaluator for PySpark spark.ml.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9931 from yanboliang/SPARK-11945.
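
For reference, the quantity that ```computeCost``` returns (per the docstring in the diff: the sum of squared distances of points to their nearest center) can be sketched in plain Python. This is only an illustration, not the implementation, which delegates to the JVM model through ```_call_java```. The helper name ```kmeans_cost``` and the sample points and centers are assumptions chosen for this example; they are not taken from the commit.

```python
def kmeans_cost(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    total = 0.0
    for p in points:
        # Squared distance from p to every center; keep the smallest one.
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total


# Assumed sample data: two tight pairs of points and one center per pair.
points = [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)]
centers = [(0.5, 0.5), (8.5, 8.5)]

cost = kmeans_cost(points, centers)
print(cost)
```

Each of the four assumed points lies at squared distance 0.5 from its nearest center, so the total cost for this data is 2.0. A lower cost means the points sit closer to their assigned centers, which is why the value is useful for comparing models fitted with different values of k.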

Commit 95eb6516, authored by Yanbo Liang and committed by Joseph K. Bradley 9 years ago
--- a/python/pyspark/ml/clustering.py
+++ b/python/pyspark/ml/clustering.py
@@ -36,6 +36,14 @@ class KMeansModel(JavaModel):
        """Get the cluster centers, represented as a list of NumPy arrays."""
        return [c.toArray() for c in self._call_java("clusterCenters")]

+    @since("2.0.0")
+    def computeCost(self, dataset):
+        """
+        Return the K-means cost (sum of squared distances of points to their nearest center)
+        for this model on the given data.
+        """
+        return self._call_java("computeCost", dataset)
+

 @inherit_doc
 class KMeans(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIter, HasTol, HasSeed):
@@ -53,6 +61,8 @@ class KMeans(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIter, HasTol
    >>> centers = model.clusterCenters()
    >>> len(centers)
    2
+    >>> model.computeCost(df)
+    2.000...
    >>> transformed = model.transform(df).select("features", "prediction")
    >>> rows = transformed.collect()
    >>> rows[0].prediction == rows[1].prediction