Skip to content
Snippets Groups Projects
Commit 20d8ef85 authored by Joseph K. Bradley's avatar Joseph K. Bradley
Browse files

[SPARK-12703][MLLIB][DOC][PYTHON] Fixed pyspark.mllib.clustering.KMeans user guide example

Fixed WSSSE computeCost in Python mllib KMeans user guide example by using new computeCost method API in Python.

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #10707 from jkbradley/kmeans-doc-fix.
parent 021dafc6
No related branches found
No related tags found
No related merge requests found
......@@ -152,11 +152,7 @@ clusters = KMeans.train(parsedData, 2, maxIterations=10,
runs=10, initializationMode="random")
# Evaluate clustering by computing Within Set Sum of Squared Errors
def error(point):
center = clusters.centers[clusters.predict(point)]
return sqrt(sum([x**2 for x in (point - center)]))
WSSSE = parsedData.map(lambda point: error(point)).reduce(lambda x, y: x + y)
WSSSE = clusters.computeCost(parsedData)
print("Within Set Sum of Squared Error = " + str(WSSSE))
# Save and load model
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment