Commit 8d33e1e5 authored by Sean Owen

[SPARK-11560][MLLIB] Optimize KMeans implementation / remove 'runs'

## What changes were proposed in this pull request?

This revives https://github.com/apache/spark/pull/14948 and is related to https://github.com/apache/spark/pull/14937. It removes the 'runs' parameter, which had already been disabled, from the K-means implementation and further deprecates the API methods that involve it.

This also resolves the issue that K-means should not return duplicate centers. As a consequence, it may return fewer than k centroids when not enough data is available (for example, when the input has fewer than k distinct points).
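
To illustrate the "fewer than k centroids" behaviour, here is a minimal, Spark-free sketch in plain Python. It is not the code from this patch: the function name `pick_distinct_centers` and its parameters are hypothetical, and it only shows the general idea of deduplicating candidate centers so that the result can be shorter than k.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def pick_distinct_centers(points: Sequence[Point], k: int, seed: int = 0) -> List[Point]:
    """Hypothetical helper: choose up to k distinct initial centers.

    If the data holds fewer than k distinct points, all distinct points
    are returned, so the result may contain fewer than k centers.
    """
    # Drop duplicate points while keeping first-seen order.
    distinct = list(dict.fromkeys(points))
    if len(distinct) <= k:
        return distinct
    # Otherwise sample k of the distinct points without replacement.
    return random.Random(seed).sample(distinct, k)


# Four points, but only two distinct locations; ask for k = 3.
data = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (1.0, 1.0)]
centers = pick_distinct_centers(data, k=3)
print(len(centers))  # 2
```

Callers that previously assumed exactly k centers would, under this behaviour, need to read the actual number of centers from the result rather than from k.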

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #15342 from srowen/SPARK-11560.
parent c264ef9b