[SPARK-14516][ML] Adding ClusteringEvaluator with the implementation of Cosine silhouette and squared Euclidean silhouette.

## What changes were proposed in this pull request?

This PR adds the ClusteringEvaluator Evaluator, which contains two metrics:

- **cosineSilhouette**: the Silhouette measure using the cosine distance;
- **squaredSilhouette**: the Silhouette measure using the squared Euclidean distance.

The implementation of the two metrics follows the algorithm proposed and explained [here](https://drive.google.com/file/d/0B0Hyo%5f%5fbG%5f3fdkNvSVNYX2E3ZU0/view). These algorithms were designed for a distributed and parallel environment, so they perform reasonably well, unlike a naive Silhouette implementation that follows the definition directly.

## How was this patch tested?

The patch was tested with the added unit tests, which compare the results with those provided by the [Python sklearn library](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html).

Author: Marco Gaido <mgaido@hortonworks.com>

Closes #18538 from mgaido91/SPARK-14516.
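To illustrate why the squared Euclidean variant suits a distributed setting, here is a minimal single-machine sketch in plain Python. It is not the code from this PR. It relies on the identity sum_i ||x - c_i||^2 = N*||x||^2 + Psi - 2*(x . Y), where N is the cluster size, Y the sum of the cluster's vectors, and Psi the sum of their squared norms, so each point needs only one small aggregate per cluster instead of a pass over all other points. The function name `squared_silhouette` is hypothetical, and two conventions are assumed: the point's own cluster is averaged over N - 1 members, and a point in a single-member cluster scores 0. At least two clusters are required.

```python
def squared_silhouette(points, labels):
    """Mean silhouette with squared Euclidean distance, via per-cluster aggregates.

    Illustrative sketch only; requires at least two distinct labels.
    """
    # One pass: per-cluster count N, vector sum Y, and sum of squared norms Psi.
    stats = {}
    for x, c in zip(points, labels):
        n, y, psi = stats.get(c, (0, [0.0] * len(x), 0.0))
        stats[c] = (n + 1,
                    [a + b for a, b in zip(y, x)],
                    psi + sum(v * v for v in x))

    def total_sq_dist(x, c):
        # sum_i ||x - c_i||^2 = N*||x||^2 + Psi - 2 * (x . Y)
        n, y, psi = stats[c]
        x_norm = sum(v * v for v in x)
        return n * x_norm + psi - 2.0 * sum(a * b for a, b in zip(x, y))

    scores = []
    for x, c in zip(points, labels):
        n_own = stats[c][0]
        if n_own == 1:
            # Assumed convention: a single-member cluster contributes 0.
            scores.append(0.0)
            continue
        # The distance from x to itself is 0, so average over the other N - 1 points.
        a = total_sq_dist(x, c) / (n_own - 1)
        # Smallest mean distance to any other cluster.
        b = min(total_sq_dist(x, k) / stats[k][0] for k in stats if k != c)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Usage: two well-separated clusters give a score close to 1.
score = squared_silhouette([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1])
```

With k clusters and n points, the aggregate-based version does O(n * k) vector operations after a single aggregation pass, whereas the definition-based approach compares every pair of points.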
Showing 3 changed files:
- mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala (436 additions, 0 deletions)
- mllib/src/test/resources/test-data/iris.libsvm (150 additions, 0 deletions)
- mllib/src/test/scala/org/apache/spark/ml/evaluation/ClusteringEvaluatorSuite.scala (89 additions, 0 deletions)