    f377431a
    [SPARK-4259][MLlib]: Add Power Iteration Clustering Algorithm with Gaussian Similarity Function · f377431a
    sboeschhuawei authored
    Add single pseudo-eigenvector PIC
    Includes documentation and an updated pom.xml, along with the following source files:
    mllib/src/main/scala/org/apache/spark/mllib/clustering/PIClustering.scala
    mllib/src/test/scala/org/apache/spark/mllib/clustering/PIClusteringSuite.scala
    
    Author: sboeschhuawei <stephen.boesch@huawei.com>
    Author: Fan Jiang <fanjiang.sc@huawei.com>
    Author: Jiang Fan <fjiang6@gmail.com>
    Author: Stephen Boesch <stephen.boesch@huawei.com>
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #4254 from fjiang6/PIC and squashes the following commits:
    
    4550850 [sboeschhuawei] Removed pic test data
    f292f31 [Stephen Boesch] Merge pull request #44 from mengxr/SPARK-4259
    4b78aaf [Xiangrui Meng] refactor PIC
    24fbf52 [sboeschhuawei] Updated API to be similar to KMeans plus other changes requested by Xiangrui on the PR
    c12dfc8 [sboeschhuawei] Removed examples files and added pic_data.txt. Revamped testcases yet to come
    92d4752 [sboeschhuawei] Move the Gaussian/Affinity matrix calcs out of PIC. Presently in the test suite
    7ebd149 [sboeschhuawei] Incorporate Xiangrui's first set of PR comments except restructure PIC.run to take Graph but do not remove Gaussian
    121e4d5 [sboeschhuawei] Remove unused testing data files
    1c3a62e [sboeschhuawei] removed matplot.py and reordered all private methods to bottom of PIC
    218a49d [sboeschhuawei] Applied Xiangrui's comments - especially removing RDD/PICLinalg classes and making noncritical methods private
    43ab10b [sboeschhuawei] Change last two println's to log4j logger
    88aacc8 [sboeschhuawei] Add assert to testcase on cluster sizes
    24f438e [sboeschhuawei] fixed incorrect markdown in clustering doc
    060e6bf [sboeschhuawei] Added link to PIC doc from the main clustering md doc
    be659e3 [sboeschhuawei] Added mllib specific log4j
    90e7fa4 [sboeschhuawei] Converted from custom Linalg routines to Breeze: added JavaDoc comments; added Markdown documentation
    bea48ea [sboeschhuawei] Converted custom Linear Algebra datatypes/routines to use Breeze.
    b29c0db [Fan Jiang] Update PIClustering.scala
    ace9749 [Fan Jiang] Update PIClustering.scala
    a112f38 [sboeschhuawei] Added graphx main and test jars as dependencies to mllib/pom.xml
    f656c34 [sboeschhuawei] Added iris dataset
    b7dbcbe [sboeschhuawei] Added axes and combined into single plot for matplotlib
    a2b1e57 [sboeschhuawei] Revert inadvertent update to KMeans
    9294263 [sboeschhuawei] Added visualization/plotting of input/output data
    e5df2b8 [sboeschhuawei] First end to end working PIC
    0700335 [sboeschhuawei] First end to end working version: but has bad performance issue
    32a90dc [sboeschhuawei] Update circles test data values
    0ef163f [sboeschhuawei] Added ConcentricCircles data generation and KMeans clustering
    3fd5bc8 [sboeschhuawei] PIClustering is running in new branch (up to the pseudo-eigenvector convergence step)
    d5aae20 [Jiang Fan] Adding Power Iteration Clustering and Suite test
    a3c5fbe [Jiang Fan] Adding Power Iteration Clustering