    Streaming KMeans [MLLIB][SPARK-3254] · 98c556eb
    freeman authored
    This adds a Streaming KMeans algorithm to MLlib. It uses an update rule that generalizes the mini-batch KMeans update with a decay factor, which lets past data be forgotten. The decay factor can be set explicitly or through a more intuitive "fractional decay" setting, expressed in units of either data points or batches.
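
    The update rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the MLlib implementation. The function names and the list-of-lists vector representation are hypothetical. The sketch assumes the decay factor alpha discounts each cluster's accumulated weight before the new batch is merged, giving c_new = (alpha*n*c + m*x) / (alpha*n + m) and n_new = alpha*n + m, where x is the mean of the m new points assigned to the cluster. In "points" mode it assumes the discount is applied once per new point, i.e. alpha raised to the batch size. Expressing "fractional decay" as a half-life h, with alpha = 0.5^(1/h), is also an assumption made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def streaming_kmeans_update(
    centers: List[Vector],
    weights: List[float],
    batch: Sequence[Sequence[float]],
    decay: float,
    time_unit: str = "batches",
) -> Tuple[List[Vector], List[float]]:
    """One mini-batch update with a decay factor (hypothetical sketch).

    decay = 1.0 keeps all history (plain mini-batch k-means);
    decay = 0.0 forgets everything before the current batch.
    """
    k, dim = len(centers), len(centers[0])
    # Per-cluster sum and count of the batch points assigned to each center.
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in batch:
        j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]

    # Assumed semantics: discount once per batch, or once per new point.
    if time_unit == "batches":
        discount = decay
    elif time_unit == "points":
        discount = decay ** len(batch)
    else:
        raise ValueError("time_unit must be 'batches' or 'points'")

    new_centers = [c[:] for c in centers]
    new_weights = [w * discount for w in weights]
    for j in range(k):
        m = counts[j]
        if m == 0:
            continue  # no new points: center unchanged, weight only decays
        n = new_weights[j]  # alpha * n_t
        # c_{t+1} = (alpha*n_t*c_t + m_t*x_t) / (alpha*n_t + m_t);
        # m_t*x_t is just the sum of the new points assigned to cluster j.
        new_centers[j] = [(n * c + s) / (n + m) for c, s in zip(centers[j], sums[j])]
        new_weights[j] = n + m
    return new_centers, new_weights


def decay_from_half_life(half_life: float) -> float:
    """Decay factor at which old data's weight halves after `half_life` units."""
    return 0.5 ** (1.0 / half_life)
```

    With decay = 1.0 the rule reduces to an ordinary running mean of all points seen so far; with decay = 0.0 each center jumps to the mean of its points in the latest batch.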
    
    The PR includes:
    - StreamingKMeans algorithm with decay factor settings
    - Usage example
    - Additions to documentation clustering page
    - Unit tests of basic behavior and decay behaviors
    
    tdas mengxr rezazadeh
    
    Author: freeman <the.freeman.lab@gmail.com>
    Author: Jeremy Freeman <the.freeman.lab@gmail.com>
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #2942 from freeman-lab/streaming-kmeans and squashes the following commits:
    
    b2e5b4a [freeman] Fixes to docs / examples
    078617c [Jeremy Freeman] Merge pull request #1 from mengxr/SPARK-3254
    2e682c0 [Xiangrui Meng] take discount on previous weights; use BLAS; detect dying clusters
    0411bf5 [freeman] Change decay parameterization
    9f7aea9 [freeman] Style fixes
    374a706 [freeman] Formatting
    ad9bdc2 [freeman] Use labeled points and predictOnValues in examples
    77dbd3f [freeman] Make initialization check an assertion
    9cfc301 [freeman] Make random seed an argument
    44050a9 [freeman] Simpler constructor
    c7050d5 [freeman] Fix spacing
    2899623 [freeman] Use pattern matching for clarity
    a4a316b [freeman] Use collect
    1472ec5 [freeman] Doc formatting
    ea22ec8 [freeman] Fix imports
    2086bdc [freeman] Log cluster center updates
    ea9877c [freeman] More documentation
    9facbe3 [freeman] Bug fix
    5db7074 [freeman] Example usage for StreamingKMeans
    f33684b [freeman] Add explanation and example to docs
    b5b5f8d [freeman] Add better documentation
    a0fd790 [freeman] Merge remote-tracking branch 'upstream/master' into streaming-kmeans
    9fd9c15 [freeman] Merge remote-tracking branch 'upstream/master' into streaming-kmeans
    b93350f [freeman] Streaming KMeans with decay
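
    Commit 2e682c0 mentions detecting dying clusters, presumably clusters that stop receiving points so that decay drives their weight toward zero. One plausible way to handle this, assumed here for illustration and not taken from the PR: if the lightest cluster's weight falls below a tiny fraction of the heaviest cluster's weight, re-seed it by splitting the heaviest cluster into two nearly identical centers that share its weight. The function name, the 1e-8 threshold ratio, and the 1e-14 perturbation size are hypothetical choices.

```python
from typing import List

Vector = List[float]


def split_dying_cluster(
    centers: List[Vector],
    weights: List[float],
    ratio: float = 1e-8,
    eps: float = 1e-14,
) -> bool:
    """Re-seed the lightest cluster by splitting the heaviest (hypothetical sketch).

    Mutates `centers` and `weights` in place; returns True if a split happened.
    """
    largest = max(range(len(weights)), key=lambda i: weights[i])
    smallest = min(range(len(weights)), key=lambda i: weights[i])
    if weights[smallest] >= ratio * weights[largest]:
        return False  # no cluster is dying
    # Share the combined weight evenly between the two resulting clusters.
    shared = (weights[largest] + weights[smallest]) / 2.0
    weights[largest] = shared
    weights[smallest] = shared
    # Put both centers just either side of the old heaviest center, using a
    # small relative perturbation so they are distinct but nearly equal.
    base = centers[largest][:]
    centers[largest] = [x + eps * max(abs(x), 1.0) for x in base]
    centers[smallest] = [x - eps * max(abs(x), 1.0) for x in base]
    return True
```

    Splitting the heaviest cluster keeps the number of clusters fixed at k while moving the unused center to where most of the data currently lies; subsequent batches then pull the two copies apart.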