    34a889db
    [SPARK-7879] [MLLIB] KMeans API for spark.ml Pipelines · 34a889db
    Yu ISHIKAWA authored
    I implemented the KMeans API for spark.ml Pipelines. It does not include clustering abstractions for spark.ml (SPARK-7610); those fit better in a separate issue. I'll try them later, since we are also trying to add hierarchical clustering algorithms in another issue. Thanks.
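In spark.ml Pipelines, an estimator's `fit` produces a model whose `transform` appends a column (here, a "prediction" column next to the "features" column). The sketch below illustrates that split in plain Python, assuming a list of dicts as a stand-in for a DataFrame. It is not the pyspark.ml API: `ToyKMeans`, `ToyKMeansModel`, and their constructor arguments are hypothetical. Initialization is plain seeded random sampling rather than k-means||, followed by Lloyd's iterations that stop once no center moves more than `epsilon`.

```python
import random
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]  # hypothetical stand-in for a DataFrame row


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_center(point: Sequence[float], centers: List[List[float]]) -> int:
    """Index of the center nearest to `point`."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


class ToyKMeansModel:
    """Hypothetical fitted model: stores centers and appends a prediction column."""

    def __init__(self, centers: List[List[float]], features_col: str, prediction_col: str):
        self.centers = centers
        self.features_col = features_col
        self.prediction_col = prediction_col

    def transform(self, rows: List[Row]) -> List[Row]:
        # Analogous to withColumn: each output row is a copy with one extra field.
        return [
            {**row, self.prediction_col: closest_center(row[self.features_col], self.centers)}
            for row in rows
        ]


class ToyKMeans:
    """Hypothetical estimator running Lloyd's iterations from a seeded random init."""

    def __init__(self, k: int = 2, max_iter: int = 20, epsilon: float = 1e-4, seed: int = 0,
                 features_col: str = "features", prediction_col: str = "prediction"):
        self.k = k
        self.max_iter = max_iter
        self.epsilon = epsilon
        self.seed = seed
        self.features_col = features_col
        self.prediction_col = prediction_col

    def fit(self, rows: List[Row]) -> ToyKMeansModel:
        points = [list(row[self.features_col]) for row in rows]
        # Random initialization: pick k distinct input points (not k-means||).
        centers = [list(p) for p in random.Random(self.seed).sample(points, self.k)]
        for _ in range(self.max_iter):
            clusters: List[List[List[float]]] = [[] for _ in range(self.k)]
            for p in points:
                clusters[closest_center(p, centers)].append(p)
            new_centers = []
            for old, members in zip(centers, clusters):
                if members:
                    # Component-wise mean of the points assigned to this center.
                    new_centers.append([sum(col) / len(members) for col in zip(*members)])
                else:
                    new_centers.append(old)  # keep an empty cluster's center where it was
            # Stop once no center moved farther than epsilon.
            shift = max(squared_distance(o, n) ** 0.5 for o, n in zip(centers, new_centers))
            centers = new_centers
            if shift <= self.epsilon:
                break
        return ToyKMeansModel(centers, self.features_col, self.prediction_col)


rows = [{"features": [0.0, 0.0]}, {"features": [0.1, 0.1]},
        {"features": [9.0, 9.0]}, {"features": [9.1, 9.1]}]
model = ToyKMeans(k=2, seed=7).fit(rows)
for row in model.transform(rows):
    print(row)
```

Keeping the fitted centers in a separate model object mirrors the Pipeline design: the estimator holds only parameters, and the model is what gets applied to new data.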
    
    [SPARK-7879] KMeans API for spark.ml Pipelines - ASF JIRA https://issues.apache.org/jira/browse/SPARK-7879
    
    Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
    
    Closes #6756 from yu-iskw/SPARK-7879 and squashes the following commits:
    
    be752de [Yu ISHIKAWA] Add assertions
    a14939b [Yu ISHIKAWA] Fix the dashed line's length in pyspark.ml.rst
    4c61693 [Yu ISHIKAWA] Remove the test about whether "features" and "prediction" columns exist or not in Python
    fb2417c [Yu ISHIKAWA] Use getInt, instead of get
    f397be4 [Yu ISHIKAWA] Switch the comparisons.
    ca78b7d [Yu ISHIKAWA] Add the Scala docs about the constraints of each parameter.
    effc650 [Yu ISHIKAWA] Using expertSetParam and expertGetParam
    c8dc6e6 [Yu ISHIKAWA] Remove an unnecessary test
    19a9d63 [Yu ISHIKAWA] Include spark.ml.clustering in the Python tests
    1abb19c [Yu ISHIKAWA] Add the statements about spark.ml.clustering into pyspark.ml.rst
    f8338bc [Yu ISHIKAWA] Add the placeholders in Python
    4a03003 [Yu ISHIKAWA] Test for contains in Python
    6566c8b [Yu ISHIKAWA] Use `get`, instead of `apply`
    288e8d5 [Yu ISHIKAWA] Using `contains` to check the column names
    5a7d574 [Yu ISHIKAWA] Rename `validateInitializationMode` to `validateInitMode` and remove throwing exception
    97cfae3 [Yu ISHIKAWA] Fix the type of return value of `KMeans.copy`
    e933723 [Yu ISHIKAWA] Remove the default value of seed from the Model class
    978ee2c [Yu ISHIKAWA] Modify the docs of KMeans, according to mllib's KMeans
    2ec80bc [Yu ISHIKAWA] Fit on 1 line
    e186be1 [Yu ISHIKAWA] Make a few variables, setters and getters be expert ones
    b2c205c [Yu ISHIKAWA] Rename the method `getInitializationSteps` to `getInitSteps` and `setInitializationSteps` to `setInitSteps` in Scala and Python
    f43f5b4 [Yu ISHIKAWA] Rename the method `getInitializationMode` to `getInitMode` and `setInitializationMode` to `setInitMode` in Scala and Python
    3cb5ba4 [Yu ISHIKAWA] Modify the description about epsilon and the validation
    4fa409b [Yu ISHIKAWA] Add a comment about the default value of epsilon
    2f392e1 [Yu ISHIKAWA] Make some variables `final` and Use `IntParam` and `DoubleParam`
    19326f8 [Yu ISHIKAWA] Use `udf`, instead of callUDF
    4d2ad1e [Yu ISHIKAWA] Modify the indentations
    0ae422f [Yu ISHIKAWA] Add a test for `setParams`
    4ff7913 [Yu ISHIKAWA] Add "ml.clustering" to `javacOptions` in SparkBuild.scala
    11ffdf1 [Yu ISHIKAWA] Use `===` and the variable
    220a176 [Yu ISHIKAWA] Set a random seed in the unit tests
    92c3efc [Yu ISHIKAWA] Use fewer points in a test
    c758692 [Yu ISHIKAWA] Modify the parameters of KMeans in Python
    6aca147 [Yu ISHIKAWA] Add some unit tests to validate the setter methods
    687cacc [Yu ISHIKAWA] Alias mllib.KMeans as MLlibKMeans in KMeansSuite.scala
    a4dfbef [Yu ISHIKAWA] Modify the last brace and indentations
    5bedc51 [Yu ISHIKAWA] Remove an extra new line
    444c289 [Yu ISHIKAWA] Add the validation for `runs`
    e41989c [Yu ISHIKAWA] Modify how to validate `initStep`
    7ea133a [Yu ISHIKAWA] Change how to validate `initMode`
    7991e15 [Yu ISHIKAWA] Add a validation for `k`
    c2df35d [Yu ISHIKAWA] Make `predict` private
    93aa2ff [Yu ISHIKAWA] Use `withColumn` in `transform`
    d3a79f7 [Yu ISHIKAWA] Remove the inherited docs
    e9532e1 [Yu ISHIKAWA] Make `parentModel` of KMeansModel private
    8559772 [Yu ISHIKAWA] Remove the `paramMap` parameter of KMeans
    6684850 [Yu ISHIKAWA] Rename `initializationSteps` to `initSteps`
    99b1b96 [Yu ISHIKAWA] Rename `initializationMode` to `initMode`
    79ea82b [Yu ISHIKAWA] Modify the parameters of KMeans docs
    6569bcd [Yu ISHIKAWA] Change how to set the default values with `setDefault`
    20a795a [Yu ISHIKAWA] Change how to set the default values with `setDefault`
    11c2a12 [Yu ISHIKAWA] Limit the imports
    badb481 [Yu ISHIKAWA] Alias spark.mllib.{KMeans, KMeansModel}
    f80319a [Yu ISHIKAWA] Rebase master branch and add copy methods
    85d92b1 [Yu ISHIKAWA] Add `KMeans.setPredictionCol`
    aa9469d [Yu ISHIKAWA] Fix a Python test suite error caused by Python 3.x
    c2d6bcb [Yu ISHIKAWA] Add Java test suites of the KMeans API for spark.ml Pipeline
    598ed2e [Yu ISHIKAWA] Implement the KMeans API for spark.ml Pipelines in Python
    63ad785 [Yu ISHIKAWA] Implement the KMeans API for spark.ml Pipelines in Scala
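Several of the commits above add per-parameter validation (`k`, `runs`, `initSteps`, `initMode`, `epsilon`), set defaults through `setDefault`, and make `validateInitMode` report a result instead of throwing. The plain-Python sketch below illustrates that pattern under stated assumptions: `KMeansParamsSketch`, `VALIDATORS`, and `validate_init_mode` are hypothetical names, not the pyspark.ml Params API, and the exact bounds and default values are assumptions for illustration that should be checked against the Scala docs for each parameter.

```python
from typing import Any, Callable, Dict

# Assumed initialization-mode names, modeled on the mllib KMeans constants.
RANDOM = "random"
K_MEANS_PARALLEL = "k-means||"


def validate_init_mode(mode: str) -> bool:
    """Return True/False instead of raising, so it can serve as a param validator."""
    return mode in (RANDOM, K_MEANS_PARALLEL)


# Assumed constraints, one predicate per parameter.
VALIDATORS: Dict[str, Callable[[Any], bool]] = {
    "k": lambda v: isinstance(v, int) and v > 1,
    "maxIter": lambda v: isinstance(v, int) and v >= 0,
    "runs": lambda v: isinstance(v, int) and v >= 1,
    "initSteps": lambda v: isinstance(v, int) and v > 0,
    "epsilon": lambda v: isinstance(v, (int, float)) and v >= 0.0,
    "initMode": lambda v: isinstance(v, str) and validate_init_mode(v),
}


class KMeansParamsSketch:
    """Hypothetical param holder: defaults and user-set values are kept apart."""

    def __init__(self) -> None:
        self._defaults: Dict[str, Any] = {}
        self._values: Dict[str, Any] = {}
        # Assumed defaults, for illustration only.
        self._set_default(k=2, maxIter=20, runs=1, initMode=K_MEANS_PARALLEL,
                          initSteps=5, epsilon=1e-4)

    @staticmethod
    def _check(name: str, value: Any) -> None:
        if name not in VALIDATORS:
            raise KeyError(f"unknown param: {name}")
        if not VALIDATORS[name](value):
            raise ValueError(f"invalid value for {name}: {value!r}")

    def _set_default(self, **params: Any) -> None:
        for name, value in params.items():
            self._check(name, value)
            self._defaults[name] = value

    def set(self, name: str, value: Any) -> "KMeansParamsSketch":
        self._check(name, value)  # reject bad values before storing them
        self._values[name] = value
        return self  # return self so setters can be chained

    def get(self, name: str) -> Any:
        # An explicitly set value wins over the default.
        return self._values.get(name, self._defaults[name])


params = KMeansParamsSketch().set("k", 3).set("initMode", RANDOM)
print(params.get("k"), params.get("initMode"), params.get("initSteps"))  # prints: 3 random 5
```

Keeping defaults separate from user-set values means `get` can always fall back to a default without the caller having to set every parameter explicitly.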
pyspark.ml.rst

pyspark.ml package

ML Pipeline APIs

pyspark.ml.param module

pyspark.ml.feature module

pyspark.ml.classification module

pyspark.ml.clustering module

pyspark.ml.recommendation module

pyspark.ml.regression module

pyspark.ml.tuning module

pyspark.ml.evaluation module