    [SPARK-4586][MLLIB] Python API for ML pipeline and parameters · e80dc1c5
    Xiangrui Meng authored
    This PR adds a Python API for ML pipelines and parameters. The design doc can be found on the JIRA page. The PR includes transformers and an estimator to demonstrate the simple text classification example.
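
As a rough illustration of the pipeline pattern described above (transformers feeding an estimator), here is a minimal sketch in plain Python with no Spark dependency. It is not the code from this PR: the class names `Tokenizer`, `HashingTF`, and `Pipeline` only mirror names mentioned in the commit list, their bodies are hypothetical, rows are assumed to be plain dicts, and `SignEstimator` is a made-up toy stand-in for the logistic regression estimator.

```python
import zlib


class Tokenizer:
    """Transformer: adds a lowercase, whitespace-split `words` field."""

    def transform(self, rows):
        return [dict(row, words=row["text"].lower().split()) for row in rows]


class HashingTF:
    """Transformer: hashes words into a sparse term-frequency dict."""

    def __init__(self, num_features=1 << 18):
        self.num_features = num_features

    def transform(self, rows):
        out = []
        for row in rows:
            features = {}
            for word in row["words"]:
                # crc32 is stable across runs, unlike the built-in hash() for str.
                index = zlib.crc32(word.encode("utf-8")) % self.num_features
                features[index] = features.get(index, 0.0) + 1.0
            out.append(dict(row, features=features))
        return out


class SignModel:
    """Transformer returned by SignEstimator.fit: adds a `prediction` field."""

    def __init__(self, weights):
        self.weights = weights

    def transform(self, rows):
        out = []
        for row in rows:
            score = sum(self.weights.get(i, 0.0) * v
                        for i, v in row["features"].items())
            out.append(dict(row, prediction=1.0 if score > 0 else 0.0))
        return out


class SignEstimator:
    """Hypothetical toy estimator (NOT logistic regression): adds the
    features of label-1 rows to the weights and subtracts all others."""

    def fit(self, rows):
        weights = {}
        for row in rows:
            sign = 1.0 if row["label"] == 1.0 else -1.0
            for i, v in row["features"].items():
                weights[i] = weights.get(i, 0.0) + sign * v
        return SignModel(weights)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Estimator: fits each stage in order on the output of the previous one."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):  # estimator: replace it by its fitted model
                stage = stage.fit(rows)
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)


# Made-up sample data for illustration only.
training = [
    {"text": "a b c d e spark", "label": 1.0},
    {"text": "b d", "label": 0.0},
    {"text": "spark f g h", "label": 1.0},
    {"text": "hadoop mapreduce", "label": 0.0},
]
model = Pipeline([Tokenizer(), HashingTF(), SignEstimator()]).fit(training)
predictions = model.transform([{"text": "spark i j k"}, {"text": "apache hadoop"}])
```

Fitting the pipeline yields a model made only of transformers, so the same tokenizing and hashing steps are reapplied to new rows before prediction.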
    
    TODO:
    - [x] handle parameters in LRModel
    - [x] unit tests
    - [x] missing some docs
    
    CC: davies jkbradley
    
    Author: Xiangrui Meng <meng@databricks.com>
    Author: Davies Liu <davies@databricks.com>
    
    Closes #4151 from mengxr/SPARK-4586 and squashes the following commits:
    
    415268e [Xiangrui Meng] remove inherit_doc from __init__
    edbd6fe [Xiangrui Meng] move Identifiable to ml.util
    44c2405 [Xiangrui Meng] Merge pull request #2 from davies/ml
    dd1256b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-4586
    14ae7e2 [Davies Liu] fix docs
    54ca7df [Davies Liu] fix tests
    78638df [Davies Liu] Merge branch 'SPARK-4586' of github.com:mengxr/spark into ml
    fc59a02 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-4586
    1dca16a [Davies Liu] refactor
    090b3a3 [Davies Liu] Merge branch 'master' of github.com:apache/spark into ml
    0882513 [Xiangrui Meng] update doc style
    a4f4dbf [Xiangrui Meng] add unit test for LR
    7521d1c [Xiangrui Meng] add unit tests to HashingTF and Tokenizer
    ba0ba1e [Xiangrui Meng] add unit tests for pipeline
    0586c7b [Xiangrui Meng] add more comments to the example
    5153cff [Xiangrui Meng] simplify java models
    036ca04 [Xiangrui Meng] gen numFeatures
    46fa147 [Xiangrui Meng] update mllib/pom.xml to include python files in the assembly
    1dcc17e [Xiangrui Meng] update code gen and make param appear in the doc
    f66ba0c [Xiangrui Meng] make params a property
    d5efd34 [Xiangrui Meng] update doc conf and move embedded param map to instance attribute
    f4d0fe6 [Xiangrui Meng] use LabeledDocument and Document in example
    05e3e40 [Xiangrui Meng] update example
    d3e8dbe [Xiangrui Meng] more docs optimize pipeline.fit impl
    56de571 [Xiangrui Meng] fix style
    d0c5bb8 [Xiangrui Meng] a working copy
    bce72f4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-4586
    17ecfb9 [Xiangrui Meng] code gen for shared params
    d9ea77c [Xiangrui Meng] update doc
    c18dca1 [Xiangrui Meng] make the example working
    dadd84e [Xiangrui Meng] add base classes and docs
    a3015cf [Xiangrui Meng] add Estimator and Transformer
    46eea43 [Xiangrui Meng] a pipeline in python
    33b68e0 [Xiangrui Meng] a working LR