Skip to content
Snippets Groups Projects
  1. May 12, 2015
  2. May 10, 2015
    • Joseph K. Bradley's avatar
      [SPARK-7431] [ML] [PYTHON] Made CrossValidatorModel call parent init in PySpark · 3038443e
      Joseph K. Bradley authored
      Fixes bug with PySpark cvModel not having UID
      Also made small PySpark fixes: Evaluator should inherit from Params.  MockModel should inherit from Model.
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #5968 from jkbradley/pyspark-cv-uid and squashes the following commits:
      
      57f13cd [Joseph K. Bradley] Made CrossValidatorModel call parent init in PySpark
      3038443e
  3. May 06, 2015
    • Xiangrui Meng's avatar
      [SPARK-6940] [MLLIB] Add CrossValidator to Python ML pipeline API · 32cdc815
      Xiangrui Meng authored
      Since CrossValidator is a meta algorithm, we copy the implementation in Python. jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5926 from mengxr/SPARK-6940 and squashes the following commits:
      
      6af181f [Xiangrui Meng] add TODOs
      8285134 [Xiangrui Meng] update doc
      060f7c3 [Xiangrui Meng] update doctest
      acac727 [Xiangrui Meng] add keyword args
      cdddecd [Xiangrui Meng] add CrossValidator in Python
      32cdc815
  4. May 05, 2015
    • Xiangrui Meng's avatar
      [SPARK-7333] [MLLIB] Add BinaryClassificationEvaluator to PySpark · ee374e89
      Xiangrui Meng authored
      This PR adds `BinaryClassificationEvaluator` to Python ML Pipelines API, which is a simple wrapper of the Scala implementation. oefirouz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5885 from mengxr/SPARK-7333 and squashes the following commits:
      
      25d7451 [Xiangrui Meng] fix tests in python 3
      babdde7 [Xiangrui Meng] fix doc
      cb51e6a [Xiangrui Meng] add BinaryClassificationEvaluator in PySpark
      ee374e89
  5. Apr 20, 2015
  6. Apr 16, 2015
    • Xiangrui Meng's avatar
      [SPARK-6893][ML] default pipeline parameter handling in python · 57cd1e86
      Xiangrui Meng authored
      Same as #5431 but for Python. jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5534 from mengxr/SPARK-6893 and squashes the following commits:
      
      d3b519b [Xiangrui Meng] address comments
      ebaccc6 [Xiangrui Meng] style update
      fce244e [Xiangrui Meng] update explainParams with test
      4d6b07a [Xiangrui Meng] add tests
      5294500 [Xiangrui Meng] update default param handling in python
      57cd1e86
  7. Mar 09, 2015
  8. Feb 20, 2015
    • Joseph K. Bradley's avatar
      [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] Doc cleanups for 1.3 release · 4a17eedb
      Joseph K. Bradley authored
      For SPARK-5867:
      * The spark.ml programming guide needs to be updated to use the new SQL DataFrame API instead of the old SchemaRDD API.
      * It should also include Python examples now.
      
      For SPARK-5892:
      * Fix Python docs
      * Various other cleanups
      
      BTW, I accidentally merged this with master.  If you want to compile it on your own, use this branch which is based on spark/branch-1.3 and cherry-picks the commits from this PR: [https://github.com/jkbradley/spark/tree/doc-review-1.3-check]
      
      CC: mengxr  (ML),  davies  (Python docs)
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4675 from jkbradley/doc-review-1.3 and squashes the following commits:
      
      f191bb0 [Joseph K. Bradley] small cleanups
      e786efa [Joseph K. Bradley] small doc corrections
      6b1ab4a [Joseph K. Bradley] fixed python lint test
      946affa [Joseph K. Bradley] Added sample data for ml.MovieLensALS example.  Changed spark.ml Java examples to use DataFrames API instead of sql()
      da81558 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into doc-review-1.3
      629dbf5 [Joseph K. Bradley] Updated based on code review: * made new page for old migration guides * small fixes * moved inherit_doc in python
      b9df7c4 [Joseph K. Bradley] Small cleanups: toDF to toDF(), adding s for string interpolation
      34b067f [Joseph K. Bradley] small doc correction
      da16aef [Joseph K. Bradley] Fixed python mllib docs
      8cce91c [Joseph K. Bradley] GMM: removed old imports, added some doc
      695f3f6 [Joseph K. Bradley] partly done trying to fix inherit_doc for class hierarchies in python docs
      a72c018 [Joseph K. Bradley] made ChiSqTestResult appear in python docs
      b05a80d [Joseph K. Bradley] organize imports. doc cleanups
      e572827 [Joseph K. Bradley] updated programming guide for ml and mllib
      4a17eedb
  9. Feb 15, 2015
    • Xiangrui Meng's avatar
      [SPARK-5769] Set params in constructors and in setParams in Python ML pipelines · cd4a1536
      Xiangrui Meng authored
      This PR allow Python users to set params in constructors and in setParams, where we use decorator `keyword_only` to force keyword arguments. The trade-off is discussed in the design doc of SPARK-4586.
      
      Generated doc:
      ![screen shot 2015-02-12 at 3 06 58 am](https://cloud.githubusercontent.com/assets/829644/6166491/9cfcd06a-b265-11e4-99ea-473d866634fc.png)
      
      CC: davies rxin
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4564 from mengxr/py-pipeline-kw and squashes the following commits:
      
      fedf720 [Xiangrui Meng] use toDF
      d565f2c [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into py-pipeline-kw
      cbc15d3 [Xiangrui Meng] fix style
      5032097 [Xiangrui Meng] update pipeline signature
      950774e [Xiangrui Meng] simplify keyword_only and update constructor/setParams signatures
      fdde5fc [Xiangrui Meng] fix style
      c9384b8 [Xiangrui Meng] fix sphinx doc
      8e59180 [Xiangrui Meng] add setParams and make constructors take params, where we force keyword args
      cd4a1536
  10. Jan 28, 2015
    • Xiangrui Meng's avatar
      [SPARK-4586][MLLIB] Python API for ML pipeline and parameters · e80dc1c5
      Xiangrui Meng authored
      This PR adds Python API for ML pipeline and parameters. The design doc can be found on the JIRA page. It includes transformers and an estimator to demo the simple text classification example code.
      
      TODO:
      - [x] handle parameters in LRModel
      - [x] unit tests
      - [x] missing some docs
      
      CC: davies jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4151 from mengxr/SPARK-4586 and squashes the following commits:
      
      415268e [Xiangrui Meng] remove inherit_doc from __init__
      edbd6fe [Xiangrui Meng] move Identifiable to ml.util
      44c2405 [Xiangrui Meng] Merge pull request #2 from davies/ml
      dd1256b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-4586
      14ae7e2 [Davies Liu] fix docs
      54ca7df [Davies Liu] fix tests
      78638df [Davies Liu] Merge branch 'SPARK-4586' of github.com:mengxr/spark into ml
      fc59a02 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-4586
      1dca16a [Davies Liu] refactor
      090b3a3 [Davies Liu] Merge branch 'master' of github.com:apache/spark into ml
      0882513 [Xiangrui Meng] update doc style
      a4f4dbf [Xiangrui Meng] add unit test for LR
      7521d1c [Xiangrui Meng] add unit tests to HashingTF and Tokenizer
      ba0ba1e [Xiangrui Meng] add unit tests for pipeline
      0586c7b [Xiangrui Meng] add more comments to the example
      5153cff [Xiangrui Meng] simplify java models
      036ca04 [Xiangrui Meng] gen numFeatures
      46fa147 [Xiangrui Meng] update mllib/pom.xml to include python files in the assembly
      1dcc17e [Xiangrui Meng] update code gen and make param appear in the doc
      f66ba0c [Xiangrui Meng] make params a property
      d5efd34 [Xiangrui Meng] update doc conf and move embedded param map to instance attribute
      f4d0fe6 [Xiangrui Meng] use LabeledDocument and Document in example
      05e3e40 [Xiangrui Meng] update example
      d3e8dbe [Xiangrui Meng] more docs optimize pipeline.fit impl
      56de571 [Xiangrui Meng] fix style
      d0c5bb8 [Xiangrui Meng] a working copy
      bce72f4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-4586
      17ecfb9 [Xiangrui Meng] code gen for shared params
      d9ea77c [Xiangrui Meng] update doc
      c18dca1 [Xiangrui Meng] make the example working
      dadd84e [Xiangrui Meng] add base classes and docs
      a3015cf [Xiangrui Meng] add Estimator and Transformer
      46eea43 [Xiangrui Meng] a pipeline in python
      33b68e0 [Xiangrui Meng] a working LR
      e80dc1c5
Loading