  1. Nov 18, 2015
  2. Nov 05, 2015
  3. Nov 02, 2015
  4. Oct 28, 2015
  5. Oct 27, 2015
  6. Oct 07, 2015
  7. Oct 06, 2015
  8. Sep 17, 2015
  9. Sep 11, 2015
    • [SPARK-10026] [ML] [PySpark] Implement some common Params for regression in PySpark · b656e613
      Yanbo Liang authored
      LinearRegression and LogisticRegression lack some Params in Python, and some Params are not shared classes, so they have to be written separately for each class. These Params are listed here:
      ```scala
      HasElasticNetParam
      HasFitIntercept
      HasStandardization
      HasThresholds
      ```
      Here we implement them as shared params on the Python side and bring the LinearRegression/LogisticRegression parameters to parity with the Scala ones.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8508 from yanboliang/spark-10026.
      b656e613
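The idea of shared params in SPARK-10026 can be illustrated with a small mixin sketch. This is not the PySpark implementation: `Params` here is a hypothetical stand-in container, the `*Sketch` classes and the default values are assumptions made for illustration, and only two of the four listed params are shown.

```python
class Params:
    """Hypothetical minimal param container; not pyspark's actual class."""

    def __init__(self):
        self._paramMap = {}         # values set explicitly by the user
        self._defaultParamMap = {}  # fallback values

    def _setDefault(self, **kwargs):
        self._defaultParamMap.update(kwargs)
        return self

    def _set(self, **kwargs):
        self._paramMap.update(kwargs)
        return self

    def getOrDefault(self, name):
        if name in self._paramMap:
            return self._paramMap[name]
        return self._defaultParamMap[name]


class HasElasticNetParam(Params):
    """Shared mixin: defined once, reused by every estimator that needs it."""

    def __init__(self):
        super().__init__()
        self._setDefault(elasticNetParam=0.0)  # illustrative default

    def setElasticNetParam(self, value):
        return self._set(elasticNetParam=value)

    def getElasticNetParam(self):
        return self.getOrDefault("elasticNetParam")


class HasFitIntercept(Params):
    def __init__(self):
        super().__init__()
        self._setDefault(fitIntercept=True)  # illustrative default

    def setFitIntercept(self, value):
        return self._set(fitIntercept=value)

    def getFitIntercept(self):
        return self.getOrDefault("fitIntercept")


class LinearRegressionSketch(HasElasticNetParam, HasFitIntercept):
    """Picks up both params without redefining them."""


class LogisticRegressionSketch(HasElasticNetParam, HasFitIntercept):
    """Same mixins, so the two estimators stay in sync."""


lr = LinearRegressionSketch().setElasticNetParam(0.5)
logit = LogisticRegressionSketch()
```

Because each mixin calls `super().__init__()` before registering its default, the param maps exist by the time any default is set, regardless of the order in which the mixins are listed.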
  10. Jul 07, 2015
    • [SPARK-8711] [ML] Add additional methods to PySpark ML tree models · 1dbc4a15
      MechCoder authored
      Add numNodes and depth to treeModels, add treeWeights to ensemble Models.
      Add __repr__ to all models.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #7095 from MechCoder/missing_methods_tree and squashes the following commits:
      
      23b08be [MechCoder] private [spark]
      38a0860 [MechCoder] rename pyTreeWeights to javaTreeWeights
      6d16ad8 [MechCoder] Fix Python 3 Error
      47d7023 [MechCoder] Use np.allclose and treeEnsembleModel -> TreeEnsembleMethods
      819098c [MechCoder] [SPARK-8711] [ML] Add additional methods ot PySpark ML tree models
      1dbc4a15
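As a rough illustration of what `numNodes`, `depth`, `treeWeights` and `__repr__` expose, here is a pure-Python sketch. The `Node`, `TreeModelSketch` and `TreeEnsembleSketch` classes are hypothetical; the actual PySpark models are backed by the corresponding Java objects, and the convention that a lone leaf has depth 0 is an assumption of this sketch.

```python
class Node:
    """Hypothetical binary tree node; a node with no children is a leaf."""

    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def _count(node):
    # Total number of nodes in the subtree rooted at `node`.
    return 0 if node is None else 1 + _count(node.left) + _count(node.right)


def _depth(node):
    # A lone leaf has depth 0 under the convention assumed here.
    children = [c for c in (node.left, node.right) if c is not None]
    return 0 if not children else 1 + max(_depth(c) for c in children)


class TreeModelSketch:
    def __init__(self, root):
        self._root = root

    @property
    def numNodes(self):
        return _count(self._root)

    @property
    def depth(self):
        return _depth(self._root)

    def __repr__(self):
        return "TreeModelSketch(depth=%d, numNodes=%d)" % (self.depth, self.numNodes)


class TreeEnsembleSketch:
    def __init__(self, trees, treeWeights):
        if len(trees) != len(treeWeights):
            raise ValueError("need exactly one weight per tree")
        self.trees = list(trees)
        self.treeWeights = list(treeWeights)

    def __repr__(self):
        return "TreeEnsembleSketch(numTrees=%d)" % len(self.trees)


stump = TreeModelSketch(Node(Node(), Node()))
ensemble = TreeEnsembleSketch([stump, stump], [1.0, 0.5])
```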
  11. May 20, 2015
    • [SPARK-7511] [MLLIB] pyspark ml seed param should be random by default or 42... · 191ee474
      Holden Karau authored
      [SPARK-7511] [MLLIB] pyspark ml seed param should be random by default or 42 is quite funny but not very random
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #6139 from holdenk/SPARK-7511-pyspark-ml-seed-param-should-be-random-by-default-or-42-is-quite-funny-but-not-very-random and squashes the following commits:
      
      591f8e5 [Holden Karau] specify old seed for doc tests
      2470004 [Holden Karau] Fix a bunch of seeds with default values to have None as the default which will then result in using the hash of the class name
      cbad96d [Holden Karau] Add the setParams function that is used in the real code
      423b8d7 [Holden Karau] Switch the test code to behave slightly more like production code. also don't check the param map value only check for key existence
      140d25d [Holden Karau] remove extra space
      926165a [Holden Karau] Add some missing newlines for pep8 style
      8616751 [Holden Karau] merge in master
      58532e6 [Holden Karau] its the __name__ method, also treat None values as not set
      56ef24a [Holden Karau] fix test and regenerate base
      afdaa5c [Holden Karau] make sure different classes have different results
      68eb528 [Holden Karau] switch default seed to hash of type of self
      89c4611 [Holden Karau] Merge branch 'master' into SPARK-7511-pyspark-ml-seed-param-should-be-random-by-default-or-42-is-quite-funny-but-not-very-random
      31cd96f [Holden Karau] specify the seed to randomforestregressor test
      e1b947f [Holden Karau] Style fixes
      ce90ec8 [Holden Karau] merge in master
      bcdf3c9 [Holden Karau] update docstring seeds to none and some other default seeds from 42
      65eba21 [Holden Karau] pep8 fixes
      0e3797e [Holden Karau] Make seed default to random in more places
      213a543 [Holden Karau] Simplify the generated code to only include set default if there is a default rather than having None is note None in the generated code
      1ff17c2 [Holden Karau] Make the seed random for HasSeed in python
      191ee474
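The squashed commits above describe a seed that defaults to `None` and then falls back to a hash of the class name. A minimal sketch of that fallback, assuming hypothetical class names and a plain attribute instead of a real `Param`:

```python
class HasSeedSketch:
    """Hypothetical mixin: seed=None means 'derive one from the class name'."""

    def __init__(self, seed=None):
        if seed is None:
            # Same class name -> same seed within one Python process.
            seed = hash(type(self).__name__)
        self.seed = seed


class ForestSketch(HasSeedSketch):
    pass


class BoostingSketch(HasSeedSketch):
    pass


a = ForestSketch()
b = ForestSketch()
c = ForestSketch(seed=42)
```

Note that in Python 3 string hashes are salted per process unless `PYTHONHASHSEED` is set, so a seed derived this way is stable within a run but not necessarily across runs. That is presumably why the doc tests specify an explicit seed.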
  12. May 18, 2015
    • [SPARK-7380] [MLLIB] pipeline stages should be copyable in Python · 9c7e802a
      Xiangrui Meng authored
      This PR makes pipeline stages in Python copyable and hence simplifies some implementations. It also includes the following changes:
      
      1. Rename `paramMap` and `defaultParamMap` to `_paramMap` and `_defaultParamMap`, respectively.
      2. Accept a list of param maps in `fit`.
      3. Use parent uid and name to identify param.
      
      jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #6088 from mengxr/SPARK-7380 and squashes the following commits:
      
      413c463 [Xiangrui Meng] remove unnecessary doc
      4159f35 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7380
      611c719 [Xiangrui Meng] fix python style
      68862b8 [Xiangrui Meng] update _java_obj initialization
      927ad19 [Xiangrui Meng] fix ml/tests.py
      0138fc3 [Xiangrui Meng] update feature transformers and fix a bug in RegexTokenizer
      9ca44fb [Xiangrui Meng] simplify Java wrappers and add tests
      c7d84ef [Xiangrui Meng] update ml/tests.py to test copy params
      7e0d27f [Xiangrui Meng] merge master
      46840fb [Xiangrui Meng] update wrappers
      b6db1ed [Xiangrui Meng] update all self.paramMap to self._paramMap
      46cb6ed [Xiangrui Meng] merge master
      a163413 [Xiangrui Meng] fix style
      1042e80 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7380
      9630eae [Xiangrui Meng] fix Identifiable._randomUID
      13bd70a [Xiangrui Meng] update ml/tests.py
      64a536c [Xiangrui Meng] use _fit/_transform/_evaluate to simplify the impl
      02abf13 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into copyable-python
      66ce18c [Joseph K. Bradley] some cleanups before sending to Xiangrui
      7431272 [Joseph K. Bradley] Rebased with master
      9c7e802a
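A small sketch may help show why copyable stages simplify `fit` with a list of param maps: each param map is applied to a copy, so the original stage is never mutated. `EstimatorSketch` is hypothetical, uses plain string keys rather than `Param` objects, and returns the effective param dict in place of a fitted model.

```python
import copy


class EstimatorSketch:
    """Hypothetical copyable stage; the 'model' is just the effective params."""

    def __init__(self):
        self._paramMap = {}                      # user-set values
        self._defaultParamMap = {"maxIter": 10}  # illustrative default

    def copy(self, extra=None):
        that = copy.copy(self)
        # Give the copy its own user-set map so the original stays untouched.
        that._paramMap = dict(self._paramMap)
        that._paramMap.update(extra or {})
        return that

    def _fit(self, dataset):
        effective = dict(self._defaultParamMap)
        effective.update(self._paramMap)
        return effective

    def fit(self, dataset, params=None):
        if params is None:
            return self._fit(dataset)
        if isinstance(params, (list, tuple)):
            # One fitted result per param map, in order.
            return [self.fit(dataset, p) for p in params]
        if isinstance(params, dict):
            return self.copy(params)._fit(dataset)
        raise TypeError("params must be a dict or a list/tuple of dicts")


est = EstimatorSketch()
models = est.fit(None, [{"maxIter": 5}, {"maxIter": 20}])
```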
  13. May 14, 2015
    • [SPARK-7619] [PYTHON] fix docstring signature · 48fc38f5
      Xiangrui Meng authored
      Just realized that we need `\` at the end of the docstring. brkyvz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6161 from mengxr/SPARK-7619 and squashes the following commits:
      
      e44495f [Xiangrui Meng] fix docstring signature
      48fc38f5
    • [SPARK-7648] [MLLIB] Add weights and intercept to GLM wrappers in spark.ml · 723853ed
      Xiangrui Meng authored
      Otherwise, users can only use `transform` on the models. brkyvz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6156 from mengxr/SPARK-7647 and squashes the following commits:
      
      1ae3d2d [Xiangrui Meng] add weights and intercept to LogisticRegression in Python
      f49eb46 [Xiangrui Meng] add weights and intercept to LinearRegressionModel
      723853ed
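To illustrate what exposing `weights` and `intercept` gives users beyond `transform`, here is a hypothetical model class that stores both and lets callers inspect them or compute a prediction by hand. It is a sketch under those assumptions, not the wrapper added in this commit, which reads the values from the underlying Java model.

```python
class LinearModelSketch:
    """Hypothetical fitted linear model exposing its learned values."""

    def __init__(self, weights, intercept):
        self.weights = list(weights)
        self.intercept = float(intercept)

    def predict(self, features):
        if len(features) != len(self.weights):
            raise ValueError("feature vector has the wrong length")
        # Dot product of weights and features, plus the intercept.
        return sum(w * x for w, x in zip(self.weights, features)) + self.intercept


model = LinearModelSketch([2.0, -1.0], 0.5)
prediction = model.predict([1.0, 3.0])
```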
  14. May 12, 2015
    • [SPARK-7487] [ML] Feature Parity in PySpark for ml.regression · 8e935b0a
      Burak Yavuz authored
      Added LinearRegression Python API
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #6016 from brkyvz/ml-reg and squashes the following commits:
      
      11c9ef9 [Burak Yavuz] address comments
      1027a40 [Burak Yavuz] fix typo
      4c699ad [Burak Yavuz] added tree regressor api
      8afead2 [Burak Yavuz] made mixin for DT
      fa51c74 [Burak Yavuz] save additions
      0640d48 [Burak Yavuz] added ml.regression
      82aac48 [Burak Yavuz] added linear regression
      8e935b0a