Commits · 603a721c21488e17c15c45ce1de893e6b3d02274 · cs525-sp18-g07 / spark

Nov 18, 2015

[SPARK-11820][ML][PYSPARK] PySpark LiR & LoR should support weightCol · 603a721c

Yanbo Liang authored 9 years ago

[SPARK-7685](https://issues.apache.org/jira/browse/SPARK-7685) and [SPARK-9642](https://issues.apache.org/jira/browse/SPARK-9642) already added support for setting a weight column in ```LogisticRegression``` and ```LinearRegression```. This is an important feature that PySpark should support as well. mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9811 from yanboliang/spark-11820.
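
To illustrate what a weight column means for a least-squares fit, here is a minimal pure-Python sketch. It is not PySpark code, and the helper `weighted_linear_fit` is a hypothetical name introduced only for this example. It fits a single-feature line by minimizing the weighted squared error, so a row with weight 2 counts the same as that row appearing twice.

```python
def weighted_linear_fit(xs, ys, ws):
    """Fit y = slope * x + intercept by minimizing sum(w * (y - prediction)^2)."""
    total_w = sum(ws)
    # Weighted means of the feature and the label.
    x_mean = sum(w * x for w, x in zip(ws, xs)) / total_w
    y_mean = sum(w * y for w, y in zip(ws, ys)) / total_w
    # Weighted (unnormalized) covariance and variance.
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# A row with weight 2.0 ...
weighted = weighted_linear_fit([0.0, 1.0, 2.0], [1.0, 2.0, 5.0], [1.0, 1.0, 2.0])
# ... behaves like the same row listed twice with weight 1.0.
duplicated = weighted_linear_fit([0.0, 1.0, 2.0, 2.0], [1.0, 2.0, 5.0, 5.0], [1.0] * 4)
```

Going by the commit title, the PySpark estimators take the name of the weight column through a `weightCol` param; the sketch only shows the effect such weights have on the fit.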


Nov 05, 2015

[SPARK-11473][ML] R-like summary statistics with intercept for OLS via normal equation solver · 9da7ceed

Yanbo Liang authored 9 years ago

As a follow-up to [SPARK-9836](https://issues.apache.org/jira/browse/SPARK-9836), we should also support summary statistics for the ```intercept```.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9485 from yanboliang/spark-11473.


[SPARK-11527][ML][PYSPARK] PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale · 2e86cf1b

Yanbo Liang authored 9 years ago

PySpark ```AFTSurvivalRegressionModel``` should expose coefficients/intercept/scale. mengxr vectorijk

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9492 from yanboliang/spark-11527.


Nov 02, 2015

[SPARK-10592] [ML] [PySpark] Deprecate weights and use coefficients instead in ML models · c020f7d9

vectorijk authored 9 years ago

`weights` is deprecated in favor of `coefficients` in `LogisticRegression` and `LinearRegression`.

Author: vectorijk <jiangkai@gmail.com>

Closes #9311 from vectorijk/spark-10592.
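
A deprecation of this kind is commonly done by keeping the old name as an alias that warns and forwards to the new one. The following is a minimal sketch of that pattern, not the code from this commit; the class `LinearModelSketch` and the warning text are assumptions made for illustration.

```python
import warnings


class LinearModelSketch:
    """Hypothetical stand-in for a fitted model; not the PySpark class."""

    def __init__(self, coefficients, intercept):
        self._coefficients = list(coefficients)
        self.intercept = intercept

    @property
    def coefficients(self):
        """Preferred accessor."""
        return self._coefficients

    @property
    def weights(self):
        """Deprecated alias kept for backward compatibility."""
        warnings.warn(
            "weights is deprecated. Use coefficients instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.coefficients


model = LinearModelSketch([0.5, -1.0], intercept=0.1)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_value = model.weights  # still works, but emits a warning
```

Existing callers of `weights` keep working while being pointed to `coefficients`.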


Oct 28, 2015

[SPARK-11367][ML][PYSPARK] Python LinearRegression should support setting solver · f92b7b98

Yanbo Liang authored 9 years ago

[SPARK-10668](https://issues.apache.org/jira/browse/SPARK-10668) provided the ```WeightedLeastSquares``` solver ("normal") for ```LinearRegression``` with L2 regularization in Scala and R. Python ML ```LinearRegression``` should also support setting the solver ("auto", "normal", "l-bfgs").

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9328 from yanboliang/spark-11367.


Oct 27, 2015

[SPARK-10024][PYSPARK] Python API RF and GBT related params clear up · 9dba5fb2

vectorijk authored 9 years ago

Implement {RandomForest, GBT, TreeEnsemble, TreeClassifier, TreeRegressor}Params for the Python API in pyspark/ml/{classification, regression}.py

Author: vectorijk <jiangkai@gmail.com>

Closes #9233 from vectorijk/spark-10024.


Oct 07, 2015

[SPARK-9774] [ML] [PYSPARK] Add python api for ml regression isotonicregression · 3aff0866

Holden Karau authored 9 years ago

Add the Python API for isotonicregression.

Author: Holden Karau <holden@pigscanfly.ca>

Closes #8214 from holdenk/SPARK-9774-add-python-api-for-ml-regression-isotonicregression.


Oct 06, 2015

[SPARK-10957] [ML] setParams changes quantileProbabilities unexpectedly in PySpark's AFTSurvivalRegression · 5e035403

Xiangrui Meng authored 9 years ago

If the user doesn't specify `quantileProbs` in `setParams`, it gets reset to the default value. We don't need the special handling here. vectorijk yanboliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #9001 from mengxr/SPARK-10957.
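
The pitfall described in this commit can be sketched in plain Python. This is an illustration under assumptions, not the PySpark implementation: `EstimatorSketch`, both setter names, and the default values below are hypothetical. The buggy setter treats an omitted argument as "use the default", while the fixed one only touches parameters that were passed explicitly.

```python
# Illustrative default only; assumed for this sketch.
DEFAULT_QUANTILES = [0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99]


class EstimatorSketch:
    """Hypothetical estimator; illustrates the pitfall, not the PySpark code."""

    def __init__(self):
        self.params = {
            "quantileProbabilities": list(DEFAULT_QUANTILES),
            "maxIter": 100,
        }

    def set_params_buggy(self, quantileProbabilities=None, **kwargs):
        # Pitfall: an omitted argument is treated as "use the default",
        # which silently overwrites a value the user set earlier.
        if quantileProbabilities is None:
            quantileProbabilities = list(DEFAULT_QUANTILES)
        self.params["quantileProbabilities"] = quantileProbabilities
        self.params.update(kwargs)
        return self

    def set_params_fixed(self, **kwargs):
        # Only touch the parameters that were passed explicitly.
        self.params.update(kwargs)
        return self


buggy = EstimatorSketch().set_params_buggy(quantileProbabilities=[0.5])
buggy.set_params_buggy(maxIter=10)  # quantiles fall back to the default

fixed = EstimatorSketch().set_params_fixed(quantileProbabilities=[0.5])
fixed.set_params_fixed(maxIter=10)  # quantiles stay at [0.5]
```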


[SPARK-10688] [ML] [PYSPARK] Python API for AFTSurvivalRegression · 5952bdb7

vectorijk authored 9 years ago

Implement Python API for AFTSurvivalRegression

Author: vectorijk <jiangkai@gmail.com>

Closes #8926 from vectorijk/spark-10688.


Sep 17, 2015

[SPARK-10283] [ML] [PYSPARK] [DOCS] Add @since annotation to pyspark.ml.regression · 29bf8aa5

Yu ISHIKAWA authored 9 years ago

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8693 from yu-iskw/SPARK-10283.

Sep 11, 2015

[SPARK-10026] [ML] [PySpark] Implement some common Params for regression in PySpark · b656e613

Yanbo Liang authored 9 years ago

LinearRegression and LogisticRegression lack some Params in Python, and some Params are not shared classes, so they have to be written separately for each class. These Params are listed here:
```scala
HasElasticNetParam
HasFitIntercept
HasStandardization
HasThresholds
```
Here we implement them as shared params on the Python side and bring the LinearRegression/LogisticRegression parameters to parity with the Scala ones.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8508 from yanboliang/spark-10026.
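
The idea of shared params can be sketched with plain Python mixins: each param is defined once and then reused by every estimator that needs it. This is a simplified illustration, not the PySpark `Params` machinery; the `*Sketch` classes, the dictionary-based storage, and the default values are assumptions.

```python
class HasFitIntercept:
    """Mixin for a boolean fitIntercept param (sketch, not the PySpark class)."""

    def setFitIntercept(self, value):
        self._params["fitIntercept"] = bool(value)
        return self

    def getFitIntercept(self):
        # Assumed default for this sketch.
        return self._params.get("fitIntercept", True)


class HasElasticNetParam:
    """Mixin for the elastic-net mixing parameter, restricted to [0, 1]."""

    def setElasticNetParam(self, value):
        if not 0.0 <= value <= 1.0:
            raise ValueError("elasticNetParam must be in [0, 1]")
        self._params["elasticNetParam"] = float(value)
        return self

    def getElasticNetParam(self):
        # Assumed default for this sketch.
        return self._params.get("elasticNetParam", 0.0)


class LinearRegressionSketch(HasFitIntercept, HasElasticNetParam):
    def __init__(self):
        self._params = {}


class LogisticRegressionSketch(HasFitIntercept, HasElasticNetParam):
    def __init__(self):
        self._params = {}


lir = LinearRegressionSketch().setFitIntercept(False).setElasticNetParam(0.5)
lor = LogisticRegressionSketch()
```

Both estimators get the same setters and getters without repeating the code in each class.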


Jul 07, 2015

[SPARK-8711] [ML] Add additional methods to PySpark ML tree models · 1dbc4a15

MechCoder authored 9 years ago

Add numNodes and depth to treeModels, add treeWeights to ensemble Models.
Add __repr__ to all models.

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #7095 from MechCoder/missing_methods_tree and squashes the following commits:

23b08be [MechCoder] private [spark]
38a0860 [MechCoder] rename pyTreeWeights to javaTreeWeights
6d16ad8 [MechCoder] Fix Python 3 Error
47d7023 [MechCoder] Use np.allclose and treeEnsembleModel -> TreeEnsembleMethods
819098c [MechCoder] [SPARK-8711] [ML] Add additional methods to PySpark ML tree models


May 20, 2015

[SPARK-7511] [MLLIB] pyspark ml seed param should be random by default or 42 is quite funny but not very random · 191ee474

Holden Karau authored 10 years ago

Author: Holden Karau <holden@pigscanfly.ca>

Closes #6139 from holdenk/SPARK-7511-pyspark-ml-seed-param-should-be-random-by-default-or-42-is-quite-funny-but-not-very-random and squashes the following commits:

591f8e5 [Holden Karau] specify old seed for doc tests
2470004 [Holden Karau] Fix a bunch of seeds with default values to have None as the default which will then result in using the hash of the class name
cbad96d [Holden Karau] Add the setParams function that is used in the real code
423b8d7 [Holden Karau] Switch the test code to behave slightly more like production code. also don't check the param map value only check for key existence
140d25d [Holden Karau] remove extra space
926165a [Holden Karau] Add some missing newlines for pep8 style
8616751 [Holden Karau] merge in master
58532e6 [Holden Karau] its the __name__ method, also treat None values as not set
56ef24a [Holden Karau] fix test and regenerate base
afdaa5c [Holden Karau] make sure different classes have different results
68eb528 [Holden Karau] switch default seed to hash of type of self
89c4611 [Holden Karau] Merge branch 'master' into SPARK-7511-pyspark-ml-seed-param-should-be-random-by-default-or-42-is-quite-funny-but-not-very-random
31cd96f [Holden Karau] specify the seed to randomforestregressor test
e1b947f [Holden Karau] Style fixes
ce90ec8 [Holden Karau] merge in master
bcdf3c9 [Holden Karau] update docstring seeds to none and some other default seeds from 42
65eba21 [Holden Karau] pep8 fixes
0e3797e [Holden Karau] Make seed default to random in more places
213a543 [Holden Karau] Simplify the generated code to only include set default if there is a default rather than having None is not None in the generated code
1ff17c2 [Holden Karau] Make the seed random for HasSeed in python
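
The default described in the commits above (a seed of `None` is treated as not set, and the fallback is a hash of the class name) could be sketched roughly as follows. The class names are hypothetical and this is not the generated PySpark code.

```python
class HasSeedSketch:
    """Hypothetical mixin: the seed defaults to a hash of the class name."""

    def __init__(self, seed=None):
        # None means "not set"; fall back to a per-class default.
        self.seed = hash(type(self).__name__) if seed is None else seed


class RandomForestSketch(HasSeedSketch):
    pass


class GBTSketch(HasSeedSketch):
    pass


rf_a, rf_b = RandomForestSketch(), RandomForestSketch()
gbt = GBTSketch()
explicit = RandomForestSketch(seed=42)
```

In this sketch, two instances of the same class share a default seed and an explicit seed always wins. Note that Python 3 randomizes string hashes per process unless `PYTHONHASHSEED` is set, so the default in this sketch is stable within one run but not across runs.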


May 18, 2015

[SPARK-7380] [MLLIB] pipeline stages should be copyable in Python · 9c7e802a

Xiangrui Meng authored 10 years ago

This PR makes pipeline stages in Python copyable and hence simplifies some implementations. It also includes the following changes:

1. Rename `paramMap` and `defaultParamMap` to `_paramMap` and `_defaultParamMap`, respectively.
2. Accept a list of param maps in `fit`.
3. Use parent uid and name to identify param.

jkbradley

Author: Xiangrui Meng <meng@databricks.com>
Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6088 from mengxr/SPARK-7380 and squashes the following commits:

413c463 [Xiangrui Meng] remove unnecessary doc
4159f35 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7380
611c719 [Xiangrui Meng] fix python style
68862b8 [Xiangrui Meng] update _java_obj initialization
927ad19 [Xiangrui Meng] fix ml/tests.py
0138fc3 [Xiangrui Meng] update feature transformers and fix a bug in RegexTokenizer
9ca44fb [Xiangrui Meng] simplify Java wrappers and add tests
c7d84ef [Xiangrui Meng] update ml/tests.py to test copy params
7e0d27f [Xiangrui Meng] merge master
46840fb [Xiangrui Meng] update wrappers
b6db1ed [Xiangrui Meng] update all self.paramMap to self._paramMap
46cb6ed [Xiangrui Meng] merge master
a163413 [Xiangrui Meng] fix style
1042e80 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7380
9630eae [Xiangrui Meng] fix Identifiable._randomUID
13bd70a [Xiangrui Meng] update ml/tests.py
64a536c [Xiangrui Meng] use _fit/_transform/_evaluate to simplify the impl
02abf13 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into copyable-python
66ce18c [Joseph K. Bradley] some cleanups before sending to Xiangrui
7431272 [Joseph K. Bradley] Rebased with master
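
Two of the changes listed for this PR, copyable stages and accepting a list of param maps in `fit`, can be illustrated with a small plain-Python sketch. `StageSketch` and its dictionary "model" are hypothetical; only the names `_paramMap` and `_defaultParamMap` come from the commit message.

```python
import copy


class StageSketch:
    """Hypothetical pipeline stage with user-set and default params."""

    def __init__(self, **params):
        self._defaultParamMap = {"maxIter": 100}
        self._paramMap = dict(params)

    def getOrDefault(self, name):
        return self._paramMap.get(name, self._defaultParamMap[name])

    def copy(self, extra=None):
        # Copy the stage, then overlay any extra params on the copy only.
        that = copy.copy(self)
        that._paramMap = dict(self._paramMap)
        that._paramMap.update(extra or {})
        return that

    def fit(self, dataset, params=None):
        # A list of param maps yields one fitted result per map.
        if isinstance(params, (list, tuple)):
            return [self.fit(dataset, p) for p in params]
        stage = self.copy(params)
        return {"maxIter": stage.getOrDefault("maxIter"), "rows": len(dataset)}


stage = StageSketch(maxIter=10)
models = stage.fit([1, 2, 3], [{"maxIter": 5}, {"maxIter": 20}])
```

Because `fit` works on a copy, the original stage keeps its own params after fitting with overrides.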


May 14, 2015

[SPARK-7619] [PYTHON] fix docstring signature · 48fc38f5

Xiangrui Meng authored 10 years ago

Just realized that we need `\` at the end of the docstring. brkyvz

Author: Xiangrui Meng <meng@databricks.com>

Closes #6161 from mengxr/SPARK-7619 and squashes the following commits:

e44495f [Xiangrui Meng] fix docstring signature


[SPARK-7648] [MLLIB] Add weights and intercept to GLM wrappers in spark.ml · 723853ed

Xiangrui Meng authored 10 years ago

Otherwise, users can only use `transform` on the models. brkyvz

Author: Xiangrui Meng <meng@databricks.com>

Closes #6156 from mengxr/SPARK-7647 and squashes the following commits:

1ae3d2d [Xiangrui Meng] add weights and intercept to LogisticRegression in Python
f49eb46 [Xiangrui Meng] add weights and intercept to LinearRegressionModel


May 12, 2015

[SPARK-7487] [ML] Feature Parity in PySpark for ml.regression · 8e935b0a

Burak Yavuz authored 10 years ago

Added LinearRegression Python API

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #6016 from brkyvz/ml-reg and squashes the following commits:

11c9ef9 [Burak Yavuz] address comments
1027a40 [Burak Yavuz] fix typo
4c699ad [Burak Yavuz] added tree regressor api
8afead2 [Burak Yavuz] made mixin for DT
fa51c74 [Burak Yavuz] save additions
0640d48 [Burak Yavuz] added ml.regression
82aac48 [Burak Yavuz] added linear regression
