[SPARK-5956] [MLLIB] Pipeline components should be copyable.
This PR added `copy(extra: ParamMap): Params` to `Params`, which makes a copy of the current instance with a randomly generated uid and some extra param values. With this change, we only need to implement `fit` and `transform` without extra param values, given the default implementation of `fit(dataset, extra)`:

~~~scala
def fit(dataset: DataFrame, extra: ParamMap): Model = {
  copy(extra).fit(dataset)
}
~~~

Inside `fit` and `transform`, since only the embedded values are used, I added `$` as an alias for `getOrDefault` to make the code easier to read. For example, in `LinearRegression.fit` we have:

~~~scala
val effectiveRegParam = $(regParam) / yStd
val effectiveL1RegParam = $(elasticNetParam) * effectiveRegParam
val effectiveL2RegParam = (1.0 - $(elasticNetParam)) * effectiveRegParam
~~~

Meta-algorithms like `Pipeline` implement their own `copy(extra)`, so the fitted pipeline model stores all copied stages (whether a stage is a transformer or a model).

Other changes:

* `Params$.inheritValues` is moved to `Params!.copyValues` and returns the target instance.
* `fittingParamMap` was removed because the `parent` carries this information.
* `validate` was renamed to `validateParams` to be more precise.
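The `copy(extra)` and `$` ideas described above can be illustrated with a small standalone sketch. This is not Spark's implementation: `Param`, `Params`, `ToyEstimator`, and `ToyModel` here are simplified, hypothetical stand-ins (params hold only `Double` values, a plain `Map` replaces `ParamMap`, `fit` takes a precomputed `yStd` instead of a `DataFrame`, and uid handling is omitted).

```scala
import scala.collection.mutable

// Hypothetical, simplified parameter: a name plus a default value.
final case class Param(name: String, default: Double)

// Toy version of the Params trait; NOT Spark's actual class.
trait Params {
  // Values explicitly set on this instance (the "embedded" values).
  private val values = mutable.Map.empty[Param, Double]

  def set(param: Param, value: Double): this.type = {
    values(param) = value
    this
  }

  def getOrDefault(param: Param): Double =
    values.getOrElse(param, param.default)

  // Short alias for getOrDefault, mirroring the `$` described in the PR.
  protected final def $(param: Param): Double = getOrDefault(param)

  // Subclasses say how to build a fresh, empty instance of themselves.
  protected def newInstance(): Params

  // Copy the embedded values, then apply the extra values on top.
  def copy(extra: Map[Param, Double]): Params = {
    val that = newInstance()
    values.foreach { case (p, v) => that.set(p, v) }
    extra.foreach { case (p, v) => that.set(p, v) }
    that
  }
}

// Stand-in for a fitted model; it just records the value that was used.
final case class ToyModel(effectiveRegParam: Double)

object ToyEstimator {
  val regParam: Param = Param("regParam", 0.1)
}

class ToyEstimator extends Params {
  import ToyEstimator.regParam

  protected def newInstance(): ToyEstimator = new ToyEstimator

  override def copy(extra: Map[Param, Double]): ToyEstimator =
    super.copy(extra).asInstanceOf[ToyEstimator]

  // Only the embedded values are read here, via `$`.
  def fit(yStd: Double): ToyModel = ToyModel($(regParam) / yStd)

  // Extra values are handled once, by copying and delegating.
  def fit(yStd: Double, extra: Map[Param, Double]): ToyModel =
    copy(extra).fit(yStd)
}

object CopyDemo {
  def main(args: Array[String]): Unit = {
    val est = new ToyEstimator().set(ToyEstimator.regParam, 0.5)
    println(est.fit(2.0))                                      // uses the embedded 0.5
    println(est.fit(2.0, Map(ToyEstimator.regParam -> 1.0)))   // uses the extra 1.0
    println(est.getOrDefault(ToyEstimator.regParam))           // original is unchanged
  }
}
```

Because the extra values are applied to a copy, the original estimator keeps its own settings, and the single-argument `fit` never needs to know that extra values exist.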
TODOs:

* [x] add tests for newly added methods
* [ ] update documentation

jkbradley dbtsai

Author: Xiangrui Meng <meng@databricks.com>

Closes #5820 from mengxr/SPARK-5956 and squashes the following commits:

7bef88d [Xiangrui Meng] address comments
05229c3 [Xiangrui Meng] assert -> assertEquals
b2927b1 [Xiangrui Meng] organize imports
f14456b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5956
93e7924 [Xiangrui Meng] add tests for hasParam & copy
463ecae [Xiangrui Meng] merge master
2b954c3 [Xiangrui Meng] update Binarizer
465dd12 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5956
282a1a8 [Xiangrui Meng] fix test
819dd2d [Xiangrui Meng] merge master
b642872 [Xiangrui Meng] example code runs
5a67779 [Xiangrui Meng] examples compile
c76b4d1 [Xiangrui Meng] fix all unit tests
0f4fd64 [Xiangrui Meng] fix some tests
9286a22 [Xiangrui Meng] copyValues to trained models
53e0973 [Xiangrui Meng] move inheritValues to Params and rename it to copyValues
9ee004e [Xiangrui Meng] merge copy and copyWith; rename validate to validateParams
d882afc [Xiangrui Meng] test compile
f082a31 [Xiangrui Meng] make Params copyable and simply handling of extra params in all spark.ml components
Showing
- examples/src/main/java/org/apache/spark/examples/ml/JavaDeveloperApiExample.java: 7 additions, 17 deletions
- examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java: 2 additions, 2 deletions
- examples/src/main/scala/org/apache/spark/examples/ml/DecisionTreeExample.scala: 2 additions, 4 deletions
- examples/src/main/scala/org/apache/spark/examples/ml/DeveloperApiExample.scala: 8 additions, 14 deletions
- examples/src/main/scala/org/apache/spark/examples/ml/GBTExample.scala: 2 additions, 2 deletions
- examples/src/main/scala/org/apache/spark/examples/ml/RandomForestExample.scala: 2 additions, 4 deletions
- examples/src/main/scala/org/apache/spark/examples/ml/SimpleParamsExample.scala: 2 additions, 2 deletions
- mllib/src/main/scala/org/apache/spark/ml/Estimator.scala: 20 additions, 6 deletions
- mllib/src/main/scala/org/apache/spark/ml/Evaluator.scala: 16 additions, 4 deletions
- mllib/src/main/scala/org/apache/spark/ml/Model.scala: 4 additions, 5 deletions
- mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala: 49 additions, 57 deletions
- mllib/src/main/scala/org/apache/spark/ml/Transformer.scala: 30 additions, 16 deletions
- mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala: 15 additions, 34 deletions
- mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala: 11 additions, 18 deletions
- mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala: 13 additions, 20 deletions
- mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala: 26 additions, 32 deletions
- mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala: 8 additions, 25 deletions
- mllib/src/main/scala/org/apache/spark/ml/classification/RandomForestClassifier.scala: 12 additions, 19 deletions
- mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala: 7 additions, 10 deletions
- mllib/src/main/scala/org/apache/spark/ml/feature/Binarizer.scala: 9 additions, 11 deletions