  1. Sep 27, 2016
    • WeichenXu's avatar
      [SPARK-17138][ML][MLLIB] Add Python API for multinomial logistic regression · 7f16affa
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      Add Python API for multinomial logistic regression.
      
      - add `family` param in python api.
      - expose `coefficientMatrix` and `interceptVector` for `LogisticRegressionModel`
      - add python-side testcase for multinomial logistic regression
      - update python doc.
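
      A minimal sketch of the new API described above (assuming an active SparkSession `spark`; the data is illustrative only):

      ```python
      from pyspark.ml.classification import LogisticRegression
      from pyspark.ml.linalg import Vectors

      df = spark.createDataFrame([
          (0.0, Vectors.dense(1.0, 0.0)),
          (1.0, Vectors.dense(0.0, 1.0)),
          (2.0, Vectors.dense(1.0, 1.0))], ["label", "features"])
      mlor = LogisticRegression(family="multinomial", regParam=0.1)
      model = mlor.fit(df)
      print(model.coefficientMatrix)   # one row of coefficients per class
      print(model.interceptVector)     # one intercept per class
      ```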
      
      ## How was this patch tested?
      
      existing and added doc tests.
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #14852 from WeichenXu123/add_MLOR_python.
      7f16affa
  2. Sep 06, 2016
    • Yanbo Liang's avatar
      [MINOR][ML] Correct weights doc of MultilayerPerceptronClassificationModel. · 39d538dd
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      ```weights``` of ```MultilayerPerceptronClassificationModel``` should be the output weights of layers rather than initial weights, this PR correct it.
      
      ## How was this patch tested?
      Doc change.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14967 from yanboliang/mlp-weights.
      39d538dd
  3. Aug 25, 2016
  4. Aug 22, 2016
    • Holden Karau's avatar
      [SPARK-15113][PYSPARK][ML] Add missing num features num classes · b264cbb1
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      Add missing `numFeatures` and `numClasses` to the wrapped Java models in PySpark ML pipelines. Also tag `DecisionTreeClassificationModel` as Experimental to match the Scala doc.
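
      A short sketch of the newly exposed attributes (assuming `model` is a fitted PySpark classification model wrapping a Java model, e.g. a `DecisionTreeClassificationModel`):

      ```python
      # `model` is assumed to be a fitted classification model, e.g. DecisionTreeClassificationModel
      print(model.numFeatures)   # number of features the model was trained on
      print(model.numClasses)    # number of classes the classifier can predict
      ```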
      
      ## How was this patch tested?
      
      Extended doctests
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12889 from holdenk/SPARK-15113-add-missing-numFeatures-numClasses.
      b264cbb1
  5. Aug 12, 2016
    • Yanbo Liang's avatar
      [MINOR][ML] Rename TreeEnsembleModels to TreeEnsembleModel for PySpark · ccc6dc0f
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Fix the typo of ```TreeEnsembleModels``` for PySpark; it should be ```TreeEnsembleModel```, which is consistent with Scala. Moreover, it represents a single tree ensemble model, so ```TreeEnsembleModel``` is the more reasonable name. This class is not part of the public API, so the rename does not involve a breaking change.
      
      ## How was this patch tested?
      No new tests, should pass existing ones.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14454 from yanboliang/TreeEnsembleModel.
      ccc6dc0f
  6. Jul 25, 2016
    • WeichenXu's avatar
      [SPARK-16653][ML][OPTIMIZER] update ANN convergence tolerance param default to 1e-6 · ad3708e7
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      Change the ANN convergence tolerance param default from 1e-4 to 1e-6,

      so that it matches the other algorithms in MLlib that use LBFGS as the optimizer.
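
      For illustration, a sketch of how the new default surfaces through the existing `tol` param (the layer sizes and iteration count here are illustrative):

      ```python
      from pyspark.ml.classification import MultilayerPerceptronClassifier

      mlp = MultilayerPerceptronClassifier(layers=[4, 5, 3], maxIter=100)
      print(mlp.getTol())   # 1e-06 after this change (previously 1e-04)
      mlp.setTol(1e-4)      # the old default can still be requested explicitly
      ```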
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #14286 from WeichenXu123/update_ann_tol.
      ad3708e7
  7. Jul 19, 2016
    • Yanbo Liang's avatar
      [SPARK-16494][ML] Upgrade breeze version to 0.12 · 67089149
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      breeze 0.12 has been released for more than half a year, and it brings lots of new features, performance improvements, and bug fixes.
      One of the biggest features is ```LBFGS-B```, which is an implementation of ```LBFGS``` with box constraints and is much faster for some special cases.
      We would like to implement the Huber loss function for ```LinearRegression``` ([SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181)), and it requires ```LBFGS-B``` as the optimization solver. So we should bump up the dependent breeze version to 0.12.
      For more of the features, improvements, and bug fixes in breeze 0.12, refer to the following link:
      https://groups.google.com/forum/#!topic/scala-breeze/nEeRi_DcY5c
      
      ## How was this patch tested?
      No new tests, should pass the existing ones.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14150 from yanboliang/spark-16494.
      67089149
  8. Jul 13, 2016
    • Joseph K. Bradley's avatar
      [SPARK-14812][ML][MLLIB][PYTHON] Experimental, DeveloperApi annotation audit for ML · 01f09b16
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      General decisions to follow, except where noted:
      * spark.mllib, pyspark.mllib: Remove all Experimental annotations.  Leave DeveloperApi annotations alone.
      * spark.ml, pyspark.ml
      ** Annotate Estimator-Model pairs of classes and companion objects the same way.
      ** For all algorithms marked Experimental with Since tag <= 1.6, remove Experimental annotation.
      ** For all algorithms marked Experimental with Since tag = 2.0, leave Experimental annotation.
      * DeveloperApi annotations are left alone, except where noted.
      * No changes to which types are sealed.
      
      Exceptions where I am leaving items Experimental in spark.ml, pyspark.ml, mainly because the items are new:
      * Model Summary classes
      * MLWriter, MLReader, MLWritable, MLReadable
      * Evaluator and subclasses: There is discussion of changes around evaluating multiple metrics at once for efficiency.
      * RFormula: Its behavior may need to change slightly to match R in edge cases.
      * AFTSurvivalRegression
      * MultilayerPerceptronClassifier
      
      DeveloperApi changes:
      * ml.tree.Node, ml.tree.Split, and subclasses should no longer be DeveloperApi
      
      ## How was this patch tested?
      
      N/A
      
      Note to reviewers:
      * spark.ml.clustering.LDA underwent significant changes (additional methods), so let me know if you want me to leave it Experimental.
      * Be careful to check for cases where a class should no longer be Experimental but has an Experimental method, val, or other feature.  I did not find such cases, but please verify.
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #14147 from jkbradley/experimental-audit.
      01f09b16
  9. Jun 22, 2016
    • Nick Pentreath's avatar
      [SPARK-16127][ML][PYSPARK] Audit @Since annotations related to ml.linalg · 18faa588
      Nick Pentreath authored
      [SPARK-14615](https://issues.apache.org/jira/browse/SPARK-14615) and #12627 changed `spark.ml` pipelines to use the new `ml.linalg` classes for `Vector`/`Matrix`. Some `Since` annotations for public methods/vals have not been updated accordingly to be `2.0.0`. This PR updates them.
      
      ## How was this patch tested?
      
      Existing unit tests.
      
      Author: Nick Pentreath <nickp@za.ibm.com>
      
      Closes #13840 from MLnick/SPARK-16127-ml-linalg-since.
      18faa588
    • Holden Karau's avatar
      [SPARK-15162][SPARK-15164][PYSPARK][DOCS][ML] update some pydocs · d281b0ba
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      Mark ml.classification algorithms as experimental to match the Scala algorithms, update PyDoc for thresholds on `LogisticRegression` to have the same level of info as Scala, and enable mathjax for PyDoc.
      
      ## How was this patch tested?
      
      Built docs locally & PySpark SQL tests
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12938 from holdenk/SPARK-15162-SPARK-15164-update-some-pydocs.
      d281b0ba
  10. Jun 21, 2016
    • Bryan Cutler's avatar
      [SPARK-15741][PYSPARK][ML] Pyspark cleanup of set default seed to None · b76e3553
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      
      Several places set the seed Param default value to None, which translates to a zero value on the Scala side. This is unnecessary because a default fixed value already exists, and if a test depends on a zero-valued seed, it should explicitly set it to zero instead of relying on this translation. These cases can be safely removed except for the ALS doc test, which has been changed to set the seed value to zero.
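
      A sketch of the resulting convention, using ALS as in the doc test (the other parameter values are illustrative):

      ```python
      from pyspark.ml.recommendation import ALS

      als = ALS(rank=10, maxIter=5)                 # rely on the algorithm's fixed default seed
      als_repro = ALS(rank=10, maxIter=5, seed=0)   # tests that need a zero seed now set it explicitly
      ```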
      
      ## How was this patch tested?
      
      Ran PySpark tests locally
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #13672 from BryanCutler/pyspark-cleanup-setDefault-seed-SPARK-15741.
      b76e3553
  11. Jun 20, 2016
    • Bryan Cutler's avatar
      [SPARK-16079][PYSPARK][ML] Added missing import for... · a42bf555
      Bryan Cutler authored
      [SPARK-16079][PYSPARK][ML] Added missing import for DecisionTreeRegressionModel used in GBTClassificationModel
      
      ## What changes were proposed in this pull request?
      
      Fixed missing import for DecisionTreeRegressionModel used in GBTClassificationModel trees method.
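
      For context, a sketch of the `trees` accessor that needed the import (assuming `gbt_model` is a fitted `GBTClassificationModel`):

      ```python
      # `gbt_model` is assumed to be a fitted GBTClassificationModel
      for tree in gbt_model.trees:    # each element is a DecisionTreeRegressionModel
          print(tree)                 # prints a one-line summary (depth, number of nodes)
      ```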
      
      ## How was this patch tested?
      
      Local tests
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #13787 from BryanCutler/pyspark-GBTClassificationModel-import-SPARK-16079.
      a42bf555
  12. Jun 13, 2016
    • Liang-Chi Hsieh's avatar
      [SPARK-15364][ML][PYSPARK] Implement PySpark picklers for ml.Vector and... · baa3e633
      Liang-Chi Hsieh authored
      [SPARK-15364][ML][PYSPARK] Implement PySpark picklers for ml.Vector and ml.Matrix under spark.ml.python
      
      ## What changes were proposed in this pull request?
      
      Now we have PySpark picklers for new and old vector/matrix, individually. However, they are all implemented under `PythonMLlibAPI`. To separate spark.mllib from spark.ml, we should implement the picklers of new vector/matrix under `spark.ml.python` instead.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      
      Closes #13219 from viirya/pyspark-pickler-ml.
      baa3e633
  13. Jun 06, 2016
    • Zheng RuiFeng's avatar
      [MINOR] Fix Typos 'an -> a' · fd8af397
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      
      `an -> a`
      
      Use cmds like `find . -name '*.R' | xargs -i sh -c "grep -in ' an [^aeiou]' {} && echo {}"` to generate candidates, and review them one by one.
      
      ## How was this patch tested?
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #13515 from zhengruifeng/an_a.
      fd8af397
  14. Jun 03, 2016
  15. Jun 02, 2016
    • Holden Karau's avatar
      [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspark TreeEnsemble missing methods · 72353311
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      Add `toDebugString` and `totalNumNodes` to `TreeEnsembleModels` and add `toDebugString` to `DecisionTreeModel`
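
      A short sketch of the added accessors (assuming `rf_model` is a fitted ensemble model such as `RandomForestClassificationModel`):

      ```python
      # `rf_model` is assumed to be a fitted tree-ensemble model
      print(rf_model.totalNumNodes)   # total number of nodes summed over all trees
      print(rf_model.toDebugString)   # full textual description of every tree
      ```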
      
      ## How was this patch tested?
      
      Extended doc tests.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12919 from holdenk/SPARK-15139-pyspark-treeEnsemble-missing-methods.
      72353311
  16. May 23, 2016
    • WeichenXu's avatar
      [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with... · a15ca553
      WeichenXu authored
      [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code
      
      ## What changes were proposed in this pull request?
      
      Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code.
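
      A sketch of the builder pattern the tests switch to (the master and app name here are illustrative):

      ```python
      from pyspark.sql import SparkSession

      # Instead of constructing SparkContext and SQLContext separately, the tests now do:
      spark = SparkSession.builder \
          .master("local[4]") \
          .appName("python-ml-tests") \
          .getOrCreate()
      df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
      ```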
      
      ## How was this patch tested?
      
      Existing test.
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #13242 from WeichenXu123/python_doctest_update_sparksession.
      a15ca553
  17. May 17, 2016
    • DB Tsai's avatar
      [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms · e2efe052
      DB Tsai authored
      ## What changes were proposed in this pull request?
      
      Once SPARK-14487 and SPARK-14549 are merged, we will migrate to use the new vector and matrix type in the new ml pipeline based apis.
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: DB Tsai <dbt@netflix.com>
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #12627 from dbtsai/SPARK-14615-NewML.
      e2efe052
  18. May 13, 2016
  19. May 09, 2016
    • Holden Karau's avatar
      [SPARK-15136][PYSPARK][DOC] Fix links to sphinx style and add a default param doc note · 12fe2ecd
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      PyDoc links in ml are in a non-standard format. Switch to the standard sphinx link format for better-formatted documentation. Also add a note about a default value in one place, and copy some extended docs from Scala for GBT.
      
      ## How was this patch tested?
      
      Built docs locally.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12918 from holdenk/SPARK-15137-linkify-pyspark-ml-classification.
      12fe2ecd
  20. May 03, 2016
    • Yanbo Liang's avatar
      [SPARK-14971][ML][PYSPARK] PySpark ML Params setter code clean up · d26f7cb0
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      PySpark ML Params setter code clean up.
      For examples,
      ```setInputCol``` can be simplified from
      ```
      self._set(inputCol=value)
      return self
      ```
      to:
      ```
      return self._set(inputCol=value)
      ```
      This is a pretty big sweep, and we cleaned up wherever possible.
      ## How was this patch tested?
      Existing unit tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #12749 from yanboliang/spark-14971.
      d26f7cb0
  21. May 01, 2016
    • Xusen Yin's avatar
      [SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update · a6428292
      Xusen Yin authored
      ## What changes were proposed in this pull request?
      
      This PR is an update for [https://github.com/apache/spark/pull/12738] which:
      * Adds a generic unit test for JavaParams wrappers in pyspark.ml for checking default Param values vs. the defaults in the Scala side
      * Various fixes for bugs found
        * This includes changing classes taking weightCol to treat unset and empty String Param values the same way.
      
      Defaults changed:
      * Scala
       * LogisticRegression: weightCol defaults to not set (instead of empty string)
       * StringIndexer: labels default to not set (instead of empty array)
       * GeneralizedLinearRegression:
         * maxIter always defaults to 25 (simpler than defaulting to 25 for a particular solver)
         * weightCol defaults to not set (instead of empty string)
       * LinearRegression: weightCol defaults to not set (instead of empty string)
      * Python
       * MultilayerPerceptron: layers default to not set (instead of [1,1])
       * ChiSqSelector: numTopFeatures defaults to 50 (instead of not set)
      
      ## How was this patch tested?
      
      Generic unit test.  Manually tested that unit test by changing defaults and verifying that it broke the test.
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      Author: yinxusen <yinxusen@gmail.com>
      
      Closes #12816 from jkbradley/yinxusen-SPARK-14931.
      a6428292
  22. Apr 30, 2016
    • Herman van Hovell's avatar
      [SPARK-14952][CORE][ML] Remove methods that were deprecated in 1.6.0 · e5fb78ba
      Herman van Hovell authored
      #### What changes were proposed in this pull request?
      
      This PR removes three methods that were deprecated in 1.6.0:
      - `PortableDataStream.close()`
      - `LinearRegression.weights`
      - `LogisticRegression.weights`
      
      The rationale for doing this is that the impact is small and that Spark 2.0 is a major release.
      
      #### How was this patch tested?
      Compilation succeeded.
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #12732 from hvanhovell/SPARK-14952.
      e5fb78ba
  23. Apr 20, 2016
    • Burak Yavuz's avatar
      [SPARK-14555] First cut of Python API for Structured Streaming · 80bf48f4
      Burak Yavuz authored
      ## What changes were proposed in this pull request?
      
      This patch provides a first cut of Python APIs for Structured Streaming. It adds the new classes:
       - ContinuousQuery
       - Trigger
       - ProcessingTime
      in pyspark under `pyspark.sql.streaming`.
      
      In addition, it contains the new methods added under:
       -  `DataFrameWriter`
           a) `startStream`
           b) `trigger`
           c) `queryName`
      
       -  `DataFrameReader`
           a) `stream`
      
       - `DataFrame`
          a) `isStreaming`
      
      This PR doesn't contain all methods exposed for `ContinuousQuery`, for example:
       - `exception`
       - `sourceStatuses`
       - `sinkStatus`
      
      They may be added in a follow up.
      
      This PR also contains some very minor doc fixes in the Scala side.
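
      A rough sketch of how the listed pieces fit together in this first cut (assuming an existing `sqlContext`; the paths are illustrative, and the exact method signatures are an assumption here, several of these names were later renamed before the 2.0.0 release):

      ```python
      # Rough sketch only; signatures in this preview API are an assumption.
      df = sqlContext.read.format("text").stream("/tmp/streaming-input")   # DataFrameReader.stream
      print(df.isStreaming)                                                # True for streaming DataFrames
      query = (df.write
                 .format("parquet")
                 .option("checkpointLocation", "/tmp/checkpoint")
                 .queryName("demo_query")                                  # names the ContinuousQuery
                 .startStream("/tmp/streaming-output"))                    # returns a ContinuousQuery
      ```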
      
      ## How was this patch tested?
      
      Python doc tests
      
      TODO:
       - [ ] verify Python docs look good
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      Author: Burak Yavuz <burak@databricks.com>
      
      Closes #12320 from brkyvz/stream-python.
      80bf48f4
  24. Apr 18, 2016
  25. Apr 15, 2016
    • Xusen Yin's avatar
      [SPARK-7861][ML] PySpark OneVsRest · 90b46e01
      Xusen Yin authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/browse/SPARK-7861
      
      Add PySpark OneVsRest. I implemented it in Python since it's a meta-pipeline.
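
      A minimal sketch of the meta-pipeline in use (assuming an active SparkSession `spark`; the data is illustrative):

      ```python
      from pyspark.ml.classification import LogisticRegression, OneVsRest
      from pyspark.ml.linalg import Vectors

      df = spark.createDataFrame([
          (0.0, Vectors.dense(0.0, 1.0)),
          (1.0, Vectors.dense(1.0, 0.0)),
          (2.0, Vectors.dense(2.0, 2.0))], ["label", "features"])
      ovr = OneVsRest(classifier=LogisticRegression(maxIter=10))
      model = ovr.fit(df)              # trains one binary classifier per class
      model.transform(df).select("features", "prediction").show()
      ```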
      
      ## How was this patch tested?
      
      Test with doctest.
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #12124 from yinxusen/SPARK-14306-7861.
      90b46e01
    • sethah's avatar
      [SPARK-14104][PYSPARK][ML] All Python param setters should use the `_set` method · 129f2f45
      sethah authored
      ## What changes were proposed in this pull request?
      
      Param setters in python previously accessed the _paramMap directly to update values. The `_set` method now implements type checking, so it should be used to update all parameters. This PR eliminates all direct accesses to `_paramMap` besides the one in the `_set` method to ensure type checking happens.
      
      Additional changes:
      * [SPARK-13068](https://github.com/apache/spark/pull/11663) missed adding type converters in evaluation.py, so those are added here
      * An incorrect `toBoolean` type converter was used for the StringIndexer `handleInvalid` param in a previous PR. This is fixed here.
      
      ## How was this patch tested?
      
      Existing unit tests verify that parameters are still set properly. No new functionality is actually added in this PR.
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #11939 from sethah/SPARK-14104.
      129f2f45
  26. Apr 14, 2016
  27. Apr 13, 2016
    • Bryan Cutler's avatar
      [SPARK-14472][PYSPARK][ML] Cleanup ML JavaWrapper and related class hierarchy · fc3cd2f5
      Bryan Cutler authored
      Currently, JavaWrapper is only a wrapper class for pipeline classes that have Params, and JavaCallable is a separate mixin that provides methods to make Java calls. This change simplifies the class structure by defining the Java wrapper in a plain base class along with the methods to make Java calls. It also renames the Java wrapper classes to better reflect their purpose.
      
      Ran existing Python ml tests and generated documentation to test this change.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #12304 from BryanCutler/pyspark-cleanup-JavaWrapper-SPARK-14472.
      fc3cd2f5
  28. Apr 08, 2016
    • Joseph K. Bradley's avatar
      [SPARK-14498][ML][PYTHON][SQL] Many cleanups to ML and ML-related docs · d7af736b
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Cleanups to documentation.  No changes to code.
      * GBT docs: Move Scala doc for private object GradientBoostedTrees to public docs for GBTClassifier,Regressor
      * GLM regParam: needs doc saying it is for L2 only
      * TrainValidationSplitModel: add .. versionadded:: 2.0.0
      * Rename “_transformer_params_from_java” to “_transfer_params_from_java”
      * LogReg Summary classes: “probability” col should not say “calibrated”
      * LR summaries: coefficientStandardErrors -> document that the intercept stderr comes last. Same for t-values and p-values
      * approxCountDistinct: Document meaning of the “rsd” argument.
      * LDA: note which params are for online LDA only
      
      ## How was this patch tested?
      
      Doc build
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #12266 from jkbradley/ml-doc-cleanups.
      d7af736b
    • Kai Jiang's avatar
      [SPARK-14373][PYSPARK] PySpark RandomForestClassifier, Regressor support export/import · e5d8d6e0
      Kai Jiang authored
      ## What changes were proposed in this pull request?
      Support `RandomForest{Classifier, Regressor}` save/load for the Python API.
      [JIRA](https://issues.apache.org/jira/browse/SPARK-14373)
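
      A sketch of the added persistence round trip (the path is illustrative; `rf_model` is assumed to be a fitted `RandomForestClassificationModel`):

      ```python
      from pyspark.ml.classification import RandomForestClassificationModel

      rf_model.save("/tmp/rf_model")                                   # MLWritable
      loaded = RandomForestClassificationModel.load("/tmp/rf_model")   # MLReadable
      print(loaded.featureImportances)                                 # matches the original model
      ```
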
      ## How was this patch tested?
      doctest
      
      Author: Kai Jiang <jiangkai@gmail.com>
      
      Closes #12238 from vectorijk/spark-14373.
      e5d8d6e0
  29. Apr 06, 2016
    • Bryan Cutler's avatar
      [SPARK-13430][PYSPARK][ML] Python API for training summaries of linear and logistic regression · 9c6556c5
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      
      Adding Python API for training summaries of LogisticRegression and LinearRegression in PySpark ML.
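
      A sketch of reading the new summaries (assuming `lr_model` and `lin_model` are fitted LogisticRegressionModel and LinearRegressionModel instances):

      ```python
      # Logistic regression training summary
      summary = lr_model.summary
      print(summary.objectiveHistory)    # loss at each iteration
      print(summary.areaUnderROC)        # binary classification metric

      # Linear regression training summary
      lin_summary = lin_model.summary
      print(lin_summary.rootMeanSquaredError, lin_summary.r2)
      ```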
      
      ## How was this patch tested?
      Added unit tests to exercise the API calls for the summary classes. Also manually verified that values are as expected and match those from Scala directly.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #11621 from BryanCutler/pyspark-ml-summary-SPARK-13430.
      9c6556c5
  30. Apr 01, 2016
    • Alexander Ulanov's avatar
      [SPARK-11262][ML] Unit test for gradient, loss layers, memory management for multilayer perceptron · 26867ebc
      Alexander Ulanov authored
      1. Implement LossFunction trait and implement squared error and cross entropy loss with it
      2. Implement unit test for gradient and loss
      3. Implement InPlace trait and in-place layer evaluation
      4. Refactor interface for ActivationFunction
      5. Update of Layer and LayerModel interfaces
      6. Fix random weights assignment
      7. Implement memory allocation by MLP model instead of individual layers
      
      These features decreased the memory usage and increased the flexibility of the internal API.
      
      Author: Alexander Ulanov <nashb@yandex.ru>
      Author: avulanov <avulanov@gmail.com>
      
      Closes #9229 from avulanov/mlp-refactoring.
      26867ebc
  31. Mar 31, 2016
    • sethah's avatar
      [SPARK-14264][PYSPARK][ML] Add feature importance for GBTs in pyspark · b11887c0
      sethah authored
      ## What changes were proposed in this pull request?
      
      Feature importances are exposed in the python API for GBTs.
      
      Other changes:
      * Update the random forest feature importance documentation to not repeat decision tree docstring and instead place a reference to it.
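
      A one-line sketch of the exposed attribute (assuming `gbt_model` is a fitted `GBTClassificationModel` or `GBTRegressionModel`):

      ```python
      # `gbt_model` is assumed to be a fitted GBT model
      print(gbt_model.featureImportances)   # vector of per-feature importances
      ```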
      
      ## How was this patch tested?
      
      Python doc tests were updated to validate GBT feature importance.
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #12056 from sethah/Pyspark_GBT_feature_importance.
      b11887c0
  32. Mar 30, 2016
  33. Mar 24, 2016
    • GayathriMurali's avatar
      [SPARK-13949][ML][PYTHON] PySpark ml DecisionTreeClassifier, Regressor support export/import · 0874ff3a
      GayathriMurali authored
      ## What changes were proposed in this pull request?
      
      Added MLReadable and MLWritable to Decision Tree Classifier and Regressor. Added doctests.
      
      ## How was this patch tested?
      
      Python Unit tests. Tests added to check persistence in DecisionTreeClassifier and DecisionTreeRegressor.
      
      Author: GayathriMurali <gayathri.m.softie@gmail.com>
      
      Closes #11892 from GayathriMurali/SPARK-13949.
      0874ff3a
    • sethah's avatar
      [SPARK-14107][PYSPARK][ML] Add seed as named argument to GBTs in pyspark · 58509771
      sethah authored
      ## What changes were proposed in this pull request?
      
      GBTs in pyspark previously had seed parameters, but they could not be passed as keyword arguments through the class constructor. This patch adds seed as a keyword argument and also sets a default value.
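
      A sketch of the constructors after this change (the values are illustrative):

      ```python
      from pyspark.ml.classification import GBTClassifier
      from pyspark.ml.regression import GBTRegressor

      gbt_c = GBTClassifier(maxIter=10, seed=42)   # seed now accepted as a keyword argument
      gbt_r = GBTRegressor(maxIter=10, seed=42)
      ```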
      
      ## How was this patch tested?
      
      Doc tests were updated to pass a random seed through the GBTClassifier and GBTRegressor constructors.
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #11944 from sethah/SPARK-14107.
      58509771
  34. Mar 23, 2016
    • sethah's avatar
      [SPARK-13068][PYSPARK][ML] Type conversion for Pyspark params · 30bdb5cb
      sethah authored
      ## What changes were proposed in this pull request?
      
      This patch adds type conversion functionality for parameters in Pyspark. A `typeConverter` field is added to the constructor of `Param` class. This argument is a function which converts values passed to this param to the appropriate type if possible. This is beneficial so that the params can fail at set time if they are given inappropriate values, but even more so because coherent error messages are now provided when Py4J cannot cast the python type to the appropriate Java type.
      
      This patch also adds a `TypeConverters` class with factory methods for common type conversions. Most of the changes involve adding these factory type converters to existing params. The previous solution to this issue, `expectedType`, is deprecated and can be removed in 2.1.0 as discussed on the Jira.
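
      A sketch of declaring a param with the new `typeConverter` argument (the `threshold` param and `MyParams` class here are hypothetical, for illustration only):

      ```python
      from pyspark.ml.param import Param, Params, TypeConverters

      class MyParams(Params):
          # hypothetical param illustrating the typeConverter argument
          threshold = Param(Params._dummy(), "threshold", "cutoff applied to predictions",
                            typeConverter=TypeConverters.toFloat)

      # Setting threshold=1 (an int) is converted to 1.0; a non-numeric value fails at set time.
      ```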
      
      ## How was this patch tested?
      
      Unit tests were added in python/pyspark/ml/tests.py to test parameter type conversion. These tests check that values that should be convertible are converted correctly, and that the appropriate errors are thrown when invalid values are provided.
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #11663 from sethah/SPARK-13068-tc.
      30bdb5cb
  35. Mar 22, 2016
    • Joseph K. Bradley's avatar
      [SPARK-13951][ML][PYTHON] Nested Pipeline persistence · 7e3423b9
      Joseph K. Bradley authored
      Adds support for saving and loading nested ML Pipelines from Python.  Pipeline and PipelineModel do not extend JavaWrapper, but they are able to utilize the JavaMLWriter, JavaMLReader implementations.
      
      Also:
      * Separates out interfaces from Java wrapper implementations for MLWritable, MLReadable, MLWriter, MLReader.
      * Moves methods _stages_java2py, _stages_py2java into Pipeline, PipelineModel as _transfer_stage_from_java, _transfer_stage_to_java
      
      Added new unit test for nested Pipelines.  Abstracted validity check into a helper method for the 2 unit tests.
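
      A sketch of a nested Pipeline persistence round trip (the paths and stages are illustrative):

      ```python
      from pyspark.ml import Pipeline
      from pyspark.ml.feature import Tokenizer, HashingTF
      from pyspark.ml.classification import LogisticRegression

      inner = Pipeline(stages=[Tokenizer(inputCol="text", outputCol="words"),
                               HashingTF(inputCol="words", outputCol="features")])
      outer = Pipeline(stages=[inner, LogisticRegression(maxIter=10)])   # Pipeline nested as a stage
      outer.save("/tmp/nested_pipeline")
      reloaded = Pipeline.load("/tmp/nested_pipeline")
      ```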
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #11866 from jkbradley/nested-pipeline-io.
      Closes #11835
      7e3423b9
  36. Mar 16, 2016
    • GayathriMurali's avatar
      [SPARK-13034] PySpark ml.classification support export/import · 27e1f388
      GayathriMurali authored
      ## What changes were proposed in this pull request?
      
      Add export/import for all estimators and transformers (which have a Scala implementation) under pyspark/ml/classification.py.
      
      ## How was this patch tested?
      
      ./python/run-tests
      ./dev/lint-python
      Unit tests added to check persistence in Logistic Regression
      
      Author: GayathriMurali <gayathri.m.softie@gmail.com>
      
      Closes #11707 from GayathriMurali/SPARK-13034.
      27e1f388