  1. Jan 19, 2016
  2. Nov 02, 2015
  3. Aug 28, 2015
    • [SPARK-10188] [PYSPARK] Pyspark CrossValidator with RMSE selects incorrect model · 7583681e
      noelsmith authored
      * Added isLargerBetter() method to Pyspark Evaluator to match the Scala version.
      * JavaEvaluator delegates isLargerBetter() to underlying Scala object.
      * Added check for isLargerBetter() in CrossValidator to determine whether to use argmin or argmax.
      * Added test cases for where smaller is better (RMSE) and larger is better (R-Squared).
      
      (This contribution is my original work, and I license the work to the project under Spark's open source license.)
      
      Author: noelsmith <mail@noelsmith.com>
      
      Closes #8399 from noel-smith/pyspark-rmse-xval-fix.
      7583681e
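The selection rule described in this commit can be sketched in plain Python. This is only an illustration, not the PySpark source: `select_best_index` is a hypothetical helper, and the metric values are made up.

```python
def select_best_index(metrics, is_larger_better):
    """Return the index of the best metric value.

    Uses argmax when larger is better (e.g. R-squared) and
    argmin when smaller is better (e.g. RMSE).
    """
    if not metrics:
        raise ValueError("metrics must not be empty")
    indices = range(len(metrics))
    if is_larger_better:
        return max(indices, key=lambda i: metrics[i])
    return min(indices, key=lambda i: metrics[i])


# Hypothetical average metrics for three candidate param maps.
rmse_per_model = [0.9, 0.4, 0.7]   # smaller is better
r2_per_model = [0.55, 0.91, 0.73]  # larger is better

best_by_rmse = select_best_index(rmse_per_model, is_larger_better=False)
best_by_r2 = select_best_index(r2_per_model, is_larger_better=True)
```

Always taking the argmax would pick the model with the largest RMSE, which is the incorrect selection the commit title refers to.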
  4. Aug 19, 2015
    • [SPARK-10097] Adds `shouldMaximize` flag to `ml.evaluation.Evaluator` · 28a98464
      Feynman Liang authored
      Previously, users of `Evaluator` (`CrossValidator` and `TrainValidationSplit`) could only maximize the evaluator's metric. The workaround was to negate metrics that should be minimized, which caused erroneous negative values to be reported to the user.
      
      This PR adds an `isLargerBetter` attribute to the `Evaluator` base class, which tells users of `Evaluator` whether the chosen metric should be maximized or minimized.
      
      CC jkbradley
      
      Author: Feynman Liang <fliang@databricks.com>
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #8290 from feynmanliang/SPARK-10097.
      28a98464
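A minimal sketch of the idea follows. The classes are toy stand-ins rather than the Spark classes, the snake_case method name is this sketch's own, and the default of `True` is an assumption chosen to match the earlier "larger is better" convention.

```python
import math


class Evaluator:
    """Toy stand-in for an evaluator base class (not the Spark class)."""

    def evaluate(self, predictions, labels):
        raise NotImplementedError

    def is_larger_better(self):
        # Assumed default: larger metric values win.
        return True


class RmseEvaluator(Evaluator):
    """Reports RMSE as a plain non-negative number."""

    def evaluate(self, predictions, labels):
        squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
        return math.sqrt(sum(squared_errors) / len(squared_errors))

    def is_larger_better(self):
        # RMSE is an error measure, so callers should minimize it.
        return False


evaluator = RmseEvaluator()
rmse = evaluator.evaluate([2.0, 4.0], [1.0, 5.0])
```

Because the direction is carried by the evaluator itself, the metric can be reported with its natural sign instead of being negated for the benefit of the caller.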
  5. Aug 14, 2015
  6. Aug 12, 2015
  7. Jul 31, 2015
    • [SPARK-7690] [ML] Multiclass classification Evaluator · 4e5919bf
      Ram Sriharsha authored
      Adds a multiclass classification evaluator for ML Pipelines. The supported metrics are F1 score, precision, recall, weighted precision, and weighted recall.
      
      Author: Ram Sriharsha <rsriharsha@hw11853.local>
      
      Closes #7475 from harsha2010/SPARK-7690 and squashes the following commits:
      
      9bf4ec7 [Ram Sriharsha] fix indentation
      3f09a85 [Ram Sriharsha] cleanup doc
      16115ae [Ram Sriharsha] code review fixes
      032d2a3 [Ram Sriharsha] fix test
      eec9865 [Ram Sriharsha] Fix Python Indentation
      1dbeffd [Ram Sriharsha] Merge branch 'master' into SPARK-7690
      68cea85 [Ram Sriharsha] Merge branch 'master' into SPARK-7690
      54c03de [Ram Sriharsha] [SPARK-7690][ml][WIP] Multiclass Evaluator for ML Pipeline
      4e5919bf
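For readers unfamiliar with the weighted metrics named above, here is a small plain-Python sketch. It is not the evaluator's implementation: `weighted_metrics` is a hypothetical helper, and weighting each class by its share of the true labels is an assumption of this sketch.

```python
from collections import Counter


def weighted_metrics(labels, predictions):
    """Return (weighted precision, weighted recall, weighted F1).

    Each class's score is weighted by its share of the true labels.
    """
    n = len(labels)
    true_counts = Counter(labels)
    pred_counts = Counter(predictions)
    correct = Counter(y for y, p in zip(labels, predictions) if y == p)

    w_precision = w_recall = w_f1 = 0.0
    for cls, count in true_counts.items():
        tp = correct[cls]
        # A class that is never predicted gets precision 0.
        precision = tp / pred_counts[cls] if pred_counts[cls] else 0.0
        recall = tp / count
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = count / n
        w_precision += weight * precision
        w_recall += weight * recall
        w_f1 += weight * f1
    return w_precision, w_recall, w_f1


# Made-up labels and predictions for three classes.
labels = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
wp, wr, wf1 = weighted_metrics(labels, predictions)
```

With this weighting, weighted recall equals overall accuracy (4 of 6 correct in the example).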
  8. Jun 20, 2015
    • [SPARK-8468] [ML] Take the negative of some metrics in RegressionEvaluator to get correct cross validation · 0b899516
      Liang-Chi Hsieh authored
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-8468
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6905 from viirya/cv_min and squashes the following commits:
      
      930d3db [Liang-Chi Hsieh] Fix python unit test and add document.
      d632135 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cv_min
      16e3b2c [Liang-Chi Hsieh] Take the negative instead of reciprocal.
      c3dd8d9 [Liang-Chi Hsieh] For comments.
      b5f52c1 [Liang-Chi Hsieh] Add param to CrossValidator for choosing whether to maximize evaluation value.
      0b899516
  9. May 28, 2015
    • [MINOR] fix RegressionEvaluator doc · 834e6995
      Xiangrui Meng authored
      `make clean html` under `python/doc` returns
      ~~~
      /Users/meng/src/spark/python/pyspark/ml/evaluation.py:docstring of pyspark.ml.evaluation.RegressionEvaluator.setParams:3: WARNING: Definition list ends without a blank line; unexpected unindent.
      ~~~
      
      harsha2010
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6469 from mengxr/fix-regression-evaluator-doc and squashes the following commits:
      
      91e2dad [Xiangrui Meng] fix RegressionEvaluator doc
      834e6995
  10. May 24, 2015
    • [SPARK-7833] [ML] Add python wrapper for RegressionEvaluator · 65c696ec
      Ram Sriharsha authored
      Author: Ram Sriharsha <rsriharsha@hw11853.local>
      
      Closes #6365 from harsha2010/SPARK-7833 and squashes the following commits:
      
      923f288 [Ram Sriharsha] cleanup
      7623b7d [Ram Sriharsha] python style fix
      9743f83 [Ram Sriharsha] [SPARK-7833][ml] Add python wrapper for RegressionEvaluator
      65c696ec
  11. May 22, 2015
    • [SPARK-7535] [.0] [MLLIB] Audit the pipeline APIs for 1.4 · 8f11c611
      Xiangrui Meng authored
      Some changes to the pipeline APIs:
      
      1. Estimator/Transformer don't need to extend Params since PipelineStage already does.
      2. Move Evaluator to ml.evaluation.
      3. Mention larger metric values are better.
      4. PipelineModel doc. “compiled” -> “fitted”
      5. Hide object PolynomialExpansion.
      6. Hide object VectorAssembler.
      7. Word2Vec.minCount (and other) -> group param
      8. ParamValidators -> DeveloperApi
      9. Hide MetadataUtils/SchemaUtils.
      
      jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6322 from mengxr/SPARK-7535.0 and squashes the following commits:
      
      9e9c7da [Xiangrui Meng] move JavaEvaluator to ml.evaluation as well
      e179480 [Xiangrui Meng] move Evaluation to ml.evaluation in PySpark
      08ef61f [Xiangrui Meng] update pipeline APIs
      8f11c611
  12. May 18, 2015
    • [SPARK-7380] [MLLIB] pipeline stages should be copyable in Python · 9c7e802a
      Xiangrui Meng authored
      This PR makes pipeline stages in Python copyable and hence simplifies some implementations. It also includes the following changes:
      
      1. Rename `paramMap` and `defaultParamMap` to `_paramMap` and `_defaultParamMap`, respectively.
      2. Accept a list of param maps in `fit`.
      3. Use parent uid and name to identify param.
      
      jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #6088 from mengxr/SPARK-7380 and squashes the following commits:
      
      413c463 [Xiangrui Meng] remove unnecessary doc
      4159f35 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7380
      611c719 [Xiangrui Meng] fix python style
      68862b8 [Xiangrui Meng] update _java_obj initialization
      927ad19 [Xiangrui Meng] fix ml/tests.py
      0138fc3 [Xiangrui Meng] update feature transformers and fix a bug in RegexTokenizer
      9ca44fb [Xiangrui Meng] simplify Java wrappers and add tests
      c7d84ef [Xiangrui Meng] update ml/tests.py to test copy params
      7e0d27f [Xiangrui Meng] merge master
      46840fb [Xiangrui Meng] update wrappers
      b6db1ed [Xiangrui Meng] update all self.paramMap to self._paramMap
      46cb6ed [Xiangrui Meng] merge master
      a163413 [Xiangrui Meng] fix style
      1042e80 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7380
      9630eae [Xiangrui Meng] fix Identifiable._randomUID
      13bd70a [Xiangrui Meng] update ml/tests.py
      64a536c [Xiangrui Meng] use _fit/_transform/_evaluate to simplify the impl
      02abf13 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into copyable-python
      66ce18c [Joseph K. Bradley] some cleanups before sending to Xiangrui
      7431272 [Joseph K. Bradley] Rebased with master
      9c7e802a
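The three changes listed in this commit can be illustrated with a toy stage. This is a sketch under stated assumptions, not the PySpark code: `Param`, `Stage`, `maxIter`, and the dictionary returned by `_fit` are all hypothetical, and only the names `_paramMap`, `_defaultParamMap`, and `_fit` come from the commit message.

```python
import copy
import uuid


class Param:
    """A parameter identified by its parent's uid and its own name."""

    def __init__(self, parent_uid, name):
        self.parent_uid = parent_uid
        self.name = name

    def __eq__(self, other):
        return (isinstance(other, Param)
                and (self.parent_uid, self.name) == (other.parent_uid, other.name))

    def __hash__(self):
        return hash((self.parent_uid, self.name))


class Stage:
    """Toy pipeline stage with private param maps and a copy() method."""

    def __init__(self):
        self.uid = "Stage_" + uuid.uuid4().hex[:12]
        self.maxIter = Param(self.uid, "maxIter")
        self._defaultParamMap = {self.maxIter: 10}
        self._paramMap = {}

    def get(self, param):
        # Explicitly set values take precedence over defaults.
        if param in self._paramMap:
            return self._paramMap[param]
        return self._defaultParamMap[param]

    def copy(self, extra=None):
        # Shallow-copy the stage, then give the copy its own param map
        # so that overrides do not leak back into the original.
        that = copy.copy(self)
        that._paramMap = dict(self._paramMap)
        that._paramMap.update(extra or {})
        return that

    def fit(self, dataset, params=None):
        # A list of param maps yields one fitted result per map.
        if isinstance(params, (list, tuple)):
            return [self.fit(dataset, p) for p in params]
        return self.copy(params)._fit(dataset)

    def _fit(self, dataset):
        # Placeholder "model": just record the setting that was used.
        return {"maxIter": self.get(self.maxIter), "rows": len(dataset)}


stage = Stage()
models = stage.fit([1, 2, 3], [{stage.maxIter: 5}, {stage.maxIter: 20}])
```

Fitting on a copy keeps the original stage unchanged, so the same stage can be reused across several param maps.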
  13. May 05, 2015
    • [SPARK-7333] [MLLIB] Add BinaryClassificationEvaluator to PySpark · ee374e89
      Xiangrui Meng authored
      This PR adds `BinaryClassificationEvaluator` to Python ML Pipelines API, which is a simple wrapper of the Scala implementation. oefirouz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5885 from mengxr/SPARK-7333 and squashes the following commits:
      
      25d7451 [Xiangrui Meng] fix tests in python 3
      babdde7 [Xiangrui Meng] fix doc
      cb51e6a [Xiangrui Meng] add BinaryClassificationEvaluator in PySpark
      ee374e89