  1. May 23, 2015
    • Davies Liu's avatar
      [SPARK-7840] add insertInto() to Writer · be47af1b
      Davies Liu authored
      Add tests later.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6375 from davies/insertInto and squashes the following commits:
      
      826423e [Davies Liu] add insertInto() to Writer
      be47af1b
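      A hedged usage sketch of the method added above (assumes an existing DataFrame `df` and a registered table named "people"; both names are illustrative):

      ```python
      # Append the rows of df into the existing table, reusing its schema.
      df.write.insertInto("people")

      # Pass overwrite=True to replace the table's contents instead of appending.
      df.write.insertInto("people", overwrite=True)
      ```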
    • Davies Liu's avatar
      [SPARK-7322, SPARK-7836, SPARK-7822][SQL] DataFrame window function related updates · efe3bfdf
      Davies Liu authored
      1. ntile should take an integer as parameter.
      2. Added Python API (based on #6364)
      3. Update documentation of various DataFrame Python functions.
      
      Author: Davies Liu <davies@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6374 from rxin/window-final and squashes the following commits:
      
      69004c7 [Reynold Xin] Style fix.
      288cea9 [Reynold Xin] Update documentaiton.
      7cb8985 [Reynold Xin] Merge pull request #6364 from davies/window
      66092b4 [Davies Liu] update docs
      ed73cb4 [Reynold Xin] [SPARK-7322][SQL] Improve DataFrame window function documentation.
      ef55132 [Davies Liu] Merge branch 'master' of github.com:apache/spark into window4
      8936ade [Davies Liu] fix maxint in python 3
      2649358 [Davies Liu] update docs
      778e2c0 [Davies Liu] SPARK-7836 and SPARK-7822: Python API of window functions
      efe3bfdf
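      A hedged sketch of the Python window-function API described above (column names are illustrative; per point 1, `ntile` takes an integer bucket count):

      ```python
      from pyspark.sql import functions as F
      from pyspark.sql.window import Window

      # Partition by department, order by salary, and assign each row to one of 4 buckets.
      w = Window.partitionBy("dept").orderBy("salary")
      ranked = df.select("dept", "salary", F.ntile(4).over(w).alias("quartile"))
      ```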
  2. May 22, 2015
    • Xiangrui Meng's avatar
      [SPARK-7535] [.0] [MLLIB] Audit the pipeline APIs for 1.4 · 8f11c611
      Xiangrui Meng authored
      Some changes to the pipeline APIs:
      
      1. Estimator/Transformer doesn’t need to extend Params since PipelineStage already does.
      2. Move Evaluator to ml.evaluation.
      3. Mention that larger metric values are better.
      4. PipelineModel doc: “compiled” -> “fitted”
      5. Hide object PolynomialExpansion.
      6. Hide object VectorAssembler.
      7. Word2Vec.minCount (and others) -> group param
      8. ParamValidators -> DeveloperApi
      9. Hide MetadataUtils/SchemaUtils.
      
      jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6322 from mengxr/SPARK-7535.0 and squashes the following commits:
      
      9e9c7da [Xiangrui Meng] move JavaEvaluator to ml.evaluation as well
      e179480 [Xiangrui Meng] move Evaluation to ml.evaluation in PySpark
      08ef61f [Xiangrui Meng] update pipieline APIs
      8f11c611
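      As a small illustration of item 2, evaluators are now imported from `ml.evaluation` (the evaluator and metric shown are illustrative):

      ```python
      from pyspark.ml.evaluation import BinaryClassificationEvaluator

      # Per item 3, larger metric values are treated as better.
      evaluator = BinaryClassificationEvaluator(metricName="areaUnderROC")
      ```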
  3. May 21, 2015
    • Xiangrui Meng's avatar
      [SPARK-7794] [MLLIB] update RegexTokenizer default settings · f5db4b41
      Xiangrui Meng authored
      The previous default is `{gaps: false, pattern: "\\p{L}+|[^\\p{L}\\s]+"}`. The default pattern is hard to understand. This PR changes the default to `{gaps: true, pattern: "\\s+"}`. jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6330 from mengxr/SPARK-7794 and squashes the following commits:
      
      5ee7cde [Xiangrui Meng] update RegexTokenizer default settings
      f5db4b41
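      A short sketch of the new default (column names are illustrative): with `gaps=True` and `pattern="\s+"` the tokenizer simply splits on whitespace, while the old behaviour can still be requested explicitly:

      ```python
      from pyspark.ml.feature import RegexTokenizer

      # New default: split on whitespace.
      tokenizer = RegexTokenizer(inputCol="sentence", outputCol="words")

      # Previous behaviour, requested explicitly.
      legacy = RegexTokenizer(inputCol="sentence", outputCol="words",
                              gaps=False, pattern="\\p{L}+|[^\\p{L}\\s]+")
      ```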
    • Davies Liu's avatar
      [SPARK-7783] [SQL] [PySpark] add DataFrame.rollup/cube in Python · 17791a58
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6311 from davies/rollup and squashes the following commits:
      
      0261db1 [Davies Liu] use @since
      a51ca6b [Davies Liu] Merge branch 'master' of github.com:apache/spark into rollup
      8ad5af4 [Davies Liu] Update dataframe.py
      ade3841 [Davies Liu] add DataFrame.rollup/cube in Python
      17791a58
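      A hedged usage sketch (column names are illustrative); both methods return a grouped DataFrame that can be aggregated like a regular `groupBy`:

      ```python
      # Subtotals over the grouping columns from left to right, plus a grand total.
      df.rollup("department", "gender").count().show()

      # All combinations of the grouping columns, including the grand total.
      df.cube("department", "gender").agg({"salary": "avg"}).show()
      ```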
    • Holden Karau's avatar
      [SPARK-7711] Add a startTime property to match the corresponding one in Scala · 6b18cdc1
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #6275 from holdenk/SPARK-771-startTime-is-missing-from-pyspark and squashes the following commits:
      
      06662dc [Holden Karau] add mising blank line for style checks
      7a87410 [Holden Karau] add back missing newline
      7a7876b [Holden Karau] Add a startTime property to match the corresponding one in the Scala SparkContext
      6b18cdc1
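      A minimal sketch (assumes an existing SparkContext `sc`); like its Scala counterpart, the property reports when the context was started, as epoch milliseconds:

      ```python
      # Epoch timestamp (in milliseconds) at which the SparkContext was started.
      print(sc.startTime)
      ```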
    • kaka1992's avatar
      [SPARK-7394][SQL] Add Pandas style cast (astype) · 699906e5
      kaka1992 authored
      Author: kaka1992 <kaka_1992@163.com>
      
      Closes #6313 from kaka1992/astype and squashes the following commits:
      
      73dfd0b [kaka1992] [SPARK-7394] Add Pandas style cast (astype)
      ad8feb2 [kaka1992] [SPARK-7394] Add Pandas style cast (astype)
      4f328b7 [kaka1992] [SPARK-7394] Add Pandas style cast (astype)
      699906e5
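      A short sketch of the Pandas-style alias (column name is illustrative); `astype` behaves like `Column.cast`:

      ```python
      # Equivalent ways to cast the age column to a string column.
      df.select(df.age.cast("string").alias("age_str"))
      df.select(df.age.astype("string").alias("age_str"))
      ```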
    • Sean Owen's avatar
      [SPARK-6416] [DOCS] RDD.fold() requires the operator to be commutative · 6e534026
      Sean Owen authored
      Document current limitation of rdd.fold.
      
      This does not resolve SPARK-6416 but just documents the issue.
      CC JoshRosen
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6231 from srowen/SPARK-6416 and squashes the following commits:
      
      9fef39f [Sean Owen] Add comment to other languages; reword to highlight the difference from non-distributed collections and to not suggest it is a bug that is to be fixed
      da40d84 [Sean Owen] Document current limitation of rdd.fold.
      6e534026
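      A small illustration of the documented caveat (numbers are illustrative): partition results are merged in an unspecified order, so a non-commutative operator can give a different answer than it would on a plain Python list:

      ```python
      nums = sc.parallelize(range(1, 5), 2)

      # Commutative and associative: always 10.
      print(nums.fold(0, lambda a, b: a + b))

      # Subtraction is not commutative: the result depends on how partition
      # results are combined, so it may differ from a local left fold.
      print(nums.fold(0, lambda a, b: a - b))
      ```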
    • Davies Liu's avatar
      [SPARK-7606] [SQL] [PySpark] add version to Python SQL API docs · 8ddcb25b
      Davies Liu authored
      Add version info for public Python SQL API.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6295 from davies/versions and squashes the following commits:
      
      cfd91e6 [Davies Liu] add more version for DataFrame API
      600834d [Davies Liu] add version to SQL API docs
      8ddcb25b
  4. May 20, 2015
    • Xiangrui Meng's avatar
      [SPARK-7762] [MLLIB] set default value for outputCol · c330e52d
      Xiangrui Meng authored
      Set a default value for `outputCol` instead of forcing users to name it. This is useful for intermediate transformers in the pipeline. jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6289 from mengxr/SPARK-7762 and squashes the following commits:
      
      54edebc [Xiangrui Meng] merge master
      bff8667 [Xiangrui Meng] update unit test
      171246b [Xiangrui Meng] add unit test for outputCol
      a4321bd [Xiangrui Meng] set default value for outputCol
      c330e52d
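      A hedged sketch of the convenience this enables (transformer and column names are illustrative): an intermediate stage can be built without naming its output column, and the generated default can be read back:

      ```python
      from pyspark.ml.feature import Binarizer

      # No outputCol given; a default derived from the stage's uid is used.
      binarizer = Binarizer(threshold=0.5, inputCol="feature")
      print(binarizer.getOutputCol())
      ```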
    • Holden Karau's avatar
      [SPARK-7511] [MLLIB] pyspark ml seed param should be random by default or 42... · 191ee474
      Holden Karau authored
      [SPARK-7511] [MLLIB] pyspark ml seed param should be random by default or 42 is quite funny but not very random
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #6139 from holdenk/SPARK-7511-pyspark-ml-seed-param-should-be-random-by-default-or-42-is-quite-funny-but-not-very-random and squashes the following commits:
      
      591f8e5 [Holden Karau] specify old seed for doc tests
      2470004 [Holden Karau] Fix a bunch of seeds with default values to have None as the default which will then result in using the hash of the class name
      cbad96d [Holden Karau] Add the setParams function that is used in the real code
      423b8d7 [Holden Karau] Switch the test code to behave slightly more like production code. also don't check the param map value only check for key existence
      140d25d [Holden Karau] remove extra space
      926165a [Holden Karau] Add some missing newlines for pep8 style
      8616751 [Holden Karau] merge in master
      58532e6 [Holden Karau] its the __name__ method, also treat None values as not set
      56ef24a [Holden Karau] fix test and regenerate base
      afdaa5c [Holden Karau] make sure different classes have different results
      68eb528 [Holden Karau] switch default seed to hash of type of self
      89c4611 [Holden Karau] Merge branch 'master' into SPARK-7511-pyspark-ml-seed-param-should-be-random-by-default-or-42-is-quite-funny-but-not-very-random
      31cd96f [Holden Karau] specify the seed to randomforestregressor test
      e1b947f [Holden Karau] Style fixes
      ce90ec8 [Holden Karau] merge in master
      bcdf3c9 [Holden Karau] update docstring seeds to none and some other default seeds from 42
      65eba21 [Holden Karau] pep8 fixes
      0e3797e [Holden Karau] Make seed default to random in more places
      213a543 [Holden Karau] Simplify the generated code to only include set default if there is a default rather than having None is note None in the generated code
      1ff17c2 [Holden Karau] Make the seed random for HasSeed in python
      191ee474
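      A hedged, illustrative sketch of the idea in the commit log (not the actual Spark code): when no seed is supplied, a default is derived from the class name so different estimators get different but reproducible seeds:

      ```python
      # Illustrative only, mirroring "switch default seed to hash of type of self".
      class HasSeedExample(object):
          def __init__(self, seed=None):
              # None means "not set"; fall back to a hash of the class name.
              self._seed = seed if seed is not None else hash(type(self).__name__)

          def getSeed(self):
              return self._seed
      ```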
    • Yanbo Liang's avatar
      [SPARK-6094] [MLLIB] Add MultilabelMetrics in PySpark/MLlib · 98a46f9d
      Yanbo Liang authored
      Add MultilabelMetrics in PySpark/MLlib
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #6276 from yanboliang/spark-6094 and squashes the following commits:
      
      b8e3343 [Yanbo Liang] Add MultilabelMetrics in PySpark/MLlib
      98a46f9d
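      A minimal usage sketch (the prediction/label pairs are illustrative):

      ```python
      from pyspark.mllib.evaluation import MultilabelMetrics

      # Each element is (predicted labels, true labels).
      predictionAndLabels = sc.parallelize([
          ([0.0, 1.0], [0.0, 2.0]),
          ([0.0], [0.0, 1.0]),
      ])
      metrics = MultilabelMetrics(predictionAndLabels)
      print(metrics.precision(), metrics.recall(), metrics.f1Measure())
      ```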
  5. May 19, 2015
    • Davies Liu's avatar
      [SPARK-7738] [SQL] [PySpark] add reader and writer API in Python · 4de74d26
      Davies Liu authored
      cc rxin, please take a quick look, I'm working on tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6238 from davies/readwrite and squashes the following commits:
      
      c7200eb [Davies Liu] update tests
      9cbf01b [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite
      f0c5a04 [Davies Liu] use sqlContext.read.load
      5f68bc8 [Davies Liu] update tests
      6437e9a [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite
      bcc6668 [Davies Liu] add reader amd writer API in Python
      4de74d26
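      A hedged sketch of the new entry points (paths and formats are illustrative):

      ```python
      # Reading: sqlContext.read returns a DataFrameReader.
      df = sqlContext.read.format("json").load("examples/people.json")

      # Writing: df.write returns a DataFrameWriter.
      df.write.mode("overwrite").parquet("people.parquet")
      ```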
  6. May 18, 2015
    • Daoyuan Wang's avatar
      [SPARK-7150] SparkContext.range() and SQLContext.range() · c2437de1
      Daoyuan Wang authored
      This PR is based on #6081, thanks adrian-wang.
      
      Closes #6081
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6230 from davies/range and squashes the following commits:
      
      d3ce5fe [Davies Liu] add tests
      789eda5 [Davies Liu] add range() in Python
      4590208 [Davies Liu] Merge commit 'refs/pull/6081/head' of github.com:apache/spark into range
      cbf5200 [Daoyuan Wang] let's add python support in a separate PR
      f45e3b2 [Daoyuan Wang] remove redundant toLong
      617da76 [Daoyuan Wang] fix safe marge for corner cases
      867c417 [Daoyuan Wang] fix
      13dbe84 [Daoyuan Wang] update
      bd998ba [Daoyuan Wang] update comments
      d3a0c1b [Daoyuan Wang] add range api()
      c2437de1
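      A short sketch of the two new methods (arguments are illustrative):

      ```python
      # RDD containing 0, 2, 4, 6, 8.
      rdd = sc.range(0, 10, step=2)

      # Single-column DataFrame named "id" with values 0..999.
      df = sqlContext.range(0, 1000)
      print(rdd.count(), df.count())
      ```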
    • Davies Liu's avatar
      [SPARK-6216] [PYSPARK] check python version of worker with driver · 32fbd297
      Davies Liu authored
      This PR reverts #5404; instead, it passes the driver's Python version into the JVM and checks it in the worker before deserializing closures, so it works correctly with different major versions of Python.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6203 from davies/py_version and squashes the following commits:
      
      b8fb76e [Davies Liu] fix test
      6ce5096 [Davies Liu] use string for version
      47c6278 [Davies Liu] check python version of worker with driver
      32fbd297
    • Xiangrui Meng's avatar
      [SPARK-7380] [MLLIB] pipeline stages should be copyable in Python · 9c7e802a
      Xiangrui Meng authored
      This PR makes pipeline stages in Python copyable and hence simplifies some implementations. It also includes the following changes:
      
      1. Rename `paramMap` and `defaultParamMap` to `_paramMap` and `_defaultParamMap`, respectively.
      2. Accept a list of param maps in `fit`.
      3. Use parent uid and name to identify param.
      
      jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #6088 from mengxr/SPARK-7380 and squashes the following commits:
      
      413c463 [Xiangrui Meng] remove unnecessary doc
      4159f35 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7380
      611c719 [Xiangrui Meng] fix python style
      68862b8 [Xiangrui Meng] update _java_obj initialization
      927ad19 [Xiangrui Meng] fix ml/tests.py
      0138fc3 [Xiangrui Meng] update feature transformers and fix a bug in RegexTokenizer
      9ca44fb [Xiangrui Meng] simplify Java wrappers and add tests
      c7d84ef [Xiangrui Meng] update ml/tests.py to test copy params
      7e0d27f [Xiangrui Meng] merge master
      46840fb [Xiangrui Meng] update wrappers
      b6db1ed [Xiangrui Meng] update all self.paramMap to self._paramMap
      46cb6ed [Xiangrui Meng] merge master
      a163413 [Xiangrui Meng] fix style
      1042e80 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7380
      9630eae [Xiangrui Meng] fix Identifiable._randomUID
      13bd70a [Xiangrui Meng] update ml/tests.py
      64a536c [Xiangrui Meng] use _fit/_transform/_evaluate to simplify the impl
      02abf13 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into copyable-python
      66ce18c [Joseph K. Bradley] some cleanups before sending to Xiangrui
      7431272 [Joseph K. Bradley] Rebased with master
      9c7e802a
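      A hedged sketch of what "copyable" enables (the estimator and `train_df` are illustrative): a stage can be copied with extra params, and `fit` accepts a list of param maps, returning one model per map:

      ```python
      from pyspark.ml.classification import LogisticRegression

      lr = LogisticRegression(maxIter=10)

      # Copy the stage, overriding one param in the copy only.
      lr2 = lr.copy({lr.maxIter: 20})

      # Fit with a list of param maps; one model is returned per map.
      models = lr.fit(train_df, params=[{lr.regParam: 0.01}, {lr.regParam: 0.1}])
      ```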
    • Xiangrui Meng's avatar
      [SPARK-6657] [PYSPARK] Fix doc warnings · 1ecfac6e
      Xiangrui Meng authored
      Fixed the following warnings in `make clean html` under `python/docs`:
      
      ~~~
      /Users/meng/src/spark/python/pyspark/mllib/evaluation.py:docstring of pyspark.mllib.evaluation.RankingMetrics.ndcgAt:3: ERROR: Unexpected indentation.
      /Users/meng/src/spark/python/pyspark/mllib/evaluation.py:docstring of pyspark.mllib.evaluation.RankingMetrics.ndcgAt:4: WARNING: Block quote ends without a blank line; unexpected unindent.
      /Users/meng/src/spark/python/pyspark/mllib/fpm.py:docstring of pyspark.mllib.fpm.FPGrowth.train:3: ERROR: Unexpected indentation.
      /Users/meng/src/spark/python/pyspark/mllib/fpm.py:docstring of pyspark.mllib.fpm.FPGrowth.train:4: WARNING: Block quote ends without a blank line; unexpected unindent.
      /Users/meng/src/spark/python/pyspark/sql/__init__.py:docstring of pyspark.sql.DataFrame.replace:16: WARNING: Field list ends without a blank line; unexpected unindent.
      /Users/meng/src/spark/python/pyspark/streaming/kafka.py:docstring of pyspark.streaming.kafka.KafkaUtils.createRDD:8: ERROR: Unexpected indentation.
      /Users/meng/src/spark/python/pyspark/streaming/kafka.py:docstring of pyspark.streaming.kafka.KafkaUtils.createRDD:9: WARNING: Block quote ends without a blank line; unexpected unindent.
      ~~~
      
      davies
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6221 from mengxr/SPARK-6657 and squashes the following commits:
      
      e3f83fe [Xiangrui Meng] fix sql and streaming doc warnings
      2b4371e [Xiangrui Meng] fix mllib python doc warnings
      1ecfac6e
  7. May 15, 2015
    • Davies Liu's avatar
      [SPARK-7543] [SQL] [PySpark] split dataframe.py into multiple files · d7b69946
      Davies Liu authored
      dataframe.py is split into column.py, group.py and dataframe.py:
      ```
         360 column.py
        1223 dataframe.py
         183 group.py
      ```
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6201 from davies/split_df and squashes the following commits:
      
      fc8f5ab [Davies Liu] split dataframe.py into multiple files
      d7b69946
    • Davies Liu's avatar
      [SPARK-7073] [SQL] [PySpark] Clean up SQL data type hierarchy in Python · adfd3668
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6206 from davies/sql_type and squashes the following commits:
      
      33d6860 [Davies Liu] [SPARK-7073] [SQL] [PySpark] Clean up SQL data type hierarchy in Python
      adfd3668
    • FlytxtRnD's avatar
      [SPARK-7651] [MLLIB] [PYSPARK] GMM predict, predictSoft should raise error on bad input · 8f4aaba0
      FlytxtRnD authored
      In the Python API for Gaussian Mixture Model, predict() and predictSoft() methods should raise an error when the input argument is not an RDD.
      
      Author: FlytxtRnD <meethu.mathew@flytxt.com>
      
      Closes #6180 from FlytxtRnD/GmmPredictException and squashes the following commits:
      
      4b6aa11 [FlytxtRnD] Raise error if the input to predict()/predictSoft() is not an RDD
      8f4aaba0
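      A hedged, illustrative sketch of the described check (not the actual Spark code):

      ```python
      from pyspark.rdd import RDD

      def _check_is_rdd(x):
          # predict()/predictSoft() should reject anything that is not an RDD
          # instead of failing with an obscure error later on.
          if not isinstance(x, RDD):
              raise TypeError("x should be an RDD, got %s" % type(x))
      ```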
    • Yanbo Liang's avatar
      [SPARK-6258] [MLLIB] GaussianMixture Python API parity check · 94761485
      Yanbo Liang authored
      Implement the Python API for the major disparities between the Scala and Python versions of the GaussianMixture clustering algorithm:
      ```scala
      GaussianMixture
          setInitialModel
      GaussianMixtureModel
          k
      ```
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #6087 from yanboliang/spark-6258 and squashes the following commits:
      
      b3af21c [Yanbo Liang] fix typo
      2b645c1 [Yanbo Liang] fix doc
      638b4b7 [Yanbo Liang] address comments
      b5bcade [Yanbo Liang] GaussianMixture Python API parity check
      94761485
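      A hedged usage sketch of the added pieces (`data` is assumed to be an RDD of feature vectors): `train` accepts an initial model and the fitted model exposes `k`:

      ```python
      from pyspark.mllib.clustering import GaussianMixture

      model = GaussianMixture.train(data, k=2, seed=10)
      print(model.k)  # number of Gaussians

      # Warm-start another run from the previous model.
      refined = GaussianMixture.train(data, k=2, initialModel=model)
      ```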
  8. May 14, 2015
    • Michael Armbrust's avatar
      [SPARK-7548] [SQL] Add explode function for DataFrames · 6d0633e3
      Michael Armbrust authored
      Add an `explode` function for dataframes and modify the analyzer so that single table generating functions can be present in a select clause along with other expressions.   There are currently the following restrictions:
       - only top level TGFs are allowed (i.e. no `select(explode('list) + 1)`)
       - only one may be present in a single select to avoid potentially confusing implicit Cartesian products.
      
      TODO:
       - [ ] Python
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #6107 from marmbrus/explodeFunction and squashes the following commits:
      
      7ee2c87 [Michael Armbrust] whitespace
      6f80ba3 [Michael Armbrust] Update dataframe.py
      c176c89 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explodeFunction
      81b5da3 [Michael Armbrust] style
      d3faa05 [Michael Armbrust] fix self join case
      f9e1e3e [Michael Armbrust] fix python, add since
      4f0d0a9 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explodeFunction
      e710fe4 [Michael Armbrust] add java and python
      52ca0dc [Michael Armbrust] [SPARK-7548][SQL] Add explode function for dataframes.
      6d0633e3
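      A hedged Python usage sketch (column names are illustrative):

      ```python
      from pyspark.sql import Row
      from pyspark.sql.functions import explode

      df = sqlContext.createDataFrame([Row(a=1, items=[1, 2, 3])])

      # One output row per element of the array column; other top-level
      # expressions may appear alongside the single generator.
      df.select(df.a, explode(df.items).alias("item")).show()
      ```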
    • Xiangrui Meng's avatar
      [SPARK-7619] [PYTHON] fix docstring signature · 48fc38f5
      Xiangrui Meng authored
      Just realized that we need `\` at the end of the docstring. brkyvz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6161 from mengxr/SPARK-7619 and squashes the following commits:
      
      e44495f [Xiangrui Meng] fix docstring signature
      48fc38f5
    • Xiangrui Meng's avatar
      [SPARK-7648] [MLLIB] Add weights and intercept to GLM wrappers in spark.ml · 723853ed
      Xiangrui Meng authored
      Otherwise, users can only use `transform` on the models. brkyvz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6156 from mengxr/SPARK-7647 and squashes the following commits:
      
      1ae3d2d [Xiangrui Meng] add weights and intercept to LogisticRegression in Python
      f49eb46 [Xiangrui Meng] add weights and intercept to LinearRegressionModel
      723853ed
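      A hedged sketch (`train_df` is assumed to be a DataFrame with `label` and `features` columns): after fitting, the model now exposes its coefficients in addition to `transform`:

      ```python
      from pyspark.ml.classification import LogisticRegression

      model = LogisticRegression(maxIter=10, regParam=0.01).fit(train_df)
      print(model.weights)    # coefficient vector
      print(model.intercept)  # scalar intercept
      ```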
    • ksonj's avatar
      [SPARK-7278] [PySpark] DateType should find datetime.datetime acceptable · 5d7d4f88
      ksonj authored
      DateType should not be restricted to `datetime.date` but accept `datetime.datetime` objects as well. Could someone with a little more insight verify this?
      
      Author: ksonj <kson@siberie.de>
      
      Closes #6057 from ksonj/dates and squashes the following commits:
      
      68a158e [ksonj] DateType should find datetime.datetime acceptable too
      5d7d4f88
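      A hedged sketch of the relaxed behaviour (values are illustrative): a `datetime.datetime` can now be supplied where a `DateType` column is expected, not only a `datetime.date`:

      ```python
      import datetime
      from pyspark.sql.types import StructType, StructField, DateType

      schema = StructType([StructField("d", DateType(), True)])
      df = sqlContext.createDataFrame(
          [(datetime.datetime(2015, 5, 14, 12, 30),)], schema)
      ```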
  9. May 13, 2015
  10. May 12, 2015
    • Reynold Xin's avatar
      [SPARK-7321][SQL] Add Column expression for conditional statements (when/otherwise) · 97dee313
      Reynold Xin authored
      This builds on https://github.com/apache/spark/pull/5932 and should close https://github.com/apache/spark/pull/5932 as well.
      
      As an example:
      ```python
      df.select(when(df['age'] == 2, 3).otherwise(4).alias("age")).collect()
      ```
      
      Author: Reynold Xin <rxin@databricks.com>
      Author: kaka1992 <kaka_1992@163.com>
      
      Closes #6072 from rxin/when-expr and squashes the following commits:
      
      8f49201 [Reynold Xin] Throw exception if otherwise is applied twice.
      0455eda [Reynold Xin] Reset run-tests.
      bfb9d9f [Reynold Xin] Updated documentation and test cases.
      762f6a5 [Reynold Xin] Merge pull request #5932 from kaka1992/IFCASE
      95724c6 [kaka1992] Update
      8218d0a [kaka1992] Update
      801009e [kaka1992] Update
      76d6346 [kaka1992] [SPARK-7321][SQL] Add Column expression for conditional statements (if, case)
      97dee313
    • Xiangrui Meng's avatar
      [SPARK-7572] [MLLIB] do not import Param/Params under pyspark.ml · 77f64c73
      Xiangrui Meng authored
      Remove `Param` and `Params` from `pyspark.ml` and add a section in the doc. brkyvz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6094 from mengxr/SPARK-7572 and squashes the following commits:
      
      022abd6 [Xiangrui Meng] do not import Param/Params under spark.ml
      77f64c73
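      The practical effect on user code is just the import path; a short sketch:

      ```python
      # Param and Params are no longer re-exported from pyspark.ml;
      # import them from the param submodule instead.
      from pyspark.ml.param import Param, Params
      ```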
    • Burak Yavuz's avatar
      [SPARK-7487] [ML] Feature Parity in PySpark for ml.regression · 8e935b0a
      Burak Yavuz authored
      Added LinearRegression Python API
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #6016 from brkyvz/ml-reg and squashes the following commits:
      
      11c9ef9 [Burak Yavuz] address comments
      1027a40 [Burak Yavuz] fix typo
      4c699ad [Burak Yavuz] added tree regressor api
      8afead2 [Burak Yavuz] made mixin for DT
      fa51c74 [Burak Yavuz] save additions
      0640d48 [Burak Yavuz] added ml.regression
      82aac48 [Burak Yavuz] added linear regression
      8e935b0a
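      A hedged usage sketch of the new Python API (`train_df` and `test_df` are assumed DataFrames with `label` and `features` columns):

      ```python
      from pyspark.ml.regression import LinearRegression

      lr = LinearRegression(maxIter=10, regParam=0.01)
      model = lr.fit(train_df)
      predictions = model.transform(test_df)
      ```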
    • Daoyuan Wang's avatar
      [SPARK-6876] [PySpark] [SQL] add DataFrame na.replace in pyspark · d86ce845
      Daoyuan Wang authored
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #6003 from adrian-wang/pynareplace and squashes the following commits:
      
      672efba [Daoyuan Wang] remove py2.7 feature
      4a148f7 [Daoyuan Wang] to_replace support dict, value support single value, and add full tests
      9e232e7 [Daoyuan Wang] rename scala map
      af0268a [Daoyuan Wang] remove na
      63ac579 [Daoyuan Wang] add na.replace in pyspark
      d86ce845
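      A hedged usage sketch (column names and values are illustrative); per the commit log, `to_replace` also accepts a dict and `value` a single value:

      ```python
      # Replace a single value across all columns.
      df.na.replace(10, 20)

      # Replace several values at once, restricted to the "name" column.
      df.na.replace(["Alice", "Bob"], ["A", "B"], subset=["name"])
      ```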
  11. May 11, 2015
  12. May 10, 2015
    • Glenn Weidner's avatar
      [SPARK-7427] [PYSPARK] Make sharedParams match in Scala, Python · c5aca0c2
      Glenn Weidner authored
      Modified 2 files:
      python/pyspark/ml/param/_shared_params_code_gen.py
      python/pyspark/ml/param/shared.py
      
      Generated shared.py on Linux using Python 2.6.6 on Redhat Enterprise Linux Server 6.6.
      python _shared_params_code_gen.py > shared.py
      
      Only changed maxIter, regParam, and rawPredictionCol based on strings from SharedParamsCodeGen.scala. Note: a warning was displayed when committing shared.py:
      warning: LF will be replaced by CRLF in python/pyspark/ml/param/shared.py.
      
      Author: Glenn Weidner <gweidner@us.ibm.com>
      
      Closes #6023 from gweidner/br-7427 and squashes the following commits:
      
      db72e32 [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
      825e4a9 [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
      e6a865e [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
      1eee702 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
      1ac10e5 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
      cafd104 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
      9bea1eb [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
      4a35c20 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
      9790cbe [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
      d9c30f4 [Glenn Weidner] [SPARK-7275] [SQL] [WIP] Make LogicalRelation public
      c5aca0c2
    • Joseph K. Bradley's avatar
      [SPARK-7431] [ML] [PYTHON] Made CrossValidatorModel call parent init in PySpark · 3038443e
      Joseph K. Bradley authored
      Fixes bug with PySpark cvModel not having UID
      Also made small PySpark fixes: Evaluator should inherit from Params.  MockModel should inherit from Model.
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #5968 from jkbradley/pyspark-cv-uid and squashes the following commits:
      
      57f13cd [Joseph K. Bradley] Made CrossValidatorModel call parent init in PySpark
      3038443e
    • Yanbo Liang's avatar
      [SPARK-6091] [MLLIB] Add MulticlassMetrics in PySpark/MLlib · bf7e81a5
      Yanbo Liang authored
      https://issues.apache.org/jira/browse/SPARK-6091
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #6011 from yanboliang/spark-6091 and squashes the following commits:
      
      bb3e4ba [Yanbo Liang] trigger jenkins
      53c045d [Yanbo Liang] keep compatibility for python 2.6
      972d5ac [Yanbo Liang] Add MulticlassMetrics in PySpark/MLlib
      bf7e81a5
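      A minimal usage sketch (the prediction/label pairs are illustrative):

      ```python
      from pyspark.mllib.evaluation import MulticlassMetrics

      # Each element is (prediction, true label).
      predictionAndLabels = sc.parallelize(
          [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (2.0, 2.0)])
      metrics = MulticlassMetrics(predictionAndLabels)
      print(metrics.precision())
      print(metrics.confusionMatrix())
      ```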
  13. May 09, 2015
    • Vinod K C's avatar
      [SPARK-7438] [SPARK CORE] Fixed validation of relativeSD in countApproxDistinct · dda6d9f4
      Vinod K C authored
      Author: Vinod K C <vinod.kc@huawei.com>
      
      Closes #5974 from vinodkc/fix_countApproxDistinct_Validation and squashes the following commits:
      
      3a3d59c [Vinod K C] Reverted removal of validation relativeSD<0.000017
      799976e [Vinod K C] Removed testcase to assert IAE when relativeSD>3.7
      8ddbfae [Vinod K C] Remove blank line
      b1b00a3 [Vinod K C] Removed relativeSD validation from python API,RDD.scala will do validation
      122d378 [Vinod K C] Fixed validation of relativeSD in  countApproxDistinct
      dda6d9f4
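      A hedged usage sketch; `relativeSD` controls the accuracy of the approximate count, and the restored validation rejects values that are too small:

      ```python
      rdd = sc.parallelize(range(1000))

      # Approximate distinct count with roughly 5% relative standard deviation.
      print(rdd.countApproxDistinct(relativeSD=0.05))

      # A relativeSD below the supported minimum now fails validation
      # (illustrative; uncomment to see the error).
      # rdd.countApproxDistinct(relativeSD=0.000001)
      ```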