  1. Dec 14, 2015
  2. Sep 14, 2015
    • [SPARK-10273] Add @since annotation to pyspark.mllib.feature · 610971ec
      noelsmith authored
      Duplicated the since decorator from pyspark.sql into pyspark (also tweaked to handle functions without docstrings).
      
      Added since to methods + "versionadded::" to classes (derived from the git file history in pyspark).
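The decorator described above could look roughly like the following sketch. It is an illustration, not the code from the PR: the decorated functions `transform` and `no_docstring` and the version strings are made up for the example.

```python
import re


def since(version):
    """Return a decorator that appends a Sphinx ``versionadded`` note to a docstring."""
    indent_pattern = re.compile(r"\n( +)")

    def decorator(func):
        # Functions without a docstring have __doc__ set to None; treat that as empty.
        doc = func.__doc__ or ""
        # Reuse the smallest indentation found in the docstring so the directive lines up.
        indents = indent_pattern.findall(doc)
        indent = " " * (min(len(m) for m in indents) if indents else 0)
        func.__doc__ = doc.rstrip() + "\n\n%s.. versionadded:: %s" % (indent, version)
        return func

    return decorator


# Hypothetical functions and version numbers, for illustration only.
@since("1.2.0")
def transform(vector):
    """
    Applies the transformation to a vector.
    """
    return vector


@since("1.4.0")
def no_docstring(vector):
    return vector
```

The `or ""` fallback is what lets the decorator handle functions without docstrings.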
      
      Author: noelsmith <mail@noelsmith.com>
      
      Closes #8633 from noel-smith/SPARK-10273-since-mllib-feature.
  3. Jul 02, 2015
    • [SPARK-7104] [MLLIB] Support model save/load in Python's Word2Vec · 488bad31
      Yu ISHIKAWA authored
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6821 from yu-iskw/SPARK-7104 and squashes the following commits:
      
      975136b [Yu ISHIKAWA] Organize import
      0ef58b6 [Yu ISHIKAWA] Use rmtree, instead of removedirs
      cb21653 [Yu ISHIKAWA] Add an explicit type for `Word2VecModelWrapper.save`
      1d468ef [Yu ISHIKAWA] [SPARK-7104][MLlib] Support model save/load in Python's Word2Vec
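Commit 0ef58b6 switches to `rmtree` from `removedirs`. The log does not say why; a likely reason, assumed here, is that a saved model directory is not empty. The sketch below uses a hypothetical temporary directory and file name to show the difference between the two standard-library calls.

```python
import os
import shutil
import tempfile

# Hypothetical stand-in for a saved model: a directory that contains a file.
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "part-00000"), "w") as f:
    f.write("placeholder model data")

# os.removedirs only deletes empty directories, so it fails here.
try:
    os.removedirs(model_dir)
    removed_by_removedirs = True
except OSError:
    removed_by_removedirs = False

# shutil.rmtree deletes the directory together with everything inside it.
shutil.rmtree(model_dir)
removed_by_rmtree = not os.path.exists(model_dir)
```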
  4. Jun 29, 2015
    • [SPARK-7667] [MLLIB] MLlib Python API consistency check · f9b6bf2f
      Yanbo Liang authored
      MLlib Python API consistency check
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #6856 from yanboliang/spark-7667 and squashes the following commits:
      
      21bae35 [Yanbo Liang] remove duplicate code
      eb12f95 [Yanbo Liang] fix doc inherit problem
      9e7ec3c [Yanbo Liang] address comments
      e763d32 [Yanbo Liang] MLlib Python API consistency check
  5. Jun 25, 2015
    • [MINOR] [MLLIB] rename some functions of PythonMLLibAPI · 2519dcc3
      Yanbo Liang authored
      Keep the same naming conventions for PythonMLLibAPI.
      Only the following three functions are named differently from the others:
      ```scala
      trainNaiveBayes
      trainGaussianMixture
      trainWord2Vec
      ```
      So change them to
      ```scala
      trainNaiveBayesModel
      trainGaussianMixtureModel
      trainWord2VecModel
      ```
      This does not affect users or public APIs; it only makes the code easier to understand for developers.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #7011 from yanboliang/py-mllib-api-rename and squashes the following commits:
      
      771ffec [Yanbo Liang] rename some functions of PythonMLLibAPI
  6. Jun 21, 2015
    • [SPARK-7604] [MLLIB] Python API for PCA and PCAModel · 32e3cdaa
      Yanbo Liang authored
      Python API for PCA and PCAModel
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #6315 from yanboliang/spark-7604 and squashes the following commits:
      
      1d58734 [Yanbo Liang] remove transform() in PCAModel, use default behavior
      4d9d121 [Yanbo Liang] Python API for PCA and PCAModel
  7. Jun 18, 2015
    • [SPARK-7605] [MLLIB] [PYSPARK] Python API for ElementwiseProduct · 22732e1e
      MechCoder authored
      Python API for org.apache.spark.mllib.feature.ElementwiseProduct
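For readers unfamiliar with the transformer, an element-wise (Hadamard) product multiplies two vectors component by component. The following plain-Python sketch shows only the arithmetic; the function name `elementwise_product` is hypothetical and this is not the PySpark API itself.

```python
def elementwise_product(scaling_vector, vector):
    """Multiply two equal-length vectors component by component (Hadamard product)."""
    if len(scaling_vector) != len(vector):
        raise ValueError("vectors must have the same length")
    return [w * x for w, x in zip(scaling_vector, vector)]


# Illustrative values only.
weights = [2.0, 0.5, 1.0]
result = elementwise_product(weights, [1.0, 4.0, 3.0])  # [2.0, 2.0, 3.0]
```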
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6346 from MechCoder/spark-7605 and squashes the following commits:
      
      79d1ef5 [MechCoder] Consistent and support list / array types
      5f81d81 [MechCoder] [SPARK-7605] [MLlib] Python API for ElementwiseProduct
  8. May 30, 2015
    • [SPARK-7918] [MLLIB] MLlib Python doc parity check for evaluation and feature · 1617363f
      Yanbo Liang authored
      Check the MLlib Python evaluation and feature docs and make them as complete as the Scala docs.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #6461 from yanboliang/spark-7918 and squashes the following commits:
      
      940e3f1 [Yanbo Liang] truncate too long line and remove extra sparse
      a80ae58 [Yanbo Liang] MLlib Python doc parity check for evaluation and feature
  9. May 08, 2015
  10. Apr 16, 2015
    • [SPARK-4897] [PySpark] Python 3 support · 04e44b37
      Davies Liu authored
      This PR updates PySpark to support Python 3 (tested with 3.4).
      
      Known issue: unpickling arrays from Pyrolite is broken in Python 3, so those tests are skipped.
      
      TODO: ec2/spark_ec2.py is not fully tested with Python 3.
      
      Author: Davies Liu <davies@databricks.com>
      Author: twneale <twneale@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5173 from davies/python3 and squashes the following commits:
      
      d7d6323 [Davies Liu] fix tests
      6c52a98 [Davies Liu] fix mllib test
      99e334f [Davies Liu] update timeout
      b716610 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      cafd5ec [Davies Liu] address comments from @mengxr
      bf225d7 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      179fc8d [Davies Liu] tuning flaky tests
      8c8b957 [Davies Liu] fix ResourceWarning in Python 3
      5c57c95 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      4006829 [Davies Liu] fix test
      2fc0066 [Davies Liu] add python3 path
      71535e9 [Davies Liu] fix xrange and divide
      5a55ab4 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      125f12c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ed498c8 [Davies Liu] fix compatibility with python 3
      820e649 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      e8ce8c9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ad7c374 [Davies Liu] fix mllib test and warning
      ef1fc2f [Davies Liu] fix tests
      4eee14a [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      20112ff [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      59bb492 [Davies Liu] fix tests
      1da268c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ca0fdd3 [Davies Liu] fix code style
      9563a15 [Davies Liu] add imap back for python 2
      0b1ec04 [Davies Liu] make python examples work with Python 3
      d2fd566 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      a716d34 [Davies Liu] test with python 3.4
      f1700e8 [Davies Liu] fix test in python3
      671b1db [Davies Liu] fix test in python3
      692ff47 [Davies Liu] fix flaky test
      7b9699f [Davies Liu] invalidate import cache for Python 3.3+
      9c58497 [Davies Liu] fix kill worker
      309bfbf [Davies Liu] keep compatibility
      5707476 [Davies Liu] cleanup, fix hash of string in 3.3+
      8662d5b [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      f53e1f0 [Davies Liu] fix tests
      70b6b73 [Davies Liu] compile ec2/spark_ec2.py in python 3
      a39167e [Davies Liu] support customize class in __main__
      814c77b [Davies Liu] run unittests with python 3
      7f4476e [Davies Liu] mllib tests passed
      d737924 [Davies Liu] pass ml tests
      375ea17 [Davies Liu] SQL tests pass
      6cc42a9 [Davies Liu] rename
      431a8de [Davies Liu] streaming tests pass
      78901a7 [Davies Liu] fix hash of serializer in Python 3
      24b2f2e [Davies Liu] pass all RDD tests
      35f48fe [Davies Liu] run future again
      1eebac2 [Davies Liu] fix conflict in ec2/spark_ec2.py
      6e3c21d [Davies Liu] make cloudpickle work with Python3
      2fb2db3 [Josh Rosen] Guard more changes behind sys.version; still doesn't run
      1aa5e8f [twneale] Turned out `pickle.DictionaryType is dict` == True, so swapped it out
      7354371 [twneale] buffer --> memoryview  I'm not super sure if this is a valid change, but the 2.7 docs recommend using memoryview over buffer where possible, so hoping it'll work.
      b69ccdf [twneale] Uses the pure python pickle._Pickler instead of c-extension _pickle.Pickler. It appears pyspark 2.7 uses the pure python pickler as well, so this shouldn't degrade pickling performance (?).
      f40d925 [twneale] xrange --> range
      e104215 [twneale] Replaces 2.7 types.InstanceType with 3.4 `object`....could be horribly wrong depending on how types.InstanceType is used elsewhere in the package--see http://bugs.python.org/issue8206
      79de9d0 [twneale] Replaces python2.7 `file` with 3.4 _io.TextIOWrapper
      2adb42d [Josh Rosen] Fix up some import differences between Python 2 and 3
      854be27 [Josh Rosen] Run `futurize` on Python code:
      7c5b4ce [Josh Rosen] Remove Python 3 check in shell.py.
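Several of the commits above (`xrange --> range`, "Guard more changes behind sys.version") describe a common compatibility pattern: alias the Python 2 names on Python 3 so the same code runs on both. A minimal sketch of that pattern follows; the helper `count_strings` is hypothetical and exists only to exercise the aliases.

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3 removed these names, so alias them to their replacements.
    xrange = range
    basestring = str


def count_strings(items):
    """Count the string entries in a list, using the Python 2 spellings."""
    total = 0
    for i in xrange(len(items)):
        if isinstance(items[i], basestring):
            total += 1
    return total
```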
  11. Apr 13, 2015
    • [SPARK-6643][MLLIB] Implement StandardScalerModel missing methods · fc176614
      lewuathe authored
      This is a sub-task of SPARK-6254.
      Wrap the missing methods for `StandardScalerModel`.
      
      Author: lewuathe <lewuathe@me.com>
      
      Closes #5310 from Lewuathe/SPARK-6643 and squashes the following commits:
      
      fafd690 [lewuathe] Fix for lint-python
      bd31a64 [lewuathe] Merge branch 'master' into SPARK-6643
      578f5ee [lewuathe] Remove unnecessary class
      a38f155 [lewuathe] Merge master
      66bb2ab [lewuathe] Fix typos
      82683a0 [lewuathe] [SPARK-6643] Implement StandardScalerModel missing methods
  12. Apr 03, 2015
    • [SPARK-6615][MLLIB] Python API for Word2Vec · 512a2f19
      lewuathe authored
      This is a sub-task of SPARK-6254.
      Wrap the missing methods for `Word2Vec` and `Word2VecModel`.
      
      Author: lewuathe <lewuathe@me.com>
      
      Closes #5296 from Lewuathe/SPARK-6615 and squashes the following commits:
      
      f14c304 [lewuathe] Reorder tests
      1d326b9 [lewuathe] Merge master
      e2bedfb [lewuathe] Modify test cases
      afb866d [lewuathe] [SPARK-6615] Python API for Word2Vec
  13. Mar 31, 2015
    • [SPARK-6598][MLLIB] Python API for IDFModel · 46de6c05
      lewuathe authored
      This is a sub-task of SPARK-6254.
      Wrap the IDFModel `idf` member function for PySpark.
      
      Author: lewuathe <lewuathe@me.com>
      
      Closes #5264 from Lewuathe/SPARK-6598 and squashes the following commits:
      
      1dc522c [lewuathe] [SPARK-6598] Python API for IDFModel
  14. Feb 25, 2015
    • [SPARK-5974] [SPARK-5980] [mllib] [python] [docs] Update ML guide with save/load, Python GBT · d20559b1
      Joseph K. Bradley authored
      * Add GradientBoostedTrees Python examples to ML guide
        * I ran these in the pyspark shell, and they worked.
      * Add save/load to examples in ML guide
      * Added note to Python docs about predict, transform not working within RDD actions, transformations in some cases (See SPARK-5981)
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4750 from jkbradley/SPARK-5974 and squashes the following commits:
      
      c410e38 [Joseph K. Bradley] Added note to LabeledPoint about attributes
      bcae18b [Joseph K. Bradley] Added import of models for save/load examples in ml guide.  Fixed line length for tree.py, feature.py (but not other ML Pyspark files yet).
      6d81c3e [Joseph K. Bradley] completed python GBT examples
      9903309 [Joseph K. Bradley] Added note to python docs about predict,transform not working within RDD actions,transformations in some cases
      c7dfad8 [Joseph K. Bradley] Added model save/load to ML guide.  Added GBT examples to ML guide
  15. Dec 17, 2014
    • [SPARK-4822] Use sphinx tags for Python doc annotations · 3cd51619
      lewuathe authored
      Modify Python annotations for Sphinx. There is no change to the build process described in
      https://github.com/apache/spark/blob/master/docs/README.md
      
      Author: lewuathe <lewuathe@me.com>
      
      Closes #3685 from Lewuathe/sphinx-tag-for-pydoc and squashes the following commits:
      
      88a0fd9 [lewuathe] [SPARK-4822] Fix DevelopApi and WARN tags
      3d7a398 [lewuathe] [SPARK-4822] Use sphinx tags for Python doc annotations
    • [SPARK-4821] [mllib] [python] [docs] Fix for pyspark.mllib.rand doc · affc3f46
      Joseph K. Bradley authored
      + small doc edit
      + include edit to make IntelliJ happy
      
      CC: davies  mengxr
      
      Note to davies  -- this does not fix the "WARNING: Literal block expected; none found." warnings since that seems to involve spacing which IntelliJ does not like.  (Those warnings occur when generating the Python docs.)
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #3669 from jkbradley/python-warnings and squashes the following commits:
      
      4587868 [Joseph K. Bradley] fixed warning
      8cb073c [Joseph K. Bradley] Updated based on davies recommendation
      c51eca4 [Joseph K. Bradley] Updated rst file for pyspark.mllib.rand doc.  Small doc edit.  Small include edit to make IntelliJ happy.
  16. Dec 15, 2014
    • [SPARK-4494][mllib] IDFModel.transform() add support for single vector · 8098fab0
      Yuu ISHIKAWA authored
      I improved `IDFModel.transform` to allow using a single vector.
      
      [[SPARK-4494] IDFModel.transform() add support for single vector - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-4494)
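The idea of accepting either a single vector or a collection of vectors can be sketched in plain Python. This is only an illustration under stated assumptions: `idf_transform` is a hypothetical name, a plain list of lists stands in for an RDD, and the weights are made up.

```python
def idf_transform(idf, x):
    """Scale term frequencies by IDF weights.

    `x` is either one term-frequency vector (a list of numbers) or a list of
    such vectors; a plain list stands in for an RDD in this sketch.
    """
    def scale(tf):
        return [weight * count for weight, count in zip(idf, tf)]

    if x and isinstance(x[0], (list, tuple)):
        return [scale(tf) for tf in x]  # collection of vectors
    return scale(x)  # single vector


# Illustrative IDF weights and term frequencies.
idf = [0.0, 1.5, 2.0]
single = idf_transform(idf, [3, 1, 0])
batch = idf_transform(idf, [[3, 1, 0], [0, 2, 1]])
```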
      
      Author: Yuu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #3603 from yu-iskw/idf and squashes the following commits:
      
      256ff3d [Yuu ISHIKAWA] Fix typo
      a3bf566 [Yuu ISHIKAWA] - Fix typo - Optimize import order - Aggregate the assertion tests - Modify `IDFModel.transform` API for pyspark
      d25e49b [Yuu ISHIKAWA] Add the implementation of `IDFModel.transform` for a term frequency vector
  17. Nov 13, 2014
    • [SPARK-4348] [PySpark] [MLlib] rename random.py to rand.py · ce0333f9
      Davies Liu authored
      This PR renames random.py to rand.py to avoid the side effects of conflicting with the standard random module, while keeping the same interface as before.
      
      ```
      >>> from pyspark.mllib.random import RandomRDDs
      ```
      
      ```
      $ pydoc pyspark.mllib.random
      Help on module random in pyspark.mllib:
      NAME
          random - Python package for random data generation.
      
      FILE
          /Users/davies/work/spark/python/pyspark/mllib/rand.py
      
      CLASSES
          __builtin__.object
              pyspark.mllib.random.RandomRDDs
      
          class RandomRDDs(__builtin__.object)
           |  Generator methods for creating RDDs comprised of i.i.d samples from
           |  some distribution.
           |
           |  Static methods defined here:
           |
           |  normalRDD(sc, size, numPartitions=None, seed=None)
      ```
      
      cc mengxr
      
      reference link: http://xion.org.pl/2012/05/06/hacking-python-imports/
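One general way to keep an old import path working after renaming a module is to register the renamed module under both names in `sys.modules`. The sketch below shows that technique with a hypothetical package `demo_pkg`; it is not necessarily the mechanism this PR uses, which may rely on an import hook as the reference link suggests.

```python
import sys
import types

# Hypothetical package "demo_pkg" with a module named "rand".
pkg = types.ModuleType("demo_pkg")
pkg.__path__ = []  # an (empty) __path__ marks the module as a package
rand = types.ModuleType("demo_pkg.rand")


class RandomRDDs(object):
    """Placeholder for the real class."""


rand.RandomRDDs = RandomRDDs

# Register the module under its real name and under the old name.
sys.modules["demo_pkg"] = pkg
sys.modules["demo_pkg.rand"] = rand
sys.modules["demo_pkg.random"] = rand
pkg.rand = rand
pkg.random = rand

# The old import path still resolves, and the standard library module is untouched.
from demo_pkg.random import RandomRDDs as OldPathRandomRDDs
import random as stdlib_random
```

Because the file itself is no longer called random.py, code inside the package that runs `import random` gets the standard library module.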
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #3216 from davies/random and squashes the following commits:
      
      7ac4e8b [Davies Liu] rename random.py to rand.py
  18. Nov 11, 2014
    • [SPARK-4324] [PySpark] [MLlib] support numpy.array for all MLlib API · 65083e93
      Davies Liu authored
      This PR checks all of the existing Python MLlib APIs to make sure that numpy.array is supported as a Vector (and an RDD of numpy.array as an RDD of Vectors).
      
      It also improves some docstrings and doctests.
      
      cc mateiz mengxr
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #3189 from davies/numpy and squashes the following commits:
      
      d5057c4 [Davies Liu] fix tests
      6987611 [Davies Liu] support numpy.array for all MLlib API
  19. Oct 31, 2014
    • [SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API · 872fc669
      Davies Liu authored
      Create several helper functions that call the MLlib Java API, converting the arguments to Java types and the return value to a Python object automatically. This greatly simplifies serialization in the MLlib Python API.
      
      After this, the MLlib Python API no longer needs to deal with serialization details, so it is easier to add new APIs.
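The helper pattern described here (convert arguments, call, convert the result) can be sketched without Spark. Every name below (`to_backend`, `from_backend`, `call_backend`, `backend_scale`) is hypothetical, and tuples stand in for Java-side types; the real helpers in the PR may differ.

```python
def to_backend(value):
    """Hypothetical converter: turn Python lists into tuples for the backend."""
    if isinstance(value, list):
        return tuple(to_backend(v) for v in value)
    return value


def from_backend(value):
    """Hypothetical converter: turn backend tuples back into Python lists."""
    if isinstance(value, tuple):
        return [from_backend(v) for v in value]
    return value


def call_backend(func, *args):
    """Convert the arguments, call the backend function, convert the result."""
    converted = [to_backend(a) for a in args]
    return from_backend(func(*converted))


def backend_scale(vector, factor):
    """Stand-in for a backend API that only accepts and returns tuples."""
    assert isinstance(vector, tuple)
    return tuple(x * factor for x in vector)


scaled = call_backend(backend_scale, [1.0, 2.0], 3.0)
```

With the conversion centralized in one helper, wrapping a new backend function takes a single call and no per-function serialization code.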
      
      cc mengxr
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #2995 from davies/cleanup and squashes the following commits:
      
      8fa6ec6 [Davies Liu] address comments
      16b85a0 [Davies Liu] Merge branch 'master' of github.com:apache/spark into cleanup
      43743e5 [Davies Liu] bugfix
      731331f [Davies Liu] simplify serialization in MLlib Python API
  20. Oct 28, 2014
    • [SPARK-3961] [MLlib] [PySpark] Python API for mllib.feature · fae095bc
      Davies Liu authored
      Added a complete Python API for mllib.feature:
      
      Normalizer
      StandardScalerModel
      StandardScaler
      HashingTF
      IDFModel
      IDF
      
      cc mengxr
      
      Author: Davies Liu <davies@databricks.com>
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2819 from davies/feature and squashes the following commits:
      
      4f48f48 [Davies Liu] add a note for HashingTF
      67f6d21 [Davies Liu] address comments
      b628693 [Davies Liu] rollback changes in Word2Vec
      efb4f4f [Davies Liu] Merge branch 'master' into feature
      806c7c2 [Davies Liu] address comments
      3abb8c2 [Davies Liu] address comments
      59781b9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into feature
      a405ae7 [Davies Liu] fix tests
      7a1891a [Davies Liu] fix tests
      486795f [Davies Liu] update programming guide, HashTF -> HashingTF
      8a50584 [Davies Liu] Python API for mllib.feature
  21. Oct 16, 2014
    • [SPARK-3971] [MLLib] [PySpark] hotfix: Customized pickler should work in cluster mode · 091d32c5
      Davies Liu authored
      Customized picklers should be registered before unpickling, but in an executor there is no way to register the picklers before the tasks run.
      
      So we need to register the picklers in the tasks themselves: duplicate javaToPython() and pythonToJava() in MLlib, and call SerDe.initialize() before pickling or unpickling.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2830 from davies/fix_pickle and squashes the following commits:
      
      0c85fb9 [Davies Liu] revert the privacy change
      6b94e15 [Davies Liu] use JavaConverters instead of JavaConversions
      0f02050 [Davies Liu] hotfix: Customized pickler does not work in cluster
  22. Oct 11, 2014
    • [SPARK-3909][PySpark][Doc] A corrupted format in Sphinx documents and building warnings · 7a3f589e
      cocoatomo authored
      The Sphinx documents contain corrupted reST formatting and produce some build warnings.
      
      The purpose of this issue is the same as https://issues.apache.org/jira/browse/SPARK-3773.
      
      commit: 0e8203f4
      
      output
      ```
      $ cd ./python/docs
      $ make clean html
      rm -rf _build/*
      sphinx-build -b html -d _build/doctrees   . _build/html
      Making output directory...
      Running Sphinx v1.2.3
      loading pickled environment... not yet created
      building [html]: targets for 4 source files that are out of date
      updating environment: 4 added, 0 changed, 0 removed
      reading sources... [100%] pyspark.sql
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/feature.py:docstring of pyspark.mllib.feature.Word2VecModel.findSynonyms:4: WARNING: Field list ends without a blank line; unexpected unindent.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/feature.py:docstring of pyspark.mllib.feature.Word2VecModel.transform:3: WARNING: Field list ends without a blank line; unexpected unindent.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/sql.py:docstring of pyspark.sql:4: WARNING: Bullet list ends without a blank line; unexpected unindent.
      looking for now-outdated files... none found
      pickling environment... done
      checking consistency... done
      preparing documents... done
      writing output... [100%] pyspark.sql
      writing additional files... (12 module code pages) _modules/index search
      copying static files... WARNING: html_static_path entry u'/Users/<user>/MyRepos/Scala/spark/python/docs/_static' does not exist
      done
      copying extra files... done
      dumping search index... done
      dumping object inventory... done
      build succeeded, 4 warnings.
      
      Build finished. The HTML pages are in _build/html.
      ```
      
      Author: cocoatomo <cocoatomo77@gmail.com>
      
      Closes #2766 from cocoatomo/issues/3909-sphinx-build-warnings and squashes the following commits:
      
      2c7faa8 [cocoatomo] [SPARK-3909][PySpark][Doc] A corrupted format in Sphinx documents and building warnings
  23. Oct 07, 2014
    • [SPARK-3486][MLlib][PySpark] PySpark support for Word2Vec · 098c7344
      Liquan Pei authored
      mengxr
      Added PySpark support for Word2Vec
      Change list
      (1) PySpark support for Word2Vec
      (2) SerDe support of string sequence both on python side and JVM side
      (3) Test for SerDe of string sequence on JVM side
      
      Author: Liquan Pei <liquanpei@gmail.com>
      
      Closes #2356 from Ishiihara/Word2Vec-python and squashes the following commits:
      
      476ea34 [Liquan Pei] style fixes
      b13a0b9 [Liquan Pei] resolve merge conflicts and minor fixes
      8671eba [Liquan Pei] Merge remote-tracking branch 'upstream/master' into Word2Vec-python
      daf88a6 [Liquan Pei] modification according to feedback
      a73fa19 [Liquan Pei] clean up
      3d8007b [Liquan Pei] fix findSynonyms for vector
      1bdcd2e [Liquan Pei] minor fixes
      cdef9f4 [Liquan Pei] add missing comments
      b7447eb [Liquan Pei] modify according to feedback
      b9a7383 [Liquan Pei] cache words RDD in fit
      89490bf [Liquan Pei] add tests and Word2VecModelWrapper
      78bbb53 [Liquan Pei] use pickle for seq string SerDe
      a264b08 [Liquan Pei] Merge remote-tracking branch 'upstream/master' into Word2Vec-python
      ca1e5ff [Liquan Pei] fix test
      68e7276 [Liquan Pei] minor style fixes
      48d5e72 [Liquan Pei] Functionality improvement
      0ad3ac1 [Liquan Pei] minor fix
      c867fdf [Liquan Pei] add Word2Vec to pyspark