Skip to content
Snippets Groups Projects
  1. Sep 17, 2015
  2. Jul 06, 2015
    • Xiangrui Meng's avatar
      Revert "[SPARK-7212] [MLLIB] Add sequence learning flag" · 96c5eeec
      Xiangrui Meng authored
      This reverts commit 25f574eb. After speaking to some users and developers, we realized that FP-growth doesn't meet the requirement for frequent sequence mining. PrefixSpan (SPARK-6487) would be the correct algorithm for it. feynmanliang
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #7240 from mengxr/SPARK-7212.revert and squashes the following commits:
      
      2b3d66b [Xiangrui Meng] Revert "[SPARK-7212] [MLLIB] Add sequence learning flag"
      96c5eeec
  3. Jun 29, 2015
    • Feynman Liang's avatar
      [SPARK-7212] [MLLIB] Add sequence learning flag · 25f574eb
      Feynman Liang authored
      Support mining of ordered frequent item sequences.
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #6997 from feynmanliang/fp-sequence and squashes the following commits:
      
      7c14e15 [Feynman Liang] Improve scalatests with R code and Seq
      0d3e4b6 [Feynman Liang] Fix python test
      ce987cb [Feynman Liang] Backwards compatibility aux constructor
      34ef8f2 [Feynman Liang] Fix failing test due to reverse orderering
      f04bd50 [Feynman Liang] Naming, add ordered to FreqItemsets, test ordering using Seq
      648d4d4 [Feynman Liang] Test case for frequent item sequences
      252a36a [Feynman Liang] Add sequence learning flag
      25f574eb
  4. May 18, 2015
    • Xiangrui Meng's avatar
      [SPARK-6657] [PYSPARK] Fix doc warnings · 1ecfac6e
      Xiangrui Meng authored
      Fixed the following warnings in `make clean html` under `python/docs`:
      
      ~~~
      /Users/meng/src/spark/python/pyspark/mllib/evaluation.py:docstring of pyspark.mllib.evaluation.RankingMetrics.ndcgAt:3: ERROR: Unexpected indentation.
      /Users/meng/src/spark/python/pyspark/mllib/evaluation.py:docstring of pyspark.mllib.evaluation.RankingMetrics.ndcgAt:4: WARNING: Block quote ends without a blank line; unexpected unindent.
      /Users/meng/src/spark/python/pyspark/mllib/fpm.py:docstring of pyspark.mllib.fpm.FPGrowth.train:3: ERROR: Unexpected indentation.
      /Users/meng/src/spark/python/pyspark/mllib/fpm.py:docstring of pyspark.mllib.fpm.FPGrowth.train:4: WARNING: Block quote ends without a blank line; unexpected unindent.
      /Users/meng/src/spark/python/pyspark/sql/__init__.py:docstring of pyspark.sql.DataFrame.replace:16: WARNING: Field list ends without a blank line; unexpected unindent.
      /Users/meng/src/spark/python/pyspark/streaming/kafka.py:docstring of pyspark.streaming.kafka.KafkaUtils.createRDD:8: ERROR: Unexpected indentation.
      /Users/meng/src/spark/python/pyspark/streaming/kafka.py:docstring of pyspark.streaming.kafka.KafkaUtils.createRDD:9: WARNING: Block quote ends without a blank line; unexpected unindent.
      ~~~
      
      davies
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6221 from mengxr/SPARK-6657 and squashes the following commits:
      
      e3f83fe [Xiangrui Meng] fix sql and streaming doc warnings
      2b4371e [Xiangrui Meng] fix mllib python doc warnings
      1ecfac6e
  5. Apr 22, 2015
    • Yanbo Liang's avatar
      [SPARK-6827] [MLLIB] Wrap FPGrowthModel.freqItemsets and make it consistent with Java API · f4f39981
      Yanbo Liang authored
      Make PySpark ```FPGrowthModel.freqItemsets``` consistent with Java/Scala API like ```MatrixFactorizationModel.userFeatures```
      It return a RDD with each tuple is composed of an array and a long value.
      I think it's difficult to implement namedtuples to wrap the output because items of freqItemsets can be any type with arbitrary length which is tedious to impelement corresponding SerDe function.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #5614 from yanboliang/spark-6827 and squashes the following commits:
      
      da8c404 [Yanbo Liang] use namedtuple
      5532e78 [Yanbo Liang] Wrap FPGrowthModel.freqItemsets and make it consistent with Java API
      f4f39981
  6. Apr 16, 2015
    • Davies Liu's avatar
      [SPARK-4897] [PySpark] Python 3 support · 04e44b37
      Davies Liu authored
      This PR update PySpark to support Python 3 (tested with 3.4).
      
      Known issue: unpickle array from Pyrolite is broken in Python 3, those tests are skipped.
      
      TODO: ec2/spark-ec2.py is not fully tested with python3.
      
      Author: Davies Liu <davies@databricks.com>
      Author: twneale <twneale@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5173 from davies/python3 and squashes the following commits:
      
      d7d6323 [Davies Liu] fix tests
      6c52a98 [Davies Liu] fix mllib test
      99e334f [Davies Liu] update timeout
      b716610 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      cafd5ec [Davies Liu] adddress comments from @mengxr
      bf225d7 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      179fc8d [Davies Liu] tuning flaky tests
      8c8b957 [Davies Liu] fix ResourceWarning in Python 3
      5c57c95 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      4006829 [Davies Liu] fix test
      2fc0066 [Davies Liu] add python3 path
      71535e9 [Davies Liu] fix xrange and divide
      5a55ab4 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      125f12c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ed498c8 [Davies Liu] fix compatibility with python 3
      820e649 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      e8ce8c9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ad7c374 [Davies Liu] fix mllib test and warning
      ef1fc2f [Davies Liu] fix tests
      4eee14a [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      20112ff [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      59bb492 [Davies Liu] fix tests
      1da268c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ca0fdd3 [Davies Liu] fix code style
      9563a15 [Davies Liu] add imap back for python 2
      0b1ec04 [Davies Liu] make python examples work with Python 3
      d2fd566 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      a716d34 [Davies Liu] test with python 3.4
      f1700e8 [Davies Liu] fix test in python3
      671b1db [Davies Liu] fix test in python3
      692ff47 [Davies Liu] fix flaky test
      7b9699f [Davies Liu] invalidate import cache for Python 3.3+
      9c58497 [Davies Liu] fix kill worker
      309bfbf [Davies Liu] keep compatibility
      5707476 [Davies Liu] cleanup, fix hash of string in 3.3+
      8662d5b [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      f53e1f0 [Davies Liu] fix tests
      70b6b73 [Davies Liu] compile ec2/spark_ec2.py in python 3
      a39167e [Davies Liu] support customize class in __main__
      814c77b [Davies Liu] run unittests with python 3
      7f4476e [Davies Liu] mllib tests passed
      d737924 [Davies Liu] pass ml tests
      375ea17 [Davies Liu] SQL tests pass
      6cc42a9 [Davies Liu] rename
      431a8de [Davies Liu] streaming tests pass
      78901a7 [Davies Liu] fix hash of serializer in Python 3
      24b2f2e [Davies Liu] pass all RDD tests
      35f48fe [Davies Liu] run future again
      1eebac2 [Davies Liu] fix conflict in ec2/spark_ec2.py
      6e3c21d [Davies Liu] make cloudpickle work with Python3
      2fb2db3 [Josh Rosen] Guard more changes behind sys.version; still doesn't run
      1aa5e8f [twneale] Turned out `pickle.DictionaryType is dict` == True, so swapped it out
      7354371 [twneale] buffer --> memoryview  I'm not super sure if this a valid change, but the 2.7 docs recommend using memoryview over buffer where possible, so hoping it'll work.
      b69ccdf [twneale] Uses the pure python pickle._Pickler instead of c-extension _pickle.Pickler. It appears pyspark 2.7 uses the pure python pickler as well, so this shouldn't degrade pickling performance (?).
      f40d925 [twneale] xrange --> range
      e104215 [twneale] Replaces 2.7 types.InstsanceType with 3.4 `object`....could be horribly wrong depending on how types.InstanceType is used elsewhere in the package--see http://bugs.python.org/issue8206
      79de9d0 [twneale] Replaces python2.7 `file` with 3.4 _io.TextIOWrapper
      2adb42d [Josh Rosen] Fix up some import differences between Python 2 and 3
      854be27 [Josh Rosen] Run `futurize` on Python code:
      7c5b4ce [Josh Rosen] Remove Python 3 check in shell.py.
      04e44b37
  7. Apr 09, 2015
    • Yanbo Liang's avatar
      [SPARK-6264] [MLLIB] Support FPGrowth algorithm in Python API · a0411aeb
      Yanbo Liang authored
      Support FPGrowth algorithm in Python API.
      Should we remove "Experimental" which were marked for FPGrowth and FPGrowthModel in Scala? jkbradley
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #5213 from yanboliang/spark-6264 and squashes the following commits:
      
      ed62ead [Yanbo Liang] trigger jenkins
      8ce0359 [Yanbo Liang] fix docstring style
      544c725 [Yanbo Liang] address comments
      a2d7cf7 [Yanbo Liang] add doc for FPGrowth.train()
      dcf7d73 [Yanbo Liang] add python doc
      b18fd07 [Yanbo Liang] trigger jenkins
      2c951b8 [Yanbo Liang] fix typos
      7f62c8f [Yanbo Liang] add fpm to __init__.py
      b96206a [Yanbo Liang] Support FPGrowth algorithm in Python API
      a0411aeb
Loading