  1. Feb 24, 2017
    • [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propagated to python worker · 330c3e33
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      self.environment is propagated to the executors, so PYTHONHASHSEED should be set there whenever the Python version is greater than 3.3.
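
      As an illustration, a minimal sketch of the behavior being fixed (the default seed of 0 is an assumption for the example):

      ```
      import os
      import sys

      # SparkContext's self.environment is shipped to Python workers, so the
      # driver can pin PYTHONHASHSEED there. On Python 3.3+, string hashing is
      # randomized per process, which breaks hash-partitioned operations when
      # workers disagree on the seed.
      environment = {}  # stands in for SparkContext's self.environment
      if sys.version_info >= (3, 3):
          environment["PYTHONHASHSEED"] = os.environ.get("PYTHONHASHSEED", "0")
      ```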
      
      ## How was this patch tested?
      Manually tested it.
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #11211 from zjffdu/SPARK-13330.
  2. Jan 25, 2017
    • [SPARK-19307][PYSPARK] Make sure user conf is propagated to SparkContext. · 92afaa93
      Marcelo Vanzin authored
      The code was failing to propagate the user conf in the case where the
      JVM was already initialized, which happens when a user submits a
      python script via spark-submit.
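
      A sketch of the affected scenario (the config key is hypothetical):

      ```
      # Run as: spark-submit repro.py -- the JVM is already up when Python starts.
      from pyspark import SparkConf, SparkContext

      conf = SparkConf().set("spark.my.custom.setting", "value")  # hypothetical key
      sc = SparkContext(conf=conf)
      # Before the fix, the user conf could be dropped on this code path; after
      # it, the value is visible on the context's conf:
      print(sc.getConf().get("spark.my.custom.setting"))
      ```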
      
      Tested with a new unit test and by running a python script in a real cluster.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #16682 from vanzin/SPARK-19307.
  3. Nov 28, 2016
    • [SPARK-18523][PYSPARK] Make SparkContext.stop more reliable · 71352c94
      Alexander Shorin authored
      ## What changes were proposed in this pull request?
      
      This PR fixes the broken state SparkContext may fall into if the Spark driver crashes or is killed by OOM.
      
      ## How was this patch tested?
      
      1. Start SparkContext;
      2. Find Spark driver process and `kill -9` it;
      3. Call `sc.stop()`;
      4. Create new SparkContext after that;
      
      Without this patch you will crash at step 3 and won't be able to do step 4 without manually resetting private attributes or restarting the IPython notebook / shell.
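
      The same steps as a sketch (step 2 happens outside Python):

      ```
      from pyspark import SparkContext

      sc = SparkContext(appName="stop-reliability")   # step 1
      # step 2: find the driver JVM process and `kill -9` it externally
      sc.stop()                                       # step 3: no longer crashes
      sc = SparkContext(appName="fresh-context")      # step 4: works again
      ```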
      
      Author: Alexander Shorin <kxepal@apache.org>
      
      Closes #15961 from kxepal/18523-make-spark-context-stop-more-reliable.
  4. Nov 03, 2016
    • [SPARK-18138][DOCS] Document that Java 7, Python 2.6, Scala 2.10, Hadoop < 2.6 are deprecated in Spark 2.1.0 · dc4c6009
      Sean Owen authored
      
      ## What changes were proposed in this pull request?
      
      Document that Java 7, Python 2.6, Scala 2.10, Hadoop < 2.6 are deprecated in Spark 2.1.0. This does not actually implement any of the changes in SPARK-18138; it just peppers the documentation with notices about them.
      
      ## How was this patch tested?
      
      Doc build
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #15733 from srowen/SPARK-18138.
  5. Oct 11, 2016
    • [SPARK-17387][PYSPARK] Creating SparkContext() from python without spark-submit ignores user conf · 5b77e66d
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      
      The root cause of SparkConf being ignored when launching the JVM is that SparkConf requires the JVM to be created first.  https://github.com/apache/spark/blob/master/python/pyspark/conf.py#L106
      In this PR, launching the JVM is deferred until the SparkContext is created, so that the SparkConf can be passed to the JVM correctly.
      
      ## How was this patch tested?
      
      Use the example code in the description of SPARK-17387,
      ```
      $ SPARK_HOME=$PWD PYTHONPATH=python:python/lib/py4j-0.10.3-src.zip python
      Python 2.7.12 (default, Jul  1 2016, 15:12:24)
      [GCC 5.4.0 20160609] on linux2
      Type "help", "copyright", "credits" or "license" for more information.
      >>> from pyspark import SparkContext
      >>> from pyspark import SparkConf
      >>> conf = SparkConf().set("spark.driver.memory", "4g")
      >>> sc = SparkContext(conf=conf)
      ```
      And verify that spark.driver.memory is correctly picked up.
      
      ```
      ...op/ -Xmx4g org.apache.spark.deploy.SparkSubmit --conf spark.driver.memory=4g pyspark-shell
      ```
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #14959 from zjffdu/SPARK-17387.
  6. Sep 23, 2016
    • [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulator API on top of Accumulator V2 · 90d57542
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      Move the internals of the PySpark accumulator API from the old deprecated API onto the new Accumulator V2 API.
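
      The user-facing API is unchanged by the refactor; a quick sanity check of the kind those tests cover:

      ```
      from pyspark import SparkContext

      sc = SparkContext(appName="accumulator-demo")
      acc = sc.accumulator(0)
      sc.parallelize(range(100)).foreach(lambda x: acc.add(x))
      print(acc.value)  # 4950
      ```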
      
      ## How was this patch tested?
      
      The existing PySpark accumulator tests (both unit tests and doc tests at the start of accumulator.py).
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #14467 from holdenk/SPARK-16861-refactor-pyspark-accumulator-api.
  7. Sep 21, 2016
    • [SPARK-17585][PYSPARK][CORE] PySpark SparkContext.addFile supports adding files recursively · d3b88697
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Users sometimes want to add a directory as a dependency. In Scala they can call ```SparkContext.addFile``` with ```recursive=true``` to recursively add all files under the directory, but Python users can only add a single file, not a directory; this should be supported as well.
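
      With this change Python matches the Scala behavior; a sketch (the directory path is illustrative):

      ```
      from pyspark import SparkContext, SparkFiles

      sc = SparkContext(appName="addfile-demo")
      sc.addFile("/path/to/mydir", recursive=True)  # adds every file under mydir
      # Files added this way resolve on executors via SparkFiles:
      print(SparkFiles.get("mydir"))
      ```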
      
      ## How was this patch tested?
      Unit test.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #15140 from yanboliang/spark-17585.
  8. Sep 20, 2016
    • [SPARK-17437] Add uiWebUrl to JavaSparkContext and pyspark.SparkContext · 4a426ff8
      Adrian Petrescu authored
      ## What changes were proposed in this pull request?
      
      The Scala version of `SparkContext` has a handy field called `uiWebUrl` that tells you which URL the SparkUI spawned by that instance lives at. This is often very useful because the value for `spark.ui.port` in the config is only a suggestion; if that port is taken by another Spark instance on the same machine, Spark keeps incrementing the port until it finds a free one. So, on a machine with many running PySpark instances, you often have to try them one by one until you find your application.
      
      Scala users have a way around this with `uiWebUrl` but Java and Python users do not. This pull request fixes this in the most straightforward way possible, simply propagating this field through the `JavaSparkContext` and into pyspark through the Java gateway.
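
      After this change a Python user can ask the context directly, e.g.:

      ```
      from pyspark import SparkContext

      sc = SparkContext(appName="ui-url-demo")
      # Reports whichever port the UI actually bound, not just the suggested spark.ui.port:
      print(sc.uiWebUrl)  # e.g. http://192.168.1.10:4041
      ```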
      
      Please let me know if any additional documentation/testing is needed.
      
      ## How was this patch tested?
      
      Existing tests were run to make sure there were no regressions, and a binary distribution was created and tested manually for the correct value of `sc.uiWebUrl` in a variety of circumstances.
      
      Author: Adrian Petrescu <apetresc@gmail.com>
      
      Closes #15000 from apetresc/pyspark-uiweburl.
  9. Sep 14, 2016
    • [SPARK-17525][PYTHON] Remove SparkContext.clearFiles() from the PySpark API as it was removed from the Scala API prior to Spark 2.0.0 · b5bfcddb
      Sami Jaktholm authored
      
      ## What changes were proposed in this pull request?
      
      This pull request removes the SparkContext.clearFiles() method from the PySpark API as the method was removed from the Scala API in 8ce645d4. Using that method in PySpark leads to an exception as PySpark tries to call the non-existent method on the JVM side.
      
      ## How was this patch tested?
      
      Existing tests (though none of them tested this particular method).
      
      Author: Sami Jaktholm <sjakthol@outlook.com>
      
      Closes #15081 from sjakthol/pyspark-sc-clearfiles.
  10. Jun 28, 2016
    • [SPARK-16224] [SQL] [PYSPARK] SparkSession builder's configs need to be set to the existing Scala SparkContext's SparkConf · 0923c4f5
      Yin Huai authored
      
      ## What changes were proposed in this pull request?
      When we create a SparkSession on the Python side, it is possible that a SparkContext has already been created. In that case, we need to set the configs of the SparkSession builder on the existing Scala SparkContext's SparkConf (we need to do so because conf changes on an active Python SparkContext will not be propagated to the JVM side). Otherwise, we may create a wrong SparkSession (e.g. Hive support is not enabled even if enableHiveSupport is called).
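
      A sketch of the failure mode this guards against (the catalog config key shown is an assumption):

      ```
      from pyspark import SparkContext
      from pyspark.sql import SparkSession

      sc = SparkContext(appName="existing-context")  # a SparkContext already exists
      spark = SparkSession.builder.enableHiveSupport().getOrCreate()
      # Without the fix, the builder's configs never reached the existing Scala
      # SparkConf, so Hive support could be silently missing here:
      print(spark.conf.get("spark.sql.catalogImplementation"))  # expected: hive
      ```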
      
      ## How was this patch tested?
      New tests and manual tests.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #13931 from yhuai/SPARK-16224.
  11. Jun 24, 2016
    • [SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite · f4fd7432
      peng.zhang authored
      ## What changes were proposed in this pull request?
      
      Since SPARK-13220 (Deprecate "yarn-client" and "yarn-cluster"), YarnClusterSuite hasn't tested "yarn cluster" mode correctly.
      This pull request fixes it.
      
      ## How was this patch tested?
      Unit test
      
      
      Author: peng.zhang <peng.zhang@xiaomi.com>
      
      Closes #13836 from renozhang/SPARK-16125-test-yarn-cluster-mode.
  12. May 04, 2016
    • [SPARK-15031][EXAMPLE] Use SparkSession in Scala/Python/Java example. · cdce4e62
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR aims to update the Scala/Python/Java examples by replacing `SQLContext` with the newly added `SparkSession`.
      
      - Use **SparkSession Builder Pattern** in 154 (Scala 55, Java 52, Python 47) files.
      - Add `getConf` in Python SparkContext class: `python/pyspark/context.py`
      - Replace **SQLContext Singleton Pattern** with **SparkSession Singleton Pattern**:
        - `SqlNetworkWordCount.scala`
        - `JavaSqlNetworkWordCount.java`
        - `sql_network_wordcount.py`
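
      For reference, a minimal instance of the builder pattern the examples were migrated to, together with the new `getConf`:

      ```
      from pyspark.sql import SparkSession

      spark = SparkSession \
          .builder \
          .appName("PythonSQL") \
          .getOrCreate()
      # New in this PR: getConf on the Python SparkContext
      print(spark.sparkContext.getConf().get("spark.app.name"))
      ```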
      
      Now, `SQLContext` is used only in R examples and the following two Python examples. The Python examples are untouched in this PR since they already fail with some unknown issue.
      - `simple_params_example.py`
      - `aft_survival_regression.py`
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12809 from dongjoon-hyun/SPARK-15031.
  13. Apr 09, 2016
    • [SPARK-13687][PYTHON] Cleanup PySpark parallelize temporary files · 00288ea2
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      Eagerly clean up PySpark's temporary parallelize files rather than waiting for shutdown.
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12233 from holdenk/SPARK-13687-cleanup-pyspark-temporary-files.
  14. Dec 07, 2015
    • [SPARK-12132] [PYSPARK] raise KeyboardInterrupt inside SIGINT handler · ef3f047c
      Davies Liu authored
      Currently, the current line is not cleared by Ctrl-C
      
      After this patch
      ```
      >>> asdfasdf^C
      Traceback (most recent call last):
        File "~/spark/python/pyspark/context.py", line 225, in signal_handler
          raise KeyboardInterrupt()
      KeyboardInterrupt
      ```
      
      It's still worse than 1.5 (and before).
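
      A simplified sketch of the handler this patch installs in context.py (the job-cancellation step is an assumption):

      ```
      import signal

      def signal_handler(signal_num, frame):
          # cancel running Spark jobs, then surface an ordinary KeyboardInterrupt
          sc.cancelAllJobs()  # assumes `sc` is the active SparkContext
          raise KeyboardInterrupt()

      signal.signal(signal.SIGINT, signal_handler)
      ```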
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10134 from davies/fix_cltrc.
  15. Sep 11, 2015
    • [PYTHON] Fixed typo in exception message · c3738667
      Icaro Medeiros authored
      Just fixing a typo in the exception message raised when attempting to pickle SparkContext.
      
      Author: Icaro Medeiros <icaro.medeiros@gmail.com>
      
      Closes #8724 from icaromedeiros/master.
  16. Jul 22, 2015
    • [SPARK-9144] Remove DAGScheduler.runLocallyWithinThread and spark.localExecution.enabled · b217230f
      Josh Rosen authored
      Spark has an option called spark.localExecution.enabled; according to the docs:
      
      > Enables Spark to run certain jobs, such as first() or take() on the driver, without sending tasks to the cluster. This can make certain jobs execute very quickly, but may require shipping a whole partition of data to the driver.
      
      This feature ends up adding quite a bit of complexity to DAGScheduler, especially in the runLocallyWithinThread method, but as far as I know nobody uses this feature (I searched the mailing list and haven't seen any recent mentions of the configuration nor stacktraces including the runLocally method). As a step towards scheduler complexity reduction, I propose that we remove this feature and all code related to it for Spark 1.5.
      
      This pull request simply brings #7484 up to date.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7585 from rxin/remove-local-exec and squashes the following commits:
      
      84bd10e [Reynold Xin] Python fix.
      1d9739a [Reynold Xin] Merge pull request #7484 from JoshRosen/remove-localexecution
      eec39fa [Josh Rosen] Remove allowLocal(); deprecate user-facing uses of it.
      b0835dc [Josh Rosen] Remove local execution code in DAGScheduler
      8975d96 [Josh Rosen] Remove local execution tests.
      ffa8c9b [Josh Rosen] Remove documentation for configuration
  17. Jun 29, 2015
    • [SPARK-8528] Expose SparkContext.applicationId in PySpark · 492dca3a
      Vladimir Vladimirov authored
      Use case: we want to log the applicationId (YARN in our case) to request troubleshooting help from DevOps.
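
      Usage is a one-liner once the property is exposed (the id format shown is typical of YARN):

      ```
      from pyspark import SparkContext

      sc = SparkContext(appName="appid-demo")
      # e.g. "application_1433865536131_34483" on YARN, "local-1433865536131" locally
      print(sc.applicationId)
      ```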
      
      Author: Vladimir Vladimirov <vladimir.vladimirov@magnetic.com>
      
      Closes #6936 from smartkiwi/master and squashes the following commits:
      
      870338b [Vladimir Vladimirov] this would make doctest to run in python3
      0eae619 [Vladimir Vladimirov] Scala doesn't use u'...' for unicode literals
      14d77a8 [Vladimir Vladimirov] stop using ELLIPSIS
      b4ebfc5 [Vladimir Vladimirov] addressed PR feedback - updated docstring
      223a32f [Vladimir Vladimirov] fixed test - applicationId is property that returns the string
      3221f5a [Vladimir Vladimirov] [SPARK-8528] added documentation for Scala
      2cff090 [Vladimir Vladimirov] [SPARK-8528] add applicationId property for SparkContext object in pyspark
  18. May 21, 2015
    • [SPARK-7711] Add a startTime property to match the corresponding one in Scala · 6b18cdc1
      Holden Karau authored
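
      A sketch of the new property, which mirrors Scala's epoch-millisecond startTime:

      ```
      from pyspark import SparkContext

      sc = SparkContext(appName="starttime-demo")
      print(sc.startTime)  # epoch milliseconds, e.g. 1432195543751
      ```
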
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #6275 from holdenk/SPARK-771-startTime-is-missing-from-pyspark and squashes the following commits:
      
      06662dc [Holden Karau] add mising blank line for style checks
      7a87410 [Holden Karau] add back missing newline
      7a7876b [Holden Karau] Add a startTime property to match the corresponding one in the Scala SparkContext
  19. May 18, 2015
    • [SPARK-7150] SparkContext.range() and SQLContext.range() · c2437de1
      Daoyuan Wang authored
      This PR is based on #6081, thanks adrian-wang.
      
      Closes #6081
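
      The new API mirrors Python's built-in range; a sketch:

      ```
      from pyspark import SparkContext

      sc = SparkContext(appName="range-demo")
      rdd = sc.range(0, 10, step=2)  # distributed counterpart of range(0, 10, 2)
      print(rdd.collect())           # [0, 2, 4, 6, 8]
      ```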
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6230 from davies/range and squashes the following commits:
      
      d3ce5fe [Davies Liu] add tests
      789eda5 [Davies Liu] add range() in Python
      4590208 [Davies Liu] Merge commit 'refs/pull/6081/head' of github.com:apache/spark into range
      cbf5200 [Daoyuan Wang] let's add python support in a separate PR
      f45e3b2 [Daoyuan Wang] remove redundant toLong
      617da76 [Daoyuan Wang] fix safe marge for corner cases
      867c417 [Daoyuan Wang] fix
      13dbe84 [Daoyuan Wang] update
      bd998ba [Daoyuan Wang] update comments
      d3a0c1b [Daoyuan Wang] add range api()
    • [SPARK-6216] [PYSPARK] check python version of worker with driver · 32fbd297
      Davies Liu authored
      This PR reverts #5404; instead, it passes the driver's Python version into the JVM and checks it in the worker before deserializing the closure, so that it works with different major versions of Python.
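
      A sketch of the check (the error message is adapted from the worker side):

      ```
      import sys

      # The driver sends its major.minor version through the JVM; each worker
      # compares against its own interpreter before deserializing the closure.
      driver_version = "%d.%d" % sys.version_info[:2]
      worker_version = "%d.%d" % sys.version_info[:2]  # computed on the worker
      if worker_version != driver_version:
          raise Exception("Python in worker has different version %s than that in "
                          "driver %s" % (worker_version, driver_version))
      ```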
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6203 from davies/py_version and squashes the following commits:
      
      b8fb76e [Davies Liu] fix test
      6ce5096 [Davies Liu] use string for version
      47c6278 [Davies Liu] check python version of worker with driver
  20. May 01, 2015
    • [SPARK-3444] Provide an easy way to change log level · ae98eec7
      Holden Karau authored
      Add support for changing the log level at run time through the SparkContext. Based on an earlier PR (#2433); includes CR feedback from pwendel & davies.
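
      Usage after this change (invalid levels raise an exception, per the commits below):

      ```
      from pyspark import SparkContext

      sc = SparkContext(appName="loglevel-demo")
      sc.setLogLevel("WARN")  # valid levels include ALL, DEBUG, INFO, WARN, ERROR, FATAL, OFF
      ```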
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #5791 from holdenk/SPARK-3444-provide-an-easy-way-to-change-log-level-r2 and squashes the following commits:
      
      3bf3be9 [Holden Karau] fix exception
      42ba873 [Holden Karau] fix exception
      9117244 [Holden Karau] Only allow valid log levels, throw exception if invalid log level.
      338d7bf [Holden Karau] rename setLoggingLevel to setLogLevel
      fac14a0 [Holden Karau] Fix style errors
      d9d03f3 [Holden Karau] Add support for changing the log level at run time through the SparkContext. Based on an earlier PR, #2433 includes CR feedback from @pwendel & @davies
  21. Apr 21, 2015
    • [SPARK-6949] [SQL] [PySpark] Support Date/Timestamp in Column expression · ab9128fb
      Davies Liu authored
      This PR enables auto_convert in JavaGateway, so that we can register a converter for given types, for example date and datetime.
      
      There are two bugs related to auto_convert, see [1] and [2]; we work around them in this PR.
      
      [1]  https://github.com/bartdag/py4j/issues/160
      [2] https://github.com/bartdag/py4j/issues/161
      
      cc rxin JoshRosen
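
      What the change enables, as a sketch against the SQLContext API of the time:

      ```
      import datetime
      from pyspark import SparkContext
      from pyspark.sql import SQLContext

      sc = SparkContext(appName="date-demo")
      sqlContext = SQLContext(sc)
      df = sqlContext.createDataFrame([(datetime.date(2015, 4, 21),)], ["d"])
      # date literals now auto-convert through the Py4J gateway in Column expressions:
      df.filter(df.d > datetime.date(2015, 1, 1)).show()
      ```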
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #5570 from davies/py4j_date and squashes the following commits:
      
      eb4fa53 [Davies Liu] fix tests in python 3
      d17d634 [Davies Liu] rollback changes in mllib
      2e7566d [Davies Liu] convert tuple into ArrayList
      ceb3779 [Davies Liu] Update rdd.py
      3c373f3 [Davies Liu] support date and datetime by auto_convert
      cb094ff [Davies Liu] enable auto convert
  22. Apr 16, 2015
    • [SPARK-4897] [PySpark] Python 3 support · 04e44b37
      Davies Liu authored
      This PR updates PySpark to support Python 3 (tested with 3.4).
      
      Known issue: unpickling arrays from Pyrolite is broken in Python 3, so those tests are skipped.
      
      TODO: ec2/spark-ec2.py is not fully tested with python3.
      
      Author: Davies Liu <davies@databricks.com>
      Author: twneale <twneale@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5173 from davies/python3 and squashes the following commits:
      
      d7d6323 [Davies Liu] fix tests
      6c52a98 [Davies Liu] fix mllib test
      99e334f [Davies Liu] update timeout
      b716610 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      cafd5ec [Davies Liu] adddress comments from @mengxr
      bf225d7 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      179fc8d [Davies Liu] tuning flaky tests
      8c8b957 [Davies Liu] fix ResourceWarning in Python 3
      5c57c95 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      4006829 [Davies Liu] fix test
      2fc0066 [Davies Liu] add python3 path
      71535e9 [Davies Liu] fix xrange and divide
      5a55ab4 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      125f12c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ed498c8 [Davies Liu] fix compatibility with python 3
      820e649 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      e8ce8c9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ad7c374 [Davies Liu] fix mllib test and warning
      ef1fc2f [Davies Liu] fix tests
      4eee14a [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      20112ff [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      59bb492 [Davies Liu] fix tests
      1da268c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ca0fdd3 [Davies Liu] fix code style
      9563a15 [Davies Liu] add imap back for python 2
      0b1ec04 [Davies Liu] make python examples work with Python 3
      d2fd566 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      a716d34 [Davies Liu] test with python 3.4
      f1700e8 [Davies Liu] fix test in python3
      671b1db [Davies Liu] fix test in python3
      692ff47 [Davies Liu] fix flaky test
      7b9699f [Davies Liu] invalidate import cache for Python 3.3+
      9c58497 [Davies Liu] fix kill worker
      309bfbf [Davies Liu] keep compatibility
      5707476 [Davies Liu] cleanup, fix hash of string in 3.3+
      8662d5b [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      f53e1f0 [Davies Liu] fix tests
      70b6b73 [Davies Liu] compile ec2/spark_ec2.py in python 3
      a39167e [Davies Liu] support customize class in __main__
      814c77b [Davies Liu] run unittests with python 3
      7f4476e [Davies Liu] mllib tests passed
      d737924 [Davies Liu] pass ml tests
      375ea17 [Davies Liu] SQL tests pass
      6cc42a9 [Davies Liu] rename
      431a8de [Davies Liu] streaming tests pass
      78901a7 [Davies Liu] fix hash of serializer in Python 3
      24b2f2e [Davies Liu] pass all RDD tests
      35f48fe [Davies Liu] run future again
      1eebac2 [Davies Liu] fix conflict in ec2/spark_ec2.py
      6e3c21d [Davies Liu] make cloudpickle work with Python3
      2fb2db3 [Josh Rosen] Guard more changes behind sys.version; still doesn't run
      1aa5e8f [twneale] Turned out `pickle.DictionaryType is dict` == True, so swapped it out
      7354371 [twneale] buffer --> memoryview  I'm not super sure if this a valid change, but the 2.7 docs recommend using memoryview over buffer where possible, so hoping it'll work.
      b69ccdf [twneale] Uses the pure python pickle._Pickler instead of c-extension _pickle.Pickler. It appears pyspark 2.7 uses the pure python pickler as well, so this shouldn't degrade pickling performance (?).
      f40d925 [twneale] xrange --> range
      e104215 [twneale] Replaces 2.7 types.InstsanceType with 3.4 `object`....could be horribly wrong depending on how types.InstanceType is used elsewhere in the package--see http://bugs.python.org/issue8206
      79de9d0 [twneale] Replaces python2.7 `file` with 3.4 _io.TextIOWrapper
      2adb42d [Josh Rosen] Fix up some import differences between Python 2 and 3
      854be27 [Josh Rosen] Run `futurize` on Python code:
      7c5b4ce [Josh Rosen] Remove Python 3 check in shell.py.
  23. Mar 09, 2015
    • [SPARK-6194] [SPARK-677] [PySpark] fix memory leak in collect() · 8767565c
      Davies Liu authored
      Because of a circular reference between JavaObject and JavaMember, a Java object cannot be released until the Python GC kicks in, causing a memory leak in collect() that may consume lots of memory in the JVM.
      
      This PR changes the way collected data is sent back to Python from a local file to a socket, which avoids any disk IO during collect and avoids any Python referrers to the Java object.
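
      An illustrative sketch of the idea (not Spark's actual code): serve the collected bytes over a loopback socket instead of a temp file.

      ```
      import socket

      # Conceptually, the JVM binds an ephemeral loopback port and hands the
      # port number to Python, instead of writing a temp file and handing over
      # a path. No disk IO, and no lingering Java object references in Python.
      server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      server.bind(("127.0.0.1", 0))
      server.listen(1)
      port = server.getsockname()[1]  # Python connects here and streams the data
      ```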
      
      cc JoshRosen
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4923 from davies/fix_collect and squashes the following commits:
      
      d730286 [Davies Liu] address comments
      24c92a4 [Davies Liu] fix style
      ba54614 [Davies Liu] use socket to transfer data from JVM
      9517c8f [Davies Liu] fix memory leak in collect()
  24. Feb 17, 2015
    • [SPARK-4172] [PySpark] Progress API in Python · 445a755b
      Davies Liu authored
      This patch brings the pull-based progress API into Python, along with a Python example.
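
      The pull-based API as exposed to Python; a sketch:

      ```
      from pyspark import SparkContext

      sc = SparkContext(appName="progress-demo")
      tracker = sc.statusTracker()
      for job_id in tracker.getActiveJobsIds():
          info = tracker.getJobInfo(job_id)
          print("job %s: %s" % (job_id, info.status))  # e.g. RUNNING, SUCCEEDED
      ```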
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #3027 from davies/progress_api and squashes the following commits:
      
      b1ba984 [Davies Liu] fix style
      d3b9253 [Davies Liu] add tests, mute the exception after stop
      4297327 [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api
      969fa9d [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api
      25590c9 [Davies Liu] update with Java API
      360de2d [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api
      c0f1021 [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api
      023afb3 [Davies Liu] add Python API and example for progress API
    • [SPARK-5864] [PySpark] support .jar as python package · fc4eb950
      Davies Liu authored
      A jar file containing Python sources can be used as a Python package, just like a zip file.
      
      spark-submit already puts the jar file on the PYTHONPATH; this patch also puts it on sys.path, so that it can be used in the Python worker.
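
      A usage sketch (path and module name are hypothetical):

      ```
      from pyspark import SparkContext

      sc = SparkContext(appName="jar-package-demo")
      sc.addPyFile("/deps/mylib.jar")  # a .jar now works on sys.path like a .zip
      import mylib                     # hypothetical Python package inside the jar
      ```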
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4652 from davies/jar and squashes the following commits:
      
      17d3f76 [Davies Liu] support .jar as python package
  25. Jan 28, 2015
    • [SPARK-4387][PySpark] Refactoring python profiling code to make it extensible · 3bead67d
      Yandu Oppacher authored
      This PR is based on #3255, fixing conflicts and code style.
      
      Closes #3255.
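
      The extension point this introduces, in the style of the context.py doctests (a sketch):

      ```
      from pyspark import SparkConf, SparkContext, BasicProfiler

      class MyCustomProfiler(BasicProfiler):
          def show(self, id):
              print("My custom profiles for RDD:%s" % id)

      conf = SparkConf().set("spark.python.profile", "true")  # profiling must be enabled
      sc = SparkContext("local", "profiler-demo", conf=conf,
                        profiler_cls=MyCustomProfiler)
      sc.parallelize(range(1000)).count()
      sc.show_profiles()  # dispatches to MyCustomProfiler.show
      ```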
      
      Author: Yandu Oppacher <yandu.oppacher@jadedpixel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #3901 from davies/refactor-python-profile-code and squashes the following commits:
      
      b4a9306 [Davies Liu] fix tests
      4b79ce8 [Davies Liu] add docstring for profiler_cls
      2700e47 [Davies Liu] use BasicProfiler as default
      349e341 [Davies Liu] more refactor
      6a5d4df [Davies Liu] refactor and fix tests
      31bf6b6 [Davies Liu] fix code style
      0864b5d [Yandu Oppacher] Remove unused method
      76a6c37 [Yandu Oppacher] Added a profile collector to accumulate the profilers per stage
      9eefc36 [Yandu Oppacher] Fix doc
      9ace076 [Yandu Oppacher] Refactor of profiler, and moved tests around
      8739aff [Yandu Oppacher] Code review fixes
      9bda3ec [Yandu Oppacher] Refactor profiler code