Skip to content
Snippets Groups Projects
  1. Aug 10, 2014
    • Davies Liu's avatar
      [SPARK-2898] [PySpark] fix bugs in deamon.py · 28dcbb53
      Davies Liu authored
      1. do not use signal handler for SIGCHILD, it's easy to cause deadlock
      2. handle EINTR during accept()
      3. pass errno into JVM
      4. handle EAGAIN during fork()
      
      Now, it can pass 50k tasks tests in 180 seconds.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #1842 from davies/qa and squashes the following commits:
      
      f0ea451 [Davies Liu] fix lint
      03a2e8c [Davies Liu] cleanup dead children every seconds
      32cb829 [Davies Liu] fix lint
      0cd0817 [Davies Liu] fix bugs in deamon.py
      28dcbb53
  2. Aug 06, 2014
    • Nicholas Chammas's avatar
      [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically · d614967b
      Nicholas Chammas authored
      As described in [SPARK-2627](https://issues.apache.org/jira/browse/SPARK-2627), we'd like Python code to automatically be checked for PEP 8 compliance by Jenkins. This pull request aims to do that.
      
      Notes:
      * We may need to install [`pep8`](https://pypi.python.org/pypi/pep8) on the build server.
      * I'm expecting tests to fail now that PEP 8 compliance is being checked as part of the build. I'm fine with cleaning up any remaining PEP 8 violations as part of this pull request.
      * I did not understand why the RAT and scalastyle reports are saved to text files. I did the same for the PEP 8 check, but only so that the console output style can match those for the RAT and scalastyle checks. The PEP 8 report is removed right after the check is complete.
      * Updates to the ["Contributing to Spark"](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) guide will be submitted elsewhere, as I don't believe that text is part of the Spark repo.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      Author: nchammas <nicholas.chammas@gmail.com>
      
      Closes #1744 from nchammas/master and squashes the following commits:
      
      274b238 [Nicholas Chammas] [SPARK-2627] [PySpark] minor indentation changes
      983d963 [nchammas] Merge pull request #5 from apache/master
      1db5314 [nchammas] Merge pull request #4 from apache/master
      0e0245f [Nicholas Chammas] [SPARK-2627] undo erroneous whitespace fixes
      bf30942 [Nicholas Chammas] [SPARK-2627] PEP8: comment spacing
      6db9a44 [nchammas] Merge pull request #3 from apache/master
      7b4750e [Nicholas Chammas] merge upstream changes
      91b7584 [Nicholas Chammas] [SPARK-2627] undo unnecessary line breaks
      44e3e56 [Nicholas Chammas] [SPARK-2627] use tox.ini to exclude files
      b09fae2 [Nicholas Chammas] don't wrap comments unnecessarily
      bfb9f9f [Nicholas Chammas] [SPARK-2627] keep up with the PEP 8 fixes
      9da347f [nchammas] Merge pull request #2 from apache/master
      aa5b4b5 [Nicholas Chammas] [SPARK-2627] follow Spark bash style for if blocks
      d0a83b9 [Nicholas Chammas] [SPARK-2627] check that pep8 downloaded fine
      dffb5dd [Nicholas Chammas] [SPARK-2627] download pep8 at runtime
      a1ce7ae [Nicholas Chammas] [SPARK-2627] space out test report sections
      21da538 [Nicholas Chammas] [SPARK-2627] it's PEP 8, not PEP8
      6f4900b [Nicholas Chammas] [SPARK-2627] more misc PEP 8 fixes
      fe57ed0 [Nicholas Chammas] removing merge conflict backups
      9c01d4c [nchammas] Merge pull request #1 from apache/master
      9a66cb0 [Nicholas Chammas] resolving merge conflicts
      a31ccc4 [Nicholas Chammas] [SPARK-2627] miscellaneous PEP 8 fixes
      beaa9ac [Nicholas Chammas] [SPARK-2627] fail check on non-zero status
      723ed39 [Nicholas Chammas] always delete the report file
      0541ebb [Nicholas Chammas] [SPARK-2627] call Python linter from run-tests
      12440fa [Nicholas Chammas] [SPARK-2627] add Scala linter
      61c07b9 [Nicholas Chammas] [SPARK-2627] add Python linter
      75ad552 [Nicholas Chammas] make check output style consistent
      d614967b
  3. Aug 03, 2014
    • Davies Liu's avatar
      [SPARK-1740] [PySpark] kill the python worker · 55349f9f
      Davies Liu authored
      Kill only the python worker related to cancelled tasks.
      
      The daemon will start a background thread to monitor all the opened sockets for all workers. If the socket is closed by JVM, this thread will kill the worker.
      
      When an task is cancelled, the socket to worker will be closed, then the worker will be killed by deamon.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #1643 from davies/kill and squashes the following commits:
      
      8ffe9f3 [Davies Liu] kill worker by deamon, because runtime.exec() is too heavy
      46ca150 [Davies Liu] address comment
      acd751c [Davies Liu] kill the worker when task is canceled
      55349f9f
  4. Aug 01, 2014
    • Josh Rosen's avatar
      [SPARK-2764] Simplify daemon.py process structure · e8e0fd69
      Josh Rosen authored
      Curently, daemon.py forks a pool of numProcessors subprocesses, and those processes fork themselves again to create the actual Python worker processes that handle data.
      
      I think that this extra layer of indirection is unnecessary and adds a lot of complexity.  This commit attempts to remove this middle layer of subprocesses by launching the workers directly from daemon.py.
      
      See https://github.com/mesos/spark/pull/563 for the original PR that added daemon.py, where I raise some issues with the current design.
      
      Author: Josh Rosen <joshrosen@apache.org>
      
      Closes #1680 from JoshRosen/pyspark-daemon and squashes the following commits:
      
      5abbcb9 [Josh Rosen] Replace magic number: 4 -> EINTR
      5495dff [Josh Rosen] Throw IllegalStateException if worker launch fails.
      b79254d [Josh Rosen] Detect failed fork() calls; improve error logging.
      282c2c4 [Josh Rosen] Remove daemon.py exit logging, since it caused problems:
      8554536 [Josh Rosen] Fix daemon’s shutdown(); log shutdown reason.
      4e0fab8 [Josh Rosen] Remove shared-memory exit_flag; don't die on worker death.
      e9892b4 [Josh Rosen] [WIP] [SPARK-2764] Simplify daemon.py process structure.
      e8e0fd69
  5. Jul 22, 2014
    • Nicholas Chammas's avatar
      [SPARK-2470] PEP8 fixes to PySpark · 5d16d5bb
      Nicholas Chammas authored
      This pull request aims to resolve all outstanding PEP8 violations in PySpark.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      Author: nchammas <nicholas.chammas@gmail.com>
      
      Closes #1505 from nchammas/master and squashes the following commits:
      
      98171af [Nicholas Chammas] [SPARK-2470] revert PEP 8 fixes to cloudpickle
      cba7768 [Nicholas Chammas] [SPARK-2470] wrap expression list in parentheses
      e178dbe [Nicholas Chammas] [SPARK-2470] style - change position of line break
      9127d2b [Nicholas Chammas] [SPARK-2470] wrap expression lists in parentheses
      22132a4 [Nicholas Chammas] [SPARK-2470] wrap conditionals in parentheses
      24639bc [Nicholas Chammas] [SPARK-2470] fix whitespace for doctest
      7d557b7 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to tests.py
      8f8e4c0 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to storagelevel.py
      b3b96cf [Nicholas Chammas] [SPARK-2470] PEP8 fixes to statcounter.py
      d644477 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to worker.py
      aa3a7b6 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to sql.py
      1916859 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to shell.py
      95d1d95 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to serializers.py
      a0fec2e [Nicholas Chammas] [SPARK-2470] PEP8 fixes to mllib
      c85e1e5 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to join.py
      d14f2f1 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to __init__.py
      81fcb20 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to resultiterable.py
      1bde265 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to java_gateway.py
      7fc849c [Nicholas Chammas] [SPARK-2470] PEP8 fixes to daemon.py
      ca2d28b [Nicholas Chammas] [SPARK-2470] PEP8 fixes to context.py
      f4e0039 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to conf.py
      a6d5e4b [Nicholas Chammas] [SPARK-2470] PEP8 fixes to cloudpickle.py
      f0a7ebf [Nicholas Chammas] [SPARK-2470] PEP8 fixes to rddsampler.py
      4dd148f [nchammas] Merge pull request #5 from apache/master
      f7e4581 [Nicholas Chammas] unrelated pep8 fix
      a36eed0 [Nicholas Chammas] name ec2 instances and security groups consistently
      de7292a [nchammas] Merge pull request #4 from apache/master
      2e4fe00 [nchammas] Merge pull request #3 from apache/master
      89fde08 [nchammas] Merge pull request #2 from apache/master
      69f6e22 [Nicholas Chammas] PEP8 fixes
      2627247 [Nicholas Chammas] broke up lines before they hit 100 chars
      6544b7e [Nicholas Chammas] [SPARK-2065] give launched instances names
      69da6cf [nchammas] Merge pull request #1 from apache/master
      5d16d5bb
  6. Jun 28, 2014
    • Matthew Farrellee's avatar
      [SPARK-1394] Remove SIGCHLD handler in worker subprocess · 3c104c79
      Matthew Farrellee authored
      It should not be the responsibility of the worker subprocess, which
      does not intentionally fork, to try and cleanup child processes. Doing
      so is complex and interferes with operations such as
      platform.system().
      
      If it is desirable to have tighter control over subprocesses, then
      namespaces should be used and it should be the manager's resposibility
      to handle cleanup.
      
      Author: Matthew Farrellee <matt@redhat.com>
      
      Closes #1247 from mattf/SPARK-1394 and squashes the following commits:
      
      c36f308 [Matthew Farrellee] [SPARK-1394] Remove SIGCHLD handler in worker subprocess
      3c104c79
  7. May 07, 2014
    • Aaron Davidson's avatar
      SPARK-1579: Clean up PythonRDD and avoid swallowing IOExceptions · 3308722c
      Aaron Davidson authored
      This patch includes several cleanups to PythonRDD, focused around fixing [SPARK-1579](https://issues.apache.org/jira/browse/SPARK-1579) cleanly. Listed in order of approximate importance:
      
      - The Python daemon waits for Spark to close the socket before exiting,
        in order to avoid causing spurious IOExceptions in Spark's
        `PythonRDD::WriterThread`.
      - Removes the Python Monitor Thread, which polled for task cancellations
        in order to kill the Python worker. Instead, we do this in the
        onCompleteCallback, since this is guaranteed to be called during
        cancellation.
      - Adds a "completed" variable to TaskContext to avoid the issue noted in
        [SPARK-1019](https://issues.apache.org/jira/browse/SPARK-1019), where onCompleteCallbacks may be execution-order dependent.
        Along with this, I removed the "context.interrupted = true" flag in
        the onCompleteCallback.
      - Extracts PythonRDD::WriterThread to its own class.
      
      Since this patch provides an alternative solution to [SPARK-1019](https://issues.apache.org/jira/browse/SPARK-1019), I did test it with
      
      ```
      sc.textFile("latlon.tsv").take(5)
      ```
      
      many times without error.
      
      Additionally, in order to test the unswallowed exceptions, I performed
      
      ```
      sc.textFile("s3n://<big file>").count()
      ```
      
      and cut my internet during execution. Prior to this patch, we got the "stdin writer exited early" message, which was unhelpful. Now, we get the SocketExceptions propagated through Spark to the user and get proper (though unsuccessful) task retries.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #640 from aarondav/pyspark-io and squashes the following commits:
      
      b391ff8 [Aaron Davidson] Detect "clean socket shutdowns" and stop waiting on the socket
      c0c49da [Aaron Davidson] SPARK-1579: Clean up PythonRDD and avoid swallowing IOExceptions
      3308722c
  8. Jul 16, 2013
  9. Jul 01, 2013
  10. Jun 21, 2013
Loading