  Jan 09, 2014
    • Fix wonky imports from merge · 372a533a
      Andrew Or authored
    • Defensively allocate memory from global pool · aa5002bb
      Andrew Or authored
      This is an alternative to the existing approach, which evenly distributes
      the collective shuffle memory among all running tasks. In the new
      approach, each thread requests a chunk of memory from the global pool
      whenever its map is about to grow multiplicatively. If the pool has
      sufficient memory, the thread allocates the chunk and grows its map;
      otherwise, it spills (see the sketch at the end of this entry).
      
      A danger with the previous approach is that a new task may quickly fill
      up its map before old tasks finish spilling, potentially causing an OOM.
      The new approach prevents this scenario because it favors existing tasks
      over new ones: any thread that would encroach on memory held by other
      threads defensively backs off and starts spilling.
      
      Testing through spark-perf reveals: (1) When no spills have occurred, the
      performance of external sorting with this memory management approach is
      essentially the same as without external sorting. (2) When one or more
      spills have occurred, external sorting is a small multiple (3x) slower
      than the no-spill case.
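
      A minimal sketch of the reserve-then-grow pattern described above. The
      names (GlobalShufflePool, SpillableMap) and the sizes are illustrative
      assumptions, not Spark's actual API:

          object GlobalShufflePool {
            private var available: Long = 512L << 20  // shared shuffle budget (assumed)

            // Atomically grant all of the requested bytes, or none at all.
            def tryReserve(bytes: Long): Boolean = synchronized {
              if (bytes <= available) { available -= bytes; true } else false
            }

            def release(bytes: Long): Unit = synchronized { available += bytes }
          }

          class SpillableMap(initialBytes: Long = 1L << 20) {
            require(GlobalShufflePool.tryReserve(initialBytes), "no memory for initial map")
            private var heldBytes = initialBytes  // memory this map has reserved

            // Called when the map is about to grow: request enough to double.
            // If the pool cannot grant the chunk, defensively back off and spill.
            def growOrSpill(): Unit = {
              if (GlobalShufflePool.tryReserve(heldBytes)) {
                heldBytes *= 2  // grow multiplicatively
              } else {
                spillToDisk()                                       // free the contents
                GlobalShufflePool.release(heldBytes - initialBytes) // return the extra
                heldBytes = initialBytes                            // shrink to start size
              }
            }

            private def spillToDisk(): Unit = { /* write the map to disk and clear it */ }
          }

      Favoring existing tasks falls out of this scheme naturally: a new map
      starts small, so its early growth requests are the first to fail when the
      pool runs low, and it spills before it can crowd out older maps.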
    • Merge github.com:apache/incubator-spark · d76e1f90
      Andrew Or authored
      Conflicts:
      	core/src/main/scala/org/apache/spark/SparkEnv.scala
      	streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java
    • Minor clean-up · 7b748b83
      Patrick Wendell authored
    • Merge pull request #353 from pwendell/ipython-simplify · 300eaa99
      Patrick Wendell authored
      Simplify and fix pyspark script.
      
      This patch drops support for IPython < 1.0 but fixes the launch script
      and makes it much simpler.
      
      I tested this using the three commands listed on the PySpark
      documentation page:
      
      1. IPYTHON=1 ./pyspark
      2. IPYTHON_OPTS="notebook" ./pyspark
      3. IPYTHON_OPTS="notebook --pylab inline" ./pyspark
      
      There are two changes:
      - We now rely on the PYTHONSTARTUP environment variable to start PySpark.
      - We removed the quotes around $IPYTHON_OPTS: quoting the variable gloms
        the options together into a single argument passed to `exec`, which
        seemed to cause ipython to fail (it expects them as multiple
        arguments). See the sketch below.
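
      A sketch of the relevant launch logic (the PYTHONSTARTUP path and the
      exact lines are assumptions for illustration, not verbatim from the
      patch):

          # Python runs the file named by PYTHONSTARTUP whenever an
          # interactive interpreter (including IPython) starts up.
          export PYTHONSTARTUP="$SPARK_HOME/python/pyspark/shell.py"

          # Quoted, the whole value reaches ipython as ONE argument and fails:
          #   exec ipython "$IPYTHON_OPTS"   ->  ipython "notebook --pylab inline"
          # Unquoted, the shell word-splits it into separate arguments:
          exec ipython $IPYTHON_OPTS         #   ipython notebook --pylab inline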