  1. Jan 10, 2014
  2. Jan 09, 2014
    • Minor clean-up · 7b748b83
      Patrick Wendell authored
    • Merge pull request #353 from pwendell/ipython-simplify · 300eaa99
      Patrick Wendell authored
      Simplify and fix pyspark script.
      
      This patch removes compatibility for IPython < 1.0 but fixes the launch
      script and makes it much simpler.
      
      I tested this using the three commands on the PySpark documentation page:
      
      1. IPYTHON=1 ./pyspark
      2. IPYTHON_OPTS="notebook" ./pyspark
      3. IPYTHON_OPTS="notebook --pylab inline" ./pyspark
      
      There are two changes:
      - We now rely on the PYTHONSTARTUP env var to start PySpark.
      - We removed the quotes around $IPYTHON_OPTS: quoting gloms the options
        together into a single argument passed to `exec`, which seemed to cause
        IPython to fail (it expects them as multiple arguments). See the sketch
        below for the resulting argv difference.
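      
      The quoting issue is ordinary shell word splitting. As a rough Scala
      illustration of the argv difference the two forms produce (the command
      and options here are illustrative, not the actual script):
      
      ```scala
      object ArgvSketch {
        def main(args: Array[String]): Unit = {
          val opts = "notebook --pylab inline"
      
          // Like `exec ipython "$IPYTHON_OPTS"`: one glommed argv entry.
          val glommed = List("ipython", opts)
      
          // Like `exec ipython $IPYTHON_OPTS`: separate argv entries.
          val split = "ipython" :: opts.split("\\s+").toList
      
          println(glommed) // List(ipython, notebook --pylab inline)
          println(split)   // List(ipython, notebook, --pylab, inline)
        }
      }
      ```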
    • Merge pull request #374 from mateiz/completeness · 4b074fac
      Reynold Xin authored
      Add some missing Java API methods
      
      These are primarily for setting job groups, canceling jobs, and setting names on RDDs. Seemed like useful stuff to expose in Java.
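      
      As a rough sketch, these are the Scala-side calls whose Java mirrors are
      being added; the app name, group id, and file path below are illustrative:
      
      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      
      object JobGroupSketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(
            new SparkConf().setAppName("job-groups").setMaster("local[2]"))
      
          // Tag jobs submitted from this thread with a group id and description.
          sc.setJobGroup("nightly-etl", "Nightly ETL pipeline")
      
          // Name an RDD so it is identifiable in the web UI.
          val lines = sc.textFile("README.md").setName("readme-lines")
          println(lines.count())
      
          // Cancel all running jobs in the group, typically from another thread.
          sc.cancelJobGroup("nightly-etl")
          sc.stop()
        }
      }
      ```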
    • Merge pull request #294 from RongGu/master · a9d53333
      Reynold Xin authored
      Bug fixes for updating the RDD block's memory and disk usage information
      
      From the code context, we can see that the memSize and diskSize here are
      both always equal to the size of the block; they are never zero. The
      logic for recording block usage in BlockStatus is therefore wrong,
      especially for blocks that are dropped from memory to make room for new
      input RDD blocks. I have verified that this makes the storage metrics
      shown on the Storage web page wrong and misleading. With this patch, the
      metrics are correct.
      Finally, Merry Christmas, guys :)
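      
      As a hypothetical, simplified Scala model of the accounting rule the fix
      enforces (the real BlockStatus has more fields and different call sites):
      
      ```scala
      object BlockStatusSketch {
        // Simplified stand-in for the real BlockStatus record.
        case class BlockStatus(memSize: Long, diskSize: Long)
      
        // Each size should reflect where the block actually lives, rather than
        // being set unconditionally to the block's full size.
        def statusFor(blockSize: Long, inMemory: Boolean, onDisk: Boolean): BlockStatus =
          BlockStatus(
            memSize = if (inMemory) blockSize else 0L,
            diskSize = if (onDisk) blockSize else 0L)
      
        def main(args: Array[String]): Unit = {
          // A block dropped from memory to disk should report memSize = 0, so
          // the Storage page no longer counts it against memory.
          println(statusFor(blockSize = 1L << 20, inMemory = false, onDisk = true))
        }
      }
      ```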
    • Small fix suggested by josh · 77ca9e1b
      Patrick Wendell authored
    • Merge pull request #293 from pwendell/standalone-driver · d86a85e9
      Patrick Wendell authored
      SPARK-998: Support Launching Driver Inside of Standalone Mode
      
      [NOTE: I need to bring the tests up to date with new changes, so for now they will fail]
      
      This patch provides support for launching driver programs inside of a standalone cluster manager. It also supports monitoring and re-launching of driver programs, which is useful for long-running, recoverable applications such as Spark Streaming jobs. For those jobs, this patch allows a deployment mode which is resilient to the failure of any worker node, to the failure of a master node (provided a multi-master setup), and even to failures of the application itself, provided they are recoverable on a restart. Driver information, such as the status and logs from a driver, is displayed in the UI.
      
      There are a few small TODOs here, but the code is generally feature-complete. They are:
      - Bring tests up to date and add test coverage
      - Restarting on failure should be optional and maybe off by default.
      - See if we can re-use Akka connections to facilitate clients behind a firewall
      
      A sensible place to start for review would be the `DriverClient` class, which gives users the ability to launch their driver program. I've also added an example program (`DriverSubmissionTest`) that allows you to test this locally and play around with killing workers, etc. Most of the code is devoted to persisting driver state in the cluster manager, exposing it in the UI, and dealing correctly with various types of failures.
      
      Instructions to test locally:
      - `sbt/sbt assembly/assembly examples/assembly`
      - start a local version of the standalone cluster manager
      
      ```
      ./spark-class org.apache.spark.deploy.client.DriverClient \
        -j -Dspark.test.property=something \
        -e SPARK_TEST_KEY=SOMEVALUE \
        launch spark://10.99.1.14:7077 \
        ../path-to-examples-assembly-jar \
        org.apache.spark.examples.DriverSubmissionTest 1000 some extra options --some-option-here -X 13
      ```
      - Go to the UI and make sure it started correctly; look at the output, etc.
      - Kill workers, the driver program, masters, etc.
    • Fix bug added when we changed AppDescription.maxCores to an Option · c43eb006
      Matei Zaharia authored
      The Scala compiler warned about this: we were now comparing an Option
      against an integer.
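      
      A generic illustration of the warning (not the actual Spark code): once a
      field becomes an Option[Int], comparing it directly to an Int can never
      succeed, and the compiler says so.
      
      ```scala
      object OptionCompareSketch {
        def main(args: Array[String]): Unit = {
          val maxCores: Option[Int] = Some(8) // previously a plain Int
          val available = 8
      
          // Buggy pattern: Option[Int] == Int always yields false; the Scala
          // compiler emits a warning for this comparison.
          // if (maxCores == available) { ... }
      
          // Fixed pattern: compare against the wrapped value explicitly.
          if (maxCores.contains(available)) {
            println("core limit matches available cores")
          }
        }
      }
      ```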
    • Add some missing Java API methods · 142921c6
      Matei Zaharia authored
    • Merge pull request #372 from pwendell/log4j-fix-1 · 26cdb5f6
      Patrick Wendell authored
      Send logs to stderr by default (instead of stdout).
    • 2af98198 · Patrick Wendell authored
    • Merge pull request #362 from mateiz/conf-getters · 12f414ed
      Matei Zaharia authored
      Use typed getters for configuration settings
      
      This improves some of the code style after SPARK-544.
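      
      A minimal sketch of the change in style; the keys and defaults below are
      illustrative:
      
      ```scala
      import org.apache.spark.SparkConf
      
      object ConfGettersSketch {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf()
      
          // Before: stringly-typed reads with ad-hoc parsing at each call site.
          val retainedOld = conf.get("spark.ui.retainedStages", "1000").toInt
      
          // After: typed getters bundle parsing and the default together.
          val retained = conf.getInt("spark.ui.retainedStages", 1000)
          val speculation = conf.getBoolean("spark.speculation", false)
          println(s"retainedStages=$retained speculation=$speculation")
        }
      }
      ```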
    • Some usability improvements · 67b9a336
      Patrick Wendell authored
    • Set default logging to WARN for Spark streaming examples. · 35f80da2
      Patrick Wendell authored
      This programmatically sets the log level to WARN by default for the
      streaming examples. If the user has already specified a log4j.properties
      file, that file takes precedence over this default.
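      
      A sketch of the pattern, assuming log4j 1.x (which Spark used at the
      time): override the root level only when no user configuration has
      installed an appender.
      
      ```scala
      import org.apache.log4j.{Level, Logger}
      
      object LogLevelSketch {
        /** Default the root log level to WARN unless log4j is already configured. */
        def setDefaultLogLevel(): Unit = {
          // A user-supplied log4j.properties installs appenders on the root
          // logger; if any exist, leave the user's settings alone.
          val alreadyConfigured = Logger.getRootLogger.getAllAppenders.hasMoreElements
          if (!alreadyConfigured) {
            Logger.getRootLogger.setLevel(Level.WARN)
          }
        }
      }
      ```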
    • Merge pull request #361 from rxin/clean · 365cac94
      Reynold Xin authored
      Minor style cleanup. Mostly indentation and line width changes.
      
      Focused on a few important files, since they are the ones that new contributors usually read first.
    • Merge pull request #368 from pwendell/sbt-fix · 73c724e1
      Reynold Xin authored
      Don't delegate to users' `sbt`.
      
      This changes our `sbt/sbt` script to not delegate to the user's `sbt`
      even if it is present. If users already have sbt installed and they
      want to use their own sbt, we'd expect them to just call sbt directly
      from within Spark. We no longer set any environment variables or anything
      from this script, so they should just launch sbt directly on their own.
      
      There are a number of hard-to-debug issues which can come from the
      current approach. One is if the user is unaware of an existing sbt
      installation, in which case their build breaks without explanation
      because they haven't configured options (such as permgen size)
      correctly within their sbt (reported by @patmcdonough). Another is
      if the user has a much older version of sbt hanging around, in which
      case some of the older versions don't actually work well when newer
      versions of sbt are specified in the build file (reported by
      @marmbrus). A third is if the user has made some other modification
      to their sbt script, such as setting it to delegate to sbt/sbt in
      Spark, and this change causes that to break (also reported by
      @marmbrus).
      
      So to keep things simple, let's just avoid this path and remove it.
      Any user who already has sbt and wants to build Spark with it should
      be able to understand easily how to do it.
    • 295d8258 · Reynold Xin authored
    • Small typo fix · 49cbf48b
      Patrick Wendell authored
    • a01f3401 · Matei Zaharia authored
    • Don't delegate to users' `sbt`. · 4d2e388e
      Patrick Wendell authored
      This changes our `sbt/sbt` script to not delegate to the user's `sbt`
      even if it is present. If users already have sbt installed and they
      want to use their own sbt, we'd expect them to just call sbt directly
      from within Spark. We no longer set any environment variables or anything
      from this script, so they should just launch sbt directly on their own.
      
      There are a number of hard-to-debug issues which can come from the
      current approach. One is if the user is unaware of an existing sbt
      installation, in which case their build breaks without explanation
      because they haven't configured options (such as permgen size)
      correctly within their sbt. Another is if the user has a much older
      version of sbt hanging around, in which case some of the older
      versions don't actually work well when newer versions of sbt are
      specified in the build file (reported by @marmbrus). A third is if
      the user has made some other modification to their sbt script, such
      as setting it to delegate to sbt/sbt in Spark, and this change causes
      that to break (also reported by @marmbrus).
      
      So to keep things simple, let's just avoid this path and remove it.
      Any user who already has sbt and wants to build Spark with it should
      be able to understand easily how to do it.
    • Merge pull request #364 from pwendell/fix · dceedb46
      Patrick Wendell authored
      Fixing config option "retained_stages" => "retainedStages".
      
      This is a very esoteric option and it's out of sync with the style we use.
      So it seems fitting to fix it for 0.9.0.
  3. Jan 08, 2014