  1. Feb 17, 2014
    • Fix typos in Spark Streaming programming guide · 767e3ae1
      Andrew Or authored
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #536 from andrewor14/streaming-typos and squashes the following commits:
      
      a05faa6 [Andrew Or] Fix broken link and wording
      bc2e4bc [Andrew Or] Merge github.com:apache/incubator-spark into streaming-typos
      d5515b4 [Andrew Or] TD's comments
      767ef12 [Andrew Or] Fix broken links
      8f4c731 [Andrew Or] Fix typos in programming guide
    • Worker registration logging fix · c0795cf4
      Andrew Ash authored
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #608 from ash211/patch-7 and squashes the following commits:
      
      bd85f2a [Andrew Ash] Worker registration logging fix
  2. Feb 14, 2014
    • Typo: Standlone -> Standalone · eec4bd1a
      Andrew Ash authored
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #601 from ash211/typo and squashes the following commits:
      
      9cd43ac [Andrew Ash] Change docs references to metrics.properties, not metrics.conf
      3813ff1 [Andrew Ash] Typo: mulitcast -> multicast
      873bd2f [Andrew Ash] Typo: Standlone -> Standalone
  3. Feb 12, 2014
    • Merge pull request #591 from mengxr/transient-new. · 7e29e027
      Xiangrui Meng authored
      SPARK-1076: [Fix #578] add @transient to some vals
      
      I'll try to be more careful next time.
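      
      A minimal sketch of the pattern this fix applies (class bodies elided; the real classes extend RDD): mark driver-side references @transient so they are not pulled into serialized task closures.
      
      import org.apache.spark.rdd.RDD
      
      // Sketch only: illustrates the annotation, not the real class definitions.
      class ZippedWithIndexRDDSketch[T](@transient val prev: RDD[T])
      class PartitionwiseSampledRDDSketch[T](prev: RDD[T], @transient val seed: Long)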
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #591 and squashes the following commits:
      
      2b4f044 [Xiangrui Meng] add @transient to prev in ZippedWithIndexRDD; add @transient to seed in PartitionwiseSampledRDD
    • Merge pull request #589 from mengxr/index. · 2bea0709
      Xiangrui Meng authored
      SPARK-1076: Convert Int to Long to avoid overflow
      
      Patch for PR #578.
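      
      The overflow in question, in miniature (values illustrative):
      
      // Partition counts multiplied as Int silently wrap; widen to Long first.
      val counts = Array(50000, 50000)
      val wrong = counts(0) * counts(1)          // Int overflow: -1794967296
      val right = counts(0).toLong * counts(1)   // 2500000000L, as intended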
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #589 and squashes the following commits:
      
      98c435e [Xiangrui Meng] cast Int to Long to avoid Int overflow
    • Merge pull request #578 from mengxr/rank. · e733d655
      Xiangrui Meng authored
      SPARK-1076: zipWithIndex and zipWithUniqueId to RDD
      
      Assigning ranks to an ordered or unordered data set is a common operation. This could be done by first counting the records in each partition and then assigning ranks in parallel.
      
      The purpose of assigning ranks to an unordered set is usually to get a unique id for each item, e.g., to map feature names to feature indices. In such cases, the assignment could be done without counting records, saving one Spark job.
      
      https://spark-project.atlassian.net/browse/SPARK-1076
      
      == update ==
      Because assigning ranks is very similar to Scala's zipWithIndex, I changed the method name to zipWithIndex and put the index in the value field.
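      
      An illustrative use of the two operations (assuming an existing SparkContext sc and a 2-partition RDD):
      
      // zipWithIndex assigns consecutive indices, which requires counting every
      // partition but the last (one extra Spark job). zipWithUniqueId guarantees
      // only uniqueness, so it can skip that counting pass.
      val data = sc.parallelize(Seq("a", "b", "c", "d"), 2)
      data.zipWithIndex().collect()    // Array((a,0), (b,1), (c,2), (d,3))
      data.zipWithUniqueId().collect() // unique Long ids, not necessarily consecutive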
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #578 and squashes the following commits:
      
      52a05e1 [Xiangrui Meng] changed assignRanks to zipWithIndex; changed assignUniqueIds to zipWithUniqueId; minor updates
      756881c [Xiangrui Meng] simplified RankedRDD by implementing assignUniqueIds separately; moved counting iterator size to Utils; do not count items in the last partition and skip counting if there is only one partition
      630868c [Xiangrui Meng] newline
      21b434b [Xiangrui Meng] add assignRanks and assignUniqueIds to RDD
    • Merge pull request #583 from colorant/zookeeper. · 68b2c0d0
      Raymond Liu authored
      Minor fix for ZooKeeperPersistenceEngine to use configured working dir
      
      Author: Raymond Liu <raymond.liu@intel.com>
      
      Closes #583 and squashes the following commits:
      
      91b0609 [Raymond Liu] Minor fix for ZooKeeperPersistenceEngine to use configured working dir
  4. Feb 11, 2014
    • Merge pull request #571 from holdenk/switchtobinarysearch. · b0dab1bb
      Holden Karau authored
      SPARK-1072 Use binary search when needed in RangePartitioner
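      
      A standalone sketch of the dispatch this introduces (threshold and array type illustrative; the real code lives in RangePartitioner and the new CollectionsUtil): linear scan for small bounds arrays, binary search above 1000 elements.
      
      import java.util.Arrays
      
      // Find the partition whose upper bound is the first one >= key.
      def partitionFor(key: Int, rangeBounds: Array[Int]): Int = {
        if (rangeBounds.length <= 1000) {
          var i = 0
          while (i < rangeBounds.length && key > rangeBounds(i)) i += 1
          i
        } else {
          val pos = Arrays.binarySearch(rangeBounds, key)
          if (pos >= 0) pos else -pos - 1  // negative result encodes the insertion point
        }
      }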
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #571 and squashes the following commits:
      
      f31a2e1 [Holden Karau] Switch to using CollectionsUtils in Partitioner
      4c7a0c3 [Holden Karau] Add CollectionsUtil as suggested by aarondav
      7099962 [Holden Karau] Add the binary search to only init once
      1bef01d [Holden Karau] CR feedback
      a21e097 [Holden Karau] Use binary search if we have more than 1000 elements inside of RangePartitioner
    • Merge pull request #577 from hsaputra/fix_simple_streaming_doc. · ba38d989
      Henry Saputra authored
      SPARK-1075 Fix doc in the Spark Streaming custom receiver: closing bracket in the class constructor
      
      The closing parenthesis in the constructor in the first code block example is reversed:
      diff --git a/docs/streaming-custom-receivers.md b/docs/streaming-custom-receivers.md
      index 4e27d65..3fb540c 100644
      --- a/docs/streaming-custom-receivers.md
      +++ b/docs/streaming-custom-receivers.md
      @@ -14,7 +14,7 @@ This starts with implementing NetworkReceiver(api/streaming/index.html#org.apa
       The following is a simple socket text-stream receiver.
       {% highlight scala %}
      -class SocketTextStreamReceiver(host: String, port: Int(
      +class SocketTextStreamReceiver(host: String, port: Int)
       extends NetworkReceiverString
       {
       protected lazy val blocksGenerator: BlockGenerator =
      
      Author: Henry Saputra <henry@platfora.com>
      
      Closes #577 and squashes the following commits:
      
      6508341 [Henry Saputra] SPARK-1075 Fix doc in the Spark Streaming custom receiver.
    • Merge pull request #579 from CrazyJvm/patch-1. · 4afe6ccf
      Chen Chao authored
      "in the source DStream" rather than "int the source DStream"
      
      "flatMap is a one-to-many DStream operation that creates a new DStream by generating multiple new records from each record int the source DStream."
      
      Author: Chen Chao <crazyjvm@gmail.com>
      
      Closes #579 and squashes the following commits:
      
      4abcae3 [Chen Chao] in the source DStream
  5. Feb 09, 2014
    • Merge pull request #566 from martinjaggi/copy-MLlib-d. · 2182aa3c
      Martin Jaggi authored
      new MLlib documentation for optimization, regression and classification
      
      New documentation with TeX formulas, hopefully improving usability and reproducibility of the offered MLlib methods.
      Also made some minor changes in the code for consistency. Scala tests pass.
      
      This is the rebased branch; I deleted the old PR.
      
      JIRA:
      https://spark-project.atlassian.net/browse/MLLIB-19
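      
      For orientation, the objective these optimization docs revolve around has the generic regularized form below (a sketch assembled from the commit notes: the regularizer appears as lambda R(), d is the number of features, and the MSE loss carries a 1/2 scaling; the docs' exact notation may differ):
      
      $$ \min_{w \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} L(w; x_i, y_i) + \lambda \, R(w),
         \qquad L_{\mathrm{MSE}}(w; x, y) = \tfrac{1}{2} \left( w^\top x - y \right)^2 $$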
      
      Author: Martin Jaggi <m.jaggi@gmail.com>
      
      Closes #566 and squashes the following commits:
      
      5f0f31e [Martin Jaggi] line wrap at 100 chars
      4e094fb [Martin Jaggi] better description of GradientDescent
      1d6965d [Martin Jaggi] remove broken url
      ea569c3 [Martin Jaggi] telling what updater actually does
      964732b [Martin Jaggi] lambda R() in documentation
      a6c6228 [Martin Jaggi] better comments in SGD code for regression
      b32224a [Martin Jaggi] new optimization documentation
      d5dfef7 [Martin Jaggi] new classification and regression documentation
      b07ead6 [Martin Jaggi] correct scaling for MSE loss
      ba6158c [Martin Jaggi] use d for the number of features
      bab2ed2 [Martin Jaggi] renaming LeastSquaresGradient
    • Merge pull request #551 from qqsun8819/json-protocol. · afc8f3cb
      qqsun8819 authored
      [SPARK-1038] Add more fields in JsonProtocol and add tests that verify the JSON itself
      
      This is a PR for SPARK-1038. Two major changes:
      1. Add some fields to JsonProtocol that are new and important to standalone-related data structures
      2. Use Diff in liftweb.json to verify the stringified JSON output, to detect when someone changes a type T to Option[T]
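      
      A minimal sketch of the Diff-based check (the JSON fields here are invented for illustration): parse the produced output and an expected hard-coded literal, and assert there is no difference, which catches a field silently becoming Option[T].
      
      import net.liftweb.json._
      
      // Any field that changes type or disappears shows up in `changed`/`deleted`.
      val expected = parse("""{"id":"app-20140209","cores":4}""")
      val produced = parse("""{"id":"app-20140209","cores":4}""")
      val Diff(changed, added, deleted) = expected diff produced
      assert(changed == JNothing && added == JNothing && deleted == JNothing)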
      
      Author: qqsun8819 <jin.oyj@alibaba-inc.com>
      
      Closes #551 and squashes the following commits:
      
      fdf0b4e [qqsun8819] [SPARK-1038] 1. Change code style to be more readable, according to rxin's review 2. change the hard-coded submitdate string to a date object's toString for more flexibility
      095a26f [qqsun8819] [SPARK-1038] mod according to review of pwendell, use hard-coded JSON string for JSON data validation. Each test uses its own JSON string
      0524e41 [qqsun8819] Merge remote-tracking branch 'upstream/master' into json-protocol
      d203d5c [qqsun8819] [SPARK-1038] Add more fields in JsonProtocol and add tests that verify the JSON itself
    • Merge pull request #569 from pwendell/merge-fixes. · 94ccf869
      Patrick Wendell authored
      Fixes bug where merges won't close associated pull request.
      
      Previously we added "Closes #XX" in the title. GitHub will sometimes
      line-break the title in a way that causes this to not work. This patch
      instead adds the line in the body.
      
      This also makes the commit format more concise for merge commits.
      We might consider just dropping those in the future.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #569 and squashes the following commits:
      
      732eba1 [Patrick Wendell] Fixes bug where merges won't close associated pull request.
    • Merge pull request #557 from ScrapCodes/style. Closes #557. · b69f8b2a
      Patrick Wendell authored
      SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      Author: Prashant Sharma <scrapcodes@gmail.com>
      
      == Merge branch commits ==
      
      commit 1a8bd1c059b842cb95cc246aaea74a79fec684f4
      Author: Prashant Sharma <scrapcodes@gmail.com>
      Date:   Sun Feb 9 17:39:07 2014 +0530
      
          scala style fixes
      
      commit f91709887a8e0b608c5c2b282db19b8a44d53a43
      Author: Patrick Wendell <pwendell@gmail.com>
      Date:   Fri Jan 24 11:22:53 2014 -0800
      
          Adding scalastyle snapshot
    • Merge pull request #556 from CodingCat/JettyUtil. Closes #556. · b6dba10a
      CodingCat authored
      [SPARK-1060] startJettyServer should explicitly use IP information
      
      https://spark-project.atlassian.net/browse/SPARK-1060
      
      In the current implementation, the webserver in Master/Worker is started with
      
      val (srv, bPort) = JettyUtils.startJettyServer("0.0.0.0", port, handlers)
      
      inside startJettyServer:
      
      val server = new Server(currentPort) // here, the Server will take "0.0.0.0" as the hostname, i.e., it will always bind to the IP address of the first NIC
      
      This can cause the wrong IP binding: e.g., if the host has two NICs, N1 and N2, and the user specifies SPARK_LOCAL_IP as N2's IP address, then for the reason stated above the web server will nevertheless always bind to N1's address.
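      
      A sketch of the fix's idea (setup illustrative): build the Jetty Server from an InetSocketAddress carrying the configured host, rather than from a bare port, which implies 0.0.0.0.
      
      import java.net.InetSocketAddress
      import org.eclipse.jetty.server.Server
      
      // Bind explicitly to the configured address rather than to every interface.
      val host = sys.env.getOrElse("SPARK_LOCAL_IP", "0.0.0.0")
      val port = 8080  // illustrative
      val server = new Server(new InetSocketAddress(host, port))
      server.start()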
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      == Merge branch commits ==
      
      commit 6c6d9a8ccc9ec4590678a3b34cb03df19092029d
      Author: CodingCat <zhunansjtu@gmail.com>
      Date:   Thu Feb 6 14:53:34 2014 -0500
      
          startJettyServer should explicitly use IP information
    • Merge pull request #562 from jyotiska/master. Closes #562. · 2ef37c93
      jyotiska authored
      Added example Python code for sort
      
      I added example Python code for sorting. Right now, PySpark has limited examples for new people wanting to use the project. This example code sorts integers stored in a file. I was able to sort 5 million, 10 million, and 25 million integers with this code.
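      
      A rough Scala analogue of the example's approach (file path invented; the PySpark version reads the same way): parse integers from a text file and sort them with a key-based sort.
      
      // assuming an existing SparkContext sc
      val sorted = sc.textFile("hdfs:///tmp/integers.txt")
        .map(line => (line.trim.toInt, 1))
        .sortByKey(ascending = true)
        .keys
      sorted.collect().foreach(println)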
      
      Author: jyotiska <jyotiska123@gmail.com>
      
      == Merge branch commits ==
      
      commit 8ad8faf6c8e02ae1cd68565d98524edf165f54df
      Author: jyotiska <jyotiska123@gmail.com>
      Date:   Sun Feb 9 11:00:41 2014 +0530
      
          Added comments in code on collect() method
      
      commit 6f98f1e313f4472a7c2207d36c4f0fbcebc95a8c
      Author: jyotiska <jyotiska123@gmail.com>
      Date:   Sat Feb 8 13:12:37 2014 +0530
      
          Updated python example code sort.py
      
      commit 945e39a5d68daa7e5bab0d96cbd35d7c4b04eafb
      Author: jyotiska <jyotiska123@gmail.com>
      Date:   Sat Feb 8 12:59:09 2014 +0530
      
          Added example python code for sort
    • Merge pull request #560 from pwendell/logging. Closes #560. · b6d40b78
      Patrick Wendell authored
      [WIP] SPARK-1067: Default log4j initialization causes errors for those not using log4j
      
      To fix this, we add a check when initializing log4j.
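      
      A sketch of the check (the defaults-file path is an assumption): install Spark's default log4j configuration only when the root logger has no appenders, i.e. when the user has not configured log4j themselves.
      
      import org.apache.log4j.{LogManager, PropertyConfigurator}
      
      // If the root logger already has appenders, the user initialized log4j; leave it alone.
      val log4jInitialized = LogManager.getRootLogger.getAllAppenders.hasMoreElements
      if (!log4jInitialized) {
        val defaults = getClass.getClassLoader.getResource("org/apache/spark/log4j-defaults.properties")
        PropertyConfigurator.configure(defaults)
      }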
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      == Merge branch commits ==
      
      commit ffdce513877f64b6eed6d36138c3e0003d392889
      Author: Patrick Wendell <pwendell@gmail.com>
      Date:   Fri Feb 7 15:22:29 2014 -0800
      
          Logging fix
    • Merge pull request #565 from pwendell/dev-scripts. Closes #565. · f892da87
      Patrick Wendell authored
      SPARK-1066: Add developer scripts to repository.
      
      These are some developer scripts I've been maintaining in a separate public repo. This patch adds them to the Spark repository so they can evolve here and are clearly accessible to all committers.
      
      I may do some small additional clean-up in this PR, but wanted to put them here in case others want to review. There are a few types of scripts here:
      
      1. A tool to merge pull requests.
      2. A script for packaging releases.
      3. A script for auditing release candidates.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      == Merge branch commits ==
      
      commit 5d5d331d01f6fd59c2eb830f652955119b012173
      Author: Patrick Wendell <pwendell@gmail.com>
      Date:   Sat Feb 8 22:11:47 2014 -0800
      
          SPARK-1066: Add developer scripts to repository.
  6. Feb 08, 2014
    • Merge pull request #542 from markhamstra/versionBump. Closes #542. · c2341c92
      Mark Hamstra authored
      Version number to 1.0.0-SNAPSHOT
      
      Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore.
      
      @pwendell
      
      Author: Mark Hamstra <markhamstra@gmail.com>
      
      == Merge branch commits ==
      
      commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71
      Author: Mark Hamstra <markhamstra@gmail.com>
      Date:   Wed Feb 5 09:30:32 2014 -0800
      
          Version number to 1.0.0-SNAPSHOT
    • Merge pull request #561 from Qiuzhuang/master. Closes #561. · f0ce736f
      Qiuzhuang Lian authored
      Kill drivers in postStop() for Worker.
      
      JIRA SPARK-1068: https://spark-project.atlassian.net/browse/SPARK-1068
      
      Author: Qiuzhuang Lian <Qiuzhuang.Lian@gmail.com>
      
      == Merge branch commits ==
      
      commit 9c19ce63637eee9369edd235979288d3d9fc9105
      Author: Qiuzhuang Lian <Qiuzhuang.Lian@gmail.com>
      Date:   Sat Feb 8 16:07:39 2014 +0800
      
          Kill drivers in postStop() for Worker.
           JIRA SPARK-1068: https://spark-project.atlassian.net/browse/SPARK-1068
    • Merge pull request #454 from jey/atomic-sbt-download. Closes #454. · 78050805
      Jey Kottalam authored
      Make sbt download an atomic operation
      
      Modifies the `sbt/sbt` script to gracefully recover when a previous invocation died in the middle of downloading the SBT jar.
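      
      The pattern in miniature (URL and paths are placeholders): download to a temporary file, then atomically rename into place, so an interrupted fetch never leaves a truncated jar where the script expects a complete one.
      
      import java.net.URL
      import java.nio.file.{Files, Paths, StandardCopyOption}
      
      val target = Paths.get("sbt/sbt-launch.jar")
      val tmp    = Paths.get("sbt/sbt-launch.jar.part")
      
      // Fetch into the temporary file first...
      val in = new URL("https://example.com/sbt-launch.jar").openStream()
      try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING) finally in.close()
      
      // ...then move it into place in one atomic step.
      Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE)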
      
      Author: Jey Kottalam <jey@cs.berkeley.edu>
      
      == Merge branch commits ==
      
      commit 6c600eb434a2f3e7d70b67831aeebde9b5c0f43b
      Author: Jey Kottalam <jey@cs.berkeley.edu>
      Date:   Fri Jan 17 10:43:54 2014 -0800
      
          Make sbt download an atomic operation
    • Merge pull request #552 from martinjaggi/master. Closes #552. · fabf1749
      Martin Jaggi authored
      tex formulas in the documentation
      
      Using MathJax, and splitting the MLlib documentation by techniques.
      
      see jira
      https://spark-project.atlassian.net/browse/MLLIB-19
      and
      https://github.com/shivaram/spark/compare/mathjax
      
      Author: Martin Jaggi <m.jaggi@gmail.com>
      
      == Merge branch commits ==
      
      commit 0364bfabbfc347f917216057a20c39b631842481
      Author: Martin Jaggi <m.jaggi@gmail.com>
      Date:   Fri Feb 7 03:19:38 2014 +0100
      
          minor polishing, as suggested by @pwendell
      
      commit dcd2142c164b2f602bf472bb152ad55bae82d31a
      Author: Martin Jaggi <m.jaggi@gmail.com>
      Date:   Thu Feb 6 18:04:26 2014 +0100
      
          enabling inline latex formulas with $.$
      
          same mathjax configuration as used in math.stackexchange.com
      
          sample usage in the linear algebra (SVD) documentation
      
      commit bbafafd2b497a5acaa03a140bb9de1fbb7d67ffa
      Author: Martin Jaggi <m.jaggi@gmail.com>
      Date:   Thu Feb 6 17:31:29 2014 +0100
      
          split MLlib documentation by techniques
      
          and linked from the main mllib-guide.md site
      
      commit d1c5212b93c67436543c2d8ddbbf610fdf0a26eb
      Author: Martin Jaggi <m.jaggi@gmail.com>
      Date:   Thu Feb 6 16:59:43 2014 +0100
      
          enable mathjax formula in the .md documentation files
      
          code by @shivaram
      
      commit d73948db0d9bc36296054e79fec5b1a657b4eab4
      Author: Martin Jaggi <m.jaggi@gmail.com>
      Date:   Thu Feb 6 16:57:23 2014 +0100
      
          minor update on how to compile the documentation
  7. Feb 07, 2014
    • Merge pull request #506 from ash211/intersection. Closes #506. · 3a9d82cc
      Andrew Ash authored
      SPARK-1062 Add rdd.intersection(otherRdd) method
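      
      A sketch of how such an intersection composes from existing primitives (the commits mention the (v, null) pairing and the shuffle; assuming an existing SparkContext sc):
      
      // Key both RDDs by their values, cogroup them (a shuffle), and keep keys
      // seen on both sides; the cogrouped keys are already distinct.
      val a = sc.parallelize(Seq(1, 2, 3, 4))
      val b = sc.parallelize(Seq(3, 4, 5))
      val common = a.map(v => (v, null)).cogroup(b.map(v => (v, null)))
        .filter { case (_, (lefts, rights)) => lefts.nonEmpty && rights.nonEmpty }
        .keys
      common.collect()  // Array(3, 4), in some order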
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      == Merge branch commits ==
      
      commit 5d9982b171b9572649e9828f37ef0b43f0242912
      Author: Andrew Ash <andrew@andrewash.com>
      Date:   Thu Feb 6 18:11:45 2014 -0800
      
          Minor fixes
      
          - style: (v,null) => (v, null)
          - mention the shuffle in Javadoc
      
      commit b86d02f14e810902719cef893cf6bfa18ff9acb0
      Author: Andrew Ash <andrew@andrewash.com>
      Date:   Sun Feb 2 13:17:40 2014 -0800
      
          Overload .intersection() for numPartitions and custom Partitioner
      
      commit bcaa34911fcc6bb5bc5e4f9fe46d1df73cb71c09
      Author: Andrew Ash <andrew@andrewash.com>
      Date:   Sun Feb 2 13:05:40 2014 -0800
      
          Better naming of parameters in intersection's filter
      
      commit b10a6af2d793ec6e9a06c798007fac3f6b860d89
      Author: Andrew Ash <andrew@andrewash.com>
      Date:   Sat Jan 25 23:06:26 2014 -0800
      
          Follow spark code format conventions of tab => 2 spaces
      
      commit 965256e4304cca514bb36a1a36087711dec535ec
      Author: Andrew Ash <andrew@andrewash.com>
      Date:   Fri Jan 24 00:28:01 2014 -0800
      
          Add rdd.intersection(otherRdd) method
    • Merge pull request #533 from andrewor14/master. Closes #533. · 1896c6e7
      Andrew Or authored
      External spilling - generalize batching logic
      
      The existing implementation is a Kryo-specific hack and only works for LZF compression. Introducing an intermediate batch-level stream takes care of pre-fetching and other arbitrary behavior of higher-level streams in a more general way.
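      
      A simplified sketch of the batch-level stream idea (the on-disk framing, a length prefix per batch, is assumed): give each batch its own bounded stream, so compression and deserialization streams layered on top can pre-fetch only within that batch.
      
      import java.io.{ByteArrayInputStream, DataInputStream, InputStream}
      
      // Read exactly one batch's bytes and expose them as an isolated stream;
      // higher-level streams wrapped around it cannot read past the batch boundary.
      def nextBatchStream(file: DataInputStream): InputStream = {
        val batchSize = file.readInt()        // assumed framing: 4-byte length prefix
        val bytes = new Array[Byte](batchSize)
        file.readFully(bytes)
        new ByteArrayInputStream(bytes)
      }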
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      == Merge branch commits ==
      
      commit 3ddeb7ef89a0af2b685fb5d071aa0f71c975cc82
      Author: Andrew Or <andrewor14@gmail.com>
      Date:   Wed Feb 5 12:09:32 2014 -0800
      
          Also privatize fields
      
      commit 090544a87a0767effd0c835a53952f72fc8d24f0
      Author: Andrew Or <andrewor14@gmail.com>
      Date:   Wed Feb 5 10:58:23 2014 -0800
      
          Privatize methods
      
      commit 13920c918efe22e66a1760b14beceb17a61fd8cc
      Author: Andrew Or <andrewor14@gmail.com>
      Date:   Tue Feb 4 16:34:15 2014 -0800
      
          Update docs
      
      commit bd5a1d7350467ed3dc19c2de9b2c9f531f0e6aa3
      Author: Andrew Or <andrewor14@gmail.com>
      Date:   Tue Feb 4 13:44:24 2014 -0800
      
          Typo: phyiscal -> physical
      
      commit 287ef44e593ad72f7434b759be3170d9ee2723d2
      Author: Andrew Or <andrewor14@gmail.com>
      Date:   Tue Feb 4 13:38:32 2014 -0800
      
          Avoid reading the entire batch into memory; also simplify streaming logic
      
          Additionally, address formatting comments.
      
      commit 3df700509955f7074821e9aab1e74cb53c58b5a5
      Merge: a531d2e 164489d
      Author: Andrew Or <andrewor14@gmail.com>
      Date:   Mon Feb 3 18:27:49 2014 -0800
      
          Merge branch 'master' of github.com:andrewor14/incubator-spark
      
      commit a531d2e347acdcecf2d0ab72cd4f965ab5e145d8
      Author: Andrew Or <andrewor14@gmail.com>
      Date:   Mon Feb 3 18:18:04 2014 -0800
      
          Relax assumptions on compressors and serializers when batching
      
          This commit introduces an intermediate layer of an input stream on the batch level.
          This guards against interference from higher level streams (i.e. compression and
          deserialization streams), especially pre-fetching, without specifically targeting
          particular libraries (Kryo) and forcing shuffle spill compression to use LZF.
      
      commit 164489d6f176bdecfa9dabec2dfce5504d1ee8af
      Author: Andrew Or <andrewor14@gmail.com>
      Date:   Mon Feb 3 18:18:04 2014 -0800
      
          Relax assumptions on compressors and serializers when batching
      
          This commit introduces an intermediate layer of an input stream on the batch level.
          This guards against interference from higher level streams (i.e. compression and
          deserialization streams), especially pre-fetching, without specifically targeting
          particular libraries (Kryo) and forcing shuffle spill compression to use LZF.
  8. Feb 06, 2014
    • Merge pull request #450 from kayousterhout/fetch_failures. Closes #450. · 0b448df6
      Kay Ousterhout authored
      Only run ResubmitFailedStages event after a fetch fails
      
      Previously, the ResubmitFailedStages event was called every
      200 milliseconds, leading to a lot of unnecessary event processing
      and clogged DAGScheduler logs.
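      
      A sketch of the event-driven shape of the fix (names and scheduling are illustrative, not the DAGScheduler's actual API): instead of a recurring 200 ms poll, arm a single delayed resubmission when a fetch failure is reported.
      
      import java.util.concurrent.{Executors, TimeUnit}
      
      val timer = Executors.newSingleThreadScheduledExecutor()
      def resubmitFailedStages(): Unit = println("resubmitting failed stages")  // stub
      
      // Called on FetchFailed: schedule one resubmission rather than polling.
      def onFetchFailed(): Unit =
        timer.schedule(new Runnable { def run(): Unit = resubmitFailedStages() },
          200, TimeUnit.MILLISECONDS)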
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      == Merge branch commits ==
      
      commit e603784b3a562980e6f1863845097effe2129d3b
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      Date:   Wed Feb 5 11:34:41 2014 -0800
      
          Re-add check for empty set of failed stages
      
      commit d258f0ef50caff4bbb19fb95a6b82186db1935bf
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      Date:   Wed Jan 15 23:35:41 2014 -0800
      
          Only run ResubmitFailedStages event after a fetch fails
      
          Previously, the ResubmitFailedStages event was called every
          200 milliseconds, leading to a lot of unnecessary event processing
          and clogged DAGScheduler logs.
    • Merge pull request #321 from kayousterhout/ui_kill_fix. Closes #321. · 18ad59e2
      Kay Ousterhout authored
      Inform DAG scheduler about all started/finished tasks.
      
      Previously, the DAG scheduler was not always informed
      when tasks started and finished. The simplest example here
      is for speculated tasks: the DAGScheduler was only told about
      the first attempt of a task, meaning that SparkListeners were
      also not told about multiple task attempts, so users can't see
      what's going on with speculation in the UI.  The DAGScheduler
      also wasn't always told about finished tasks, so in the UI, some
      tasks will never be shown as finished (this occurs, for example,
      if a task set gets killed).
      
      The other problem is that the fairness accounting was wrong
      -- the number of running tasks in a pool was decreased when a
      task set was considered done, even if all of its tasks hadn't
      yet finished.
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      == Merge branch commits ==
      
      commit c8d547d0f7a17f5a193bef05f5872b9f475675c5
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      Date:   Wed Jan 15 16:47:33 2014 -0800
      
          Addressed Reynold's review comments.
      
          Always use a TaskEndReason (remove the option), and explicitly
          signal when we don't know the reason. Also, always tell
          DAGScheduler (and associated listeners) about started tasks, even
          when they're speculated.
      
      commit 3fee1e2e3c06b975ff7f95d595448f38cce97a04
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      Date:   Wed Jan 8 22:58:13 2014 -0800
      
          Fixed broken test and improved logging
      
      commit ff12fcaa2567c5d02b75a1d5db35687225bcd46f
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      Date:   Sun Dec 29 21:08:20 2013 -0800
      
          Inform DAG scheduler about all finished tasks.
      
          Previously, the DAG scheduler was not always informed
          when tasks finished. For example, when a task set was
          aborted, the DAG scheduler was never told when the tasks
          in that task set finished. The DAG scheduler was also
          never told about the completion of speculated tasks.
          This led to confusion with SparkListeners because information
          about the completion of those tasks was never passed on to
          the listeners (so in the UI, for example, some tasks will never
          be shown as finished).
      
          The other problem is that the fairness accounting was wrong
          -- the number of running tasks in a pool was decreased when a
          task set was considered done, even if all of its tasks hadn't
          yet finished.
    • Merge pull request #554 from sryza/sandy-spark-1056. Closes #554. · 446403b6
      Sandy Ryza authored
      SPARK-1056. Fix header comment in Executor to not imply that it's only used for Mesos and Standalone.
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      == Merge branch commits ==
      
      commit 1f2443d902a26365a5c23e4af9077e1539ed2eab
      Author: Sandy Ryza <sandy@cloudera.com>
      Date:   Thu Feb 6 15:03:50 2014 -0800
      
          SPARK-1056. Fix header comment in Executor to not imply that it's only used for Mesos and Standalone
    • Merge pull request #498 from ScrapCodes/python-api. Closes #498. · 084839ba
      Prashant Sharma authored
      Python API additions
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      == Merge branch commits ==
      
      commit 8b51591f1a7a79a62c13ee66ff8d83040f7eccd8
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Date:   Fri Jan 24 11:50:29 2014 +0530
      
          Josh's and Patrick's review comments.
      
      commit d37f9677838e43bef6c18ef61fbf08055ba6d1ca
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Date:   Thu Jan 23 17:27:17 2014 +0530
      
          fixed doc tests
      
      commit 27cb54bf5c99b1ea38a73858c291d0a1c43d8b7c
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Date:   Thu Jan 23 16:48:43 2014 +0530
      
          Added keys and values methods for PairFunctions in python
      
      commit 4ce76b396fbaefef2386d7a36d611572bdef9b5d
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Date:   Thu Jan 23 13:51:26 2014 +0530
      
          Added foreachPartition
      
      commit 05f05341a187cba829ac0e6c2bdf30be49948c89
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Date:   Thu Jan 23 13:02:59 2014 +0530
      
          Added coalesce function to python API
      
      commit 6568d2c2fa14845dc56322c0f39ba2e13b3b26dd
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Date:   Thu Jan 23 12:52:44 2014 +0530
      
          added repartition function to python API.
    • Merge pull request #545 from kayousterhout/fix_progress. Closes #545. · 79c95527
      Kay Ousterhout authored
      Fix off-by-one error with task progress info log.
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      == Merge branch commits ==
      
      commit 29798fc685c4e7e3eb3bf91c75df7fa8ec94a235
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      Date:   Wed Feb 5 13:40:01 2014 -0800
      
          Fix off-by-one error with task progress info log.
    • Merge pull request #526 from tgravescs/yarn_client_stop_am_fix. Closes #526. · 38020961
      Thomas Graves authored
      spark on yarn - yarn-client mode doesn't always exit immediately
      
      https://spark-project.atlassian.net/browse/SPARK-1049
      
      If you run in yarn-client mode but don't get all the workers you requested right away, and you then exit your application, the application master stays around until it gets the number of workers you initially requested. This is a waste of resources. The AM should exit immediately upon the client going away.
      
      This fix simply checks whether the driver has closed while it's waiting for the initial number of workers.
      
      Author: Thomas Graves <tgraves@apache.org>
      
      == Merge branch commits ==
      
      commit 03f40a62584b6bdd094ba91670cd4aa6afe7cd81
      Author: Thomas Graves <tgraves@apache.org>
      Date:   Fri Jan 31 11:23:10 2014 -0600
      
          spark on yarn - yarn-client mode doesn't always exit immediately