  1. Mar 05, 2014
    • CodingCat's avatar
      SPARK-1156: allow user to login into a cluster without slaves · 3eb009f3
      CodingCat authored
      Reported in https://spark-project.atlassian.net/browse/SPARK-1156
      
      The current spark-ec2 script doesn't allow a user to log in to a cluster without slaves. One of the problems this behaviour causes is that when all the workers have died, the user cannot even log in to the cluster for debugging, etc.
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #58 from CodingCat/SPARK-1156 and squashes the following commits:
      
      104af07 [CodingCat] output ERROR to stderr
      9a71769 [CodingCat] do not allow user to start 0-slave cluster
      24a7c79 [CodingCat] allow user to login into a cluster without slaves
      3eb009f3
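The guard this commit describes can be sketched as follows; `validate_slave_count`, the action names, and the error message are hypothetical illustrations, not the actual spark-ec2 code.

```python
import sys

def validate_slave_count(num_slaves, action):
    """Reject launching a brand-new cluster with zero slaves, but still
    allow logging in to an existing cluster whose workers have all died."""
    if action == "launch" and num_slaves <= 0:
        # Per the squashed commits, errors go to stderr.
        print("ERROR: a cluster must be launched with at least 1 slave",
              file=sys.stderr)
        return False
    return True  # "login" succeeds even when no slaves are alive
```

The point of the fix is precisely the asymmetry: launch still requires slaves, login does not.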
    • Mark Grover's avatar
      SPARK-1184: Update the distribution tar.gz to include spark-assembly jar · cda381f8
      Mark Grover authored
      See JIRA for details.
      
      Author: Mark Grover <mark@apache.org>
      
      Closes #78 from markgrover/SPARK-1184 and squashes the following commits:
      
      12b78e6 [Mark Grover] SPARK-1184: Update the distribution tar.gz to include spark-assembly jar
      cda381f8
    • liguoqiang's avatar
      Improve building with maven docs · 51ca7bd7
      liguoqiang authored
           mvn -Dhadoop.version=... -Dsuites=spark.repl.ReplSuite test
      
      to
      
           mvn -Dhadoop.version=... -Dsuites=org.apache.spark.repl.ReplSuite test
      
      Author: liguoqiang <liguoqiang@rd.tuan800.com>
      
      Closes #70 from witgo/building_with_maven and squashes the following commits:
      
      6ec8a54 [liguoqiang] spark.repl.ReplSuite to org.apache.spark.repl.ReplSuite
      51ca7bd7
    • CodingCat's avatar
      SPARK-1171: when executor is removed, we should minus totalCores instead of... · a3da5088
      CodingCat authored
      SPARK-1171: when executor is removed, we should minus totalCores instead of just freeCores on that executor
      
      https://spark-project.atlassian.net/browse/SPARK-1171
      
      When an executor is removed, the current implementation subtracts only the freeCores of that executor. Actually we should subtract the totalCores...
      
      Author: CodingCat <zhunansjtu@gmail.com>
      Author: Nan Zhu <CodingCat@users.noreply.github.com>
      
      Closes #63 from CodingCat/simplify_CoarseGrainedSchedulerBackend and squashes the following commits:
      
      f6bf93f [Nan Zhu] code clean
      19c2bb4 [CodingCat] use copy idiom to reconstruct the workerOffers
      43c13e9 [CodingCat] keep WorkerOffer immutable
      af470d3 [CodingCat] style fix
      0c0e409 [CodingCat] simplify the implementation of CoarseGrainedSchedulerBackend
      a3da5088
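The bookkeeping bug can be illustrated with a minimal sketch; the class and field names below are stand-ins for the real CoarseGrainedSchedulerBackend, not its actual code.

```python
class SchedulerBackend:
    """Toy model of cluster-wide core accounting across executors."""

    def __init__(self):
        self.total_cores = 0      # all cores registered by live executors
        self.free_cores = {}      # executor id -> currently idle cores
        self.executor_cores = {}  # executor id -> cores it registered with

    def register_executor(self, exec_id, cores):
        self.total_cores += cores
        self.executor_cores[exec_id] = cores
        self.free_cores[exec_id] = cores

    def remove_executor(self, exec_id):
        # The bug: only free_cores was subtracted, so cores that were busy
        # running tasks on the dead executor stayed counted forever.
        # The fix subtracts the executor's *total* registered cores.
        self.total_cores -= self.executor_cores.pop(exec_id)
        self.free_cores.pop(exec_id)
```

With the buggy version, an executor removed while 5 of its 8 cores were busy would leave 5 phantom cores in the cluster-wide total.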
  2. Mar 04, 2014
  3. Mar 03, 2014
    • Kay Ousterhout's avatar
      Remove broken/unused Connection.getChunkFIFO method. · b14ede78
      Kay Ousterhout authored
      This method appears to be broken -- since it never removes
      anything from messages, and it adds new messages to it,
      the while loop is an infinite loop.  The method also does not appear
      to have ever been used since the code was added in 2012, so
      this commit removes it.
      
      cc @mateiz who originally added this method in case there's a reason it should be here! (https://github.com/apache/spark/commit/63051dd2bcc4bf09d413ff7cf89a37967edc33ba)
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #69 from kayousterhout/remove_get_fifo and squashes the following commits:
      
      053bc59 [Kay Ousterhout] Remove broken/unused Connection.getChunkFIFO method.
      b14ede78
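Why the removed method could never terminate can be shown with a tiny sketch; this is an illustration of the described bug, not the actual Connection code, and a bailout counter is added solely so the example can run.

```python
def get_chunk_fifo_broken(messages):
    """The loop exits only when `messages` becomes empty, but nothing is
    ever removed from it (and items are even added), so any non-empty
    input loops forever."""
    iterations = 0
    while messages:            # never becomes false: no item is removed
        messages.append(None)  # new messages are only ever added
        iterations += 1
        if iterations >= 1000: # bailout added for this illustration only
            return "would loop forever"
    return "terminated"
```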
    • Reynold Xin's avatar
      SPARK-1158: Fix flaky RateLimitedOutputStreamSuite. · f5ae38af
      Reynold Xin authored
      There was actually a problem with the RateLimitedOutputStream implementation: nothing is written during the first second because of integer rounding.
      
      So RateLimitedOutputStream was overly aggressive in throttling.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #55 from rxin/ratelimitest and squashes the following commits:
      
      52ce1b7 [Reynold Xin] SPARK-1158: Fix flaky RateLimitedOutputStreamSuite.
      f5ae38af
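The integer-rounding problem can be shown in a few lines; `allowed_bytes` is a hypothetical helper illustrating the arithmetic, not the actual RateLimitedOutputStream code.

```python
def allowed_bytes(elapsed_ms, desired_bytes_per_sec):
    """Bytes a rate limiter permits after `elapsed_ms` milliseconds.

    With the division done first, elapsed_ms // 1000 is 0 for the entire
    first second, so nothing may be written. Multiplying before dividing
    (or using floating point) avoids the stall."""
    broken = desired_bytes_per_sec * (elapsed_ms // 1000)
    fixed = desired_bytes_per_sec * elapsed_ms // 1000
    return broken, fixed
```

At 500 ms into the first second with a 10 KB/s limit, the broken form allows 0 bytes while the fixed form allows 5000.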
    • Bryn Keller's avatar
      Added a unit test for PairRDDFunctions.lookup · 923dba50
      Bryn Keller authored
      Lookup didn't have a unit test. Added two tests, one with a partitioner and one without.
      
      Author: Bryn Keller <bryn.keller@intel.com>
      
      Closes #36 from xoltar/lookup and squashes the following commits:
      
      3bc0d44 [Bryn Keller] Added a unit test for PairRDDFunctions.lookup
      923dba50
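The shape of the two tests can be sketched against a toy stand-in for `PairRDDFunctions.lookup`; the function below is illustrative only, not Spark's implementation.

```python
def lookup(pairs, key, partition_of=None):
    """Return all values for `key` from a list of (key, value) pairs.

    With a partitioner function, only pairs in the partition the key maps
    to are scanned; without one, every pair is scanned."""
    if partition_of is not None:
        target = partition_of(key)
        pairs = [kv for kv in pairs if partition_of(kv[0]) == target]
    return [v for k, v in pairs if k == key]
```

Both code paths must return the same values, which is exactly what the "with a partitioner" and "without" test cases check.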
    • Kay Ousterhout's avatar
      Remove the remoteFetchTime metric. · b55cade8
      Kay Ousterhout authored
      This metric is confusing: it adds up all of the time to fetch
      shuffle inputs, but fetches often happen in parallel, so
      remoteFetchTime can be much longer than the task execution time.
      
      @squito it looks like you added this metric -- do you have a use case for it?
      
      cc @shivaram -- I know you've looked at the shuffle performance a lot so chime in here if this metric has turned out to be useful for you!
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #62 from kayousterhout/remove_fetch_variable and squashes the following commits:
      
      43341eb [Kay Ousterhout] Remove the remoteFetchTime metric.
      b55cade8
    • Chen Chao's avatar
      update proportion of memory · 9d225a91
      Chen Chao authored
      The default value of "spark.storage.memoryFraction" has been changed from 0.66 to 0.6, so 60% of the memory is used for caching while 40% is used for task execution.
      
      Author: Chen Chao <crazyjvm@gmail.com>
      
      Closes #66 from CrazyJvm/master and squashes the following commits:
      
      0f84d86 [Chen Chao] update proportion of memory
      9d225a91
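The split described above, in numbers; a trivial sketch with an assumed helper name.

```python
def memory_split(total_mb, storage_fraction=0.6):
    """With spark.storage.memoryFraction = 0.6, 60% of executor memory is
    reserved for caching and the remaining 40% is left for task execution."""
    cache = total_mb * storage_fraction
    execution = total_mb * (1 - storage_fraction)
    return cache, execution
```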
    • Kay Ousterhout's avatar
      Removed accidentally checked in comment · 369aad6f
      Kay Ousterhout authored
      It looks like this comment was added a while ago by @mridulm as part of a merge and was accidentally checked in.  We should remove it.
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #61 from kayousterhout/remove_comment and squashes the following commits:
      
      0b2b3f2 [Kay Ousterhout] Removed accidentally checked in comment
      369aad6f
    • Aaron Kimball's avatar
      SPARK-1173. (#2) Fix typo in Java streaming example. · f65c1f38
      Aaron Kimball authored
      Companion commit to pull request #64, fix the typo on the Java side of the docs.
      
      Author: Aaron Kimball <aaron@magnify.io>
      
      Closes #65 from kimballa/spark-1173-java-doc-update and squashes the following commits:
      
      8ce11d3 [Aaron Kimball] SPARK-1173. (#2) Fix typo in Java streaming example.
      f65c1f38
    • Aaron Kimball's avatar
      SPARK-1173. Improve scala streaming docs. · 2b53447f
      Aaron Kimball authored
      Clarify imports to add implicit conversions to DStream and
      fix other small typos in the streaming intro documentation.
      
      Tested by inspecting the output via a local jekyll server and copy-pasting the scala commands into a spark terminal.
      
      Author: Aaron Kimball <aaron@magnify.io>
      
      Closes #64 from kimballa/spark-1173-streaming-docs and squashes the following commits:
      
      6fbff0e [Aaron Kimball] SPARK-1173. Improve scala streaming docs.
      2b53447f
  4. Mar 02, 2014
    • Patrick Wendell's avatar
      Add Jekyll tag to isolate "production-only" doc components. · 55a4f11b
      Patrick Wendell authored
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #56 from pwendell/jekyll-prod and squashes the following commits:
      
      1bdc3a8 [Patrick Wendell] Add Jekyll tag to isolate "production-only" doc components.
      55a4f11b
    • Patrick Wendell's avatar
      SPARK-1121: Include avro for yarn-alpha builds · c3f5e075
      Patrick Wendell authored
      This lets us explicitly include Avro based on a profile for 0.23.X
      builds. It makes me sad how convoluted it is to express this logic
      in Maven. @tgraves and @sryza curious if this works for you.
      
      I'm also considering just reverting to how it was before. The only
      real problem was that Spark advertised a dependency on Avro
      even though it only really depends transitively on Avro through
      other deps.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #49 from pwendell/avro-build-fix and squashes the following commits:
      
      8d6ee92 [Patrick Wendell] SPARK-1121: Add avro to yarn-alpha profile
      c3f5e075
    • Sean Owen's avatar
      SPARK-1084.2 (resubmitted) · fd31adbf
      Sean Owen authored
      (Ported from https://github.com/apache/incubator-spark/pull/650 )
      
      This adds one more change though, to fix the scala version warning introduced by json4s recently.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #32 from srowen/SPARK-1084.2 and squashes the following commits:
      
      9240abd [Sean Owen] Avoid scala version conflict in scalap induced by json4s dependency
      1561cec [Sean Owen] Remove "exclude *" dependencies that are causing Maven warnings, and that are apparently unneeded anyway
      fd31adbf
    • Reynold Xin's avatar
      Ignore RateLimitedOutputStreamSuite for now. · 353ac6b4
      Reynold Xin authored
      This test has been flaky. We can re-enable it after @tdas has a chance to look at it.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #54 from rxin/ratelimit and squashes the following commits:
      
      1a12198 [Reynold Xin] Ignore RateLimitedOutputStreamSuite for now.
      353ac6b4
    • Aaron Davidson's avatar
      SPARK-1137: Make ZK PersistenceEngine not crash for wrong serialVersionUID · 46bcb955
      Aaron Davidson authored
      Previously, ZooKeeperPersistenceEngine would crash the whole Master process if
      there was stored data from a prior Spark version. Now, we just delete these files.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #4 from aarondav/zookeeper2 and squashes the following commits:
      
      fa8b40f [Aaron Davidson] SPARK-1137: Make ZK PersistenceEngine not crash for wrong serialVersionUID
      46bcb955
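The recovery pattern this commit adds can be sketched as follows; the function name is hypothetical and `pickle` stands in for Java serialization purely for illustration.

```python
import pickle

def read_persisted(blob):
    """If stored data (e.g. from an older Spark version with a different
    serialVersionUID) fails to deserialize, drop it instead of crashing.

    Pre-fix: the exception propagated and killed the whole Master process.
    Post-fix: the stale entry is discarded and recovery carries on."""
    try:
        return pickle.loads(blob)
    except Exception:
        return None  # caller deletes the stale file/node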
    • Patrick Wendell's avatar
      Remove remaining references to incubation · 1fd2bfd3
      Patrick Wendell authored
      This removes some loose ends not caught by the other (incubating -> tlp) patches. @markhamstra this updates the version as you mentioned earlier.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #51 from pwendell/tlp and squashes the following commits:
      
      d553b1b [Patrick Wendell] Remove remaining references to incubation
      1fd2bfd3
    • Binh Nguyen's avatar
      Update io.netty from 4.0.13.Final to 4.0.17.Final · b70823c9
      Binh Nguyen authored
      This update contains a lot of bug fixes and some new perf improvements.
      It is also binary compatible with the current 4.0.13.Final
      
      For more information: http://netty.io/news/2014/02/25/4-0-17-Final.html
      
      Author: Binh Nguyen <ngbinh@gmail.com>
      
      Closes #41 from ngbinh/master and squashes the following commits:
      
      a9498f4 [Binh Nguyen] update io.netty to 4.0.17.Final
      b70823c9
    • Michael Armbrust's avatar
      Merge the old sbt-launch-lib.bash with the new sbt-launcher jar downloading logic. · 012bd5fb
      Michael Armbrust authored
      This allows developers to pass options (such as -D) to sbt.  I also modified the SparkBuild to ensure spark specific properties are propagated to forked test JVMs.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #14 from marmbrus/sbtScripts and squashes the following commits:
      
      c008b18 [Michael Armbrust] Merge the old sbt-launch-lib.bash with the new sbt-launcher jar downloading logic.
      012bd5fb
    • DB Tsai's avatar
      Initialized the regVal for first iteration in SGD optimizer · 6fc76e49
      DB Tsai authored
      Ported from https://github.com/apache/incubator-spark/pull/633
      
      In runMiniBatchSGD, the regVal (for the 1st iteration) should be initialized as the sum of squares of the weights for the L2 updater; for the L1 updater, the same logic is followed.
      
      It may not matter much for SGD here, since the updater doesn't take the loss as a parameter to find the new weights. But it gives us the correct history of the loss. For the LBFGS optimizer we implemented, however, the correct loss including regVal is crucial to finding the new weights.
      
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #40 from dbtsai/dbtsai-smallRegValFix and squashes the following commits:
      
      77d47da [DB Tsai] In runMiniBatchSGD, the regVal (for 1st iter) should be initialized as sum of sqrt of weights if it's L2 update; for L1 update, the same logic is followed.
      6fc76e49
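The first-iteration regularization value can be sketched as below. This mirrors the idea rather than MLlib's code; the (regParam / 2) * sum of squares convention for L2 is an assumption of this sketch.

```python
def initial_reg_val(weights, reg_param, l2=True):
    """Regularization value for iteration 1, so the recorded loss history
    is correct from the very first step (illustrative only).

    L2: (regParam / 2) * sum(w_i^2);  L1: regParam * sum(|w_i|)."""
    if l2:
        return 0.5 * reg_param * sum(w * w for w in weights)
    return reg_param * sum(abs(w) for w in weights)
```

Without this, the loss reported for the first iteration omits the regularization term, which matters for optimizers such as LBFGS that consume the loss directly.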
  5. Mar 01, 2014
    • CodingCat's avatar
      [SPARK-1100] prevent Spark from overwriting directory silently · 3a8b698e
      CodingCat authored
      Thanks to Diana Carroll for reporting this issue (https://spark-project.atlassian.net/browse/SPARK-1100)
      
      the current saveAsTextFile/SequenceFile will silently overwrite the output directory if it already exists. This behaviour is not desirable because
      
      overwriting the data silently is not user-friendly
      
      if the partition number changes between two write operations, the output directory will contain results generated by both runs
      
      My fix includes:
      
      add new APIs with a flag for users to define whether to overwrite the directory:
      if the flag is set to true, the output directory is deleted first and the new data is then written, so the output directory never contains results from multiple rounds of running;
      
      if the flag is set to false, Spark will throw an exception if the output directory already exists
      
      changed the Java API part
      
      the default behaviour is overwriting
      
      Two questions:
      
      should we deprecate the old APIs without such a flag?
      
      I noticed that Spark Streaming also calls these APIs; I assume we don't need to change the related part in streaming? @tdas
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #11 from CodingCat/SPARK-1100 and squashes the following commits:
      
      6a4e3a3 [CodingCat] code clean
      ef2d43f [CodingCat] add new test cases and code clean
      ac63136 [CodingCat] checkOutputSpecs not applicable to FSOutputFormat
      ec490e8 [CodingCat] prevent Spark from overwriting directory silently and leaving dirty directory
      3a8b698e
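The overwrite semantics described above can be sketched with a toy writer; the function name, flag, and part-file naming are illustrative stand-ins, not Spark's API.

```python
import os
import shutil

def save_as_text_file(data, path, overwrite=True):
    """Write one part file per element of `data` into directory `path`.

    overwrite=True: delete any existing output directory first, so the
    result never mixes partitions from two different runs.
    overwrite=False: refuse to touch an existing directory."""
    if os.path.exists(path):
        if not overwrite:
            raise FileExistsError(f"output directory {path} already exists")
        shutil.rmtree(path)
    os.makedirs(path)
    for i, part in enumerate(data):
        with open(os.path.join(path, f"part-{i:05d}"), "w") as f:
            f.write(str(part))
```

Note how a second run with fewer partitions leaves no stale part files behind, which is exactly the dirty-directory problem the flag prevents.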
    • CodingCat's avatar
      [SPARK-1150] fix repo location in create script (re-open) · fe195ae1
      CodingCat authored
      reopen for https://spark-project.atlassian.net/browse/SPARK-1150
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #52 from CodingCat/script_fixes and squashes the following commits:
      
      fc05a71 [CodingCat] fix repo location in create script
      fe195ae1
    • Patrick Wendell's avatar
      Revert "[SPARK-1150] fix repo location in create script" · ec992e18
      Patrick Wendell authored
      This reverts commit 9aa09571.
      ec992e18
    • Mark Grover's avatar
      [SPARK-1150] fix repo location in create script · 9aa09571
      Mark Grover authored
      https://spark-project.atlassian.net/browse/SPARK-1150
      
      fix the repo location in create_release script
      
      Author: Mark Grover <mark@apache.org>
      
      Closes #48 from CodingCat/script_fixes and squashes the following commits:
      
      01f4bf7 [Mark Grover] Fixing some nitpicks
      d2244d4 [Mark Grover] SPARK-676: Abbreviation in SPARK_MEM but not in SPARK_WORKER_MEMORY
      9aa09571
    • Kay Ousterhout's avatar
      [SPARK-979] Randomize order of offers. · 556c5668
      Kay Ousterhout authored
      This commit randomizes the order of resource offers to avoid scheduling
      all tasks on the same small set of machines.
      
      This is a much simpler solution to SPARK-979 than #7.
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #27 from kayousterhout/randomize and squashes the following commits:
      
      435d817 [Kay Ousterhout] [SPARK-979] Randomize order of offers.
      556c5668
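The randomization itself is a one-liner; this sketch uses assumed names and plain lists in place of Spark's resource-offer objects.

```python
import random

def randomized_offers(offers, seed=None):
    """Shuffle resource offers before scheduling, so tasks don't all land
    on the same few machines that happen to appear first in the list."""
    shuffled = list(offers)  # don't mutate the caller's list
    random.Random(seed).shuffle(shuffled)
    return shuffled
```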
  6. Feb 28, 2014
  7. Feb 27, 2014
    • Kay Ousterhout's avatar
Remove BlockFetchTracker trait · edf8a56a
      Kay Ousterhout authored
      This trait seems to have been created a while ago when there
      were multiple implementations; now that there's just one, I think it
      makes sense to merge it into the BlockFetcherIterator trait.
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #39 from kayousterhout/remove_tracker and squashes the following commits:
      
      8173939 [Kay Ousterhout] Remove BlockFetchTracker.
      edf8a56a
    • Reynold Xin's avatar
      Removed reference to incubation in Spark user docs. · 40e080a6
      Reynold Xin authored
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #2 from rxin/docs and squashes the following commits:
      
      08bbd5f [Reynold Xin] Removed reference to incubation in Spark user docs.
      40e080a6
    • Patrick Wendell's avatar
      [HOTFIX] Patching maven build after #6 (SPARK-1121). · c42557be
      Patrick Wendell authored
      That patch removed the Maven avro declaration but didn't remove the
      actual dependency in core. /cc @scrapcodes
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #37 from pwendell/master and squashes the following commits:
      
      0ef3008 [Patrick Wendell] [HOTFIX] Patching maven build after #6 (SPARK-1121).
      c42557be
    • Sean Owen's avatar
      SPARK 1084.1 (resubmitted) · 12bbca20
      Sean Owen authored
      (Ported from https://github.com/apache/incubator-spark/pull/637 )
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #31 from srowen/SPARK-1084.1 and squashes the following commits:
      
      6c4a32c [Sean Owen] Suppress warnings about legitimate unchecked array creations, or change code to avoid it
      f35b833 [Sean Owen] Fix two misc javadoc problems
      254e8ef [Sean Owen] Fix one new style error introduced in scaladoc warning commit
      5b2fce2 [Sean Owen] Fix scaladoc invocation warning, and enable javac warnings properly, with plugin config updates
      007762b [Sean Owen] Remove dead scaladoc links
      b8ff8cb [Sean Owen] Replace deprecated Ant <tasks> with <target>
      12bbca20
    • Raymond Liu's avatar
      Show Master status on UI page · aace2c09
      Raymond Liu authored
      For standalone HA mode, a status field is useful to identify the current master; it is already available in JSON format too.
      
      Author: Raymond Liu <raymond.liu@intel.com>
      
      Closes #24 from colorant/status and squashes the following commits:
      
      df630b3 [Raymond Liu] Show Master status on UI page
      aace2c09
    • CodingCat's avatar
      [SPARK-1089] fix the regression problem on ADD_JARS in 0.9 · 345df5f4
      CodingCat authored
      https://spark-project.atlassian.net/browse/SPARK-1089
      
      copied from JIRA, reported by @ash211
      
      "Using the ADD_JARS environment variable with spark-shell used to add the jar to both the shell and the various workers. Now it only adds to the workers and importing a custom class in the shell is broken.
      The workaround is to add custom jars to both ADD_JARS and SPARK_CLASSPATH.
      We should fix ADD_JARS so it works properly again.
      See various threads on the user list:
      https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201402.mbox/%3CCAJbo4neMLiTrnm1XbyqomWmp0m+EUcg4yE-txuRGSVKOb5KLeA@mail.gmail.com%3E
      (another one that doesn't appear in the archives yet titled "ADD_JARS not working on 0.9")"
      
      The reason for this bug is two-fold:
      
      in the current implementation of SparkILoop.scala, settings.classpath is not set properly when the process() method is invoked
      
      the weird behaviour of Scala 2.10 (I personally think it is a bug):
      
      if we simply set the value of a PathSettings object (like settings.classpath), the flag recording that the variable has been modified is not updated, so the PathResolver loads the default CLASSPATH environment variable value to calculate the path (see https://github.com/scala/scala/blob/2.10.x/src/compiler/scala/tools/util/PathResolver.scala#L215)
      
      what we have to do is set this flag manually (https://github.com/CodingCat/incubator-spark/blob/e3991d97ddc33e77645e4559b13bf78b9e68239a/repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala#L884)
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #13 from CodingCat/SPARK-1089 and squashes the following commits:
      
      8af81e7 [CodingCat] impose non-null settings
      9aa2125 [CodingCat] code cleaning
      ce36676 [CodingCat] code cleaning
      e045582 [CodingCat] fix the regression problem on ADD_JARS in 0.9
      345df5f4
    • Prashant Sharma's avatar
      SPARK-1121 Only add avro if the build is for Hadoop 0.23.X and SPARK_YARN is set · 6ccd6c55
      Prashant Sharma authored
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #6 from ScrapCodes/SPARK-1121/avro-dep-fix and squashes the following commits:
      
      9b29e34 [Prashant Sharma] Review feedback on PR
      46ed2ad [Prashant Sharma] SPARK-1121-Only add avro if the build is for Hadoop 0.23.X and SPARK_YARN is set
      6ccd6c55