  1. May 12, 2014
• [SPARK-1736] Spark submit for Windows · beb9cbac
      Andrew Or authored
      Tested on Windows 7.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #745 from andrewor14/windows-submit and squashes the following commits:
      
      c0b58fb [Andrew Or] Allow spaces in parameters
      162e54d [Andrew Or] Merge branch 'master' of github.com:apache/spark into windows-submit
      91597ce [Andrew Or] Make spark-shell.cmd use spark-submit.cmd
      af6fd29 [Andrew Or] Add spark submit for Windows
      beb9cbac
• SPARK-1802. (Addendum) Audit dependency graph when Spark is built with -Pyarn · 4b31f4ec
      Sean Owen authored
      Following on a few more items from SPARK-1802 --
      
      The first commit touches up a few similar problems remaining with the YARN profile. I think this is worth cherry-picking.
      
      The second commit is more of the same for hadoop-client, although the fix is a little more complex. It may or may not be worth bothering with.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #746 from srowen/SPARK-1802.2 and squashes the following commits:
      
      52aeb41 [Sean Owen] Add more commons-logging, servlet excludes to avoid conflicts in assembly when building for YARN
      4b31f4ec
• SPARK-1623: Use File objects instead of Strings in HTTPBroadcast · 925d8b24
      Patrick Wendell authored
This seems strictly better, and I think it's justified on clean-up grounds alone. It might also fix issues with path conversions, but I haven't yet isolated any instance of that happening. (A sketch of the idea follows this entry.)
      
      /cc @srowen @tdas
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #749 from pwendell/broadcast-cleanup and squashes the following commits:
      
      d6d54f2 [Patrick Wendell] SPARK-1623: Use File objects instead of string's in HTTPBroadcast
      925d8b24
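A minimal sketch of the idea, using hypothetical helper names rather than Spark's actual HttpBroadcast internals:

```
import java.io.File

// Composing paths with File objects instead of string concatenation avoids
// separator and double-slash bugs across platforms.
def blockFile(broadcastDir: File, blockId: String): File =
  new File(broadcastDir, blockId)  // platform-safe composition

// The stringly-typed variant this kind of cleanup replaces:
def blockPath(broadcastDir: String, blockId: String): String =
  broadcastDir + "/" + blockId     // fragile: separators, trailing slashes
```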
• Rename testExecutorEnvs --> executorEnvs. · 3ce526b1
      Patrick Wendell authored
      This was changed, but in fact, it's used for things other than tests.
      So I've changed it back.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #747 from pwendell/executor-env and squashes the following commits:
      
      36a60a5 [Patrick Wendell] Rename testExecutorEnvs --> executorEnvs.
      3ce526b1
• SPARK-1802. Audit dependency graph when Spark is built with -Phive · 8586bf56
      Sean Owen authored
      This initial commit resolves the conflicts in the Hive profiles as noted in https://issues.apache.org/jira/browse/SPARK-1802 .
      
Most of the fix was to note that Hive drags in Avro, and so if the hive module depends on Spark's version of the `avro-*` dependencies, it will pull in our exclusions as needed too. But I found we need to copy some exclusions between the two Avro dependencies to get this right. And then I had to squash some commons-logging intrusions.
      
This turned up another annoying find: `hive-exec` is basically an "assembly" artifact that _also_ packages all of its transitive dependencies. This means the final assembly shows lots of collisions between itself and its dependencies, and even other project dependencies. I have a TODO to examine whether that is going to be a deal-breaker or not.
      
      In the meantime I'm going to tack on a second commit to this PR that will also fix some similar, last collisions in the YARN profile.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #744 from srowen/SPARK-1802 and squashes the following commits:
      
      a856604 [Sean Owen] Resolve JAR version conflicts specific to Hive profile
      8586bf56
• SPARK-1798. Tests should clean up temp files · 7120a297
      Sean Owen authored
      Three issues related to temp files that tests generate – these should be touched up for hygiene but are not urgent.
      
Each module has a log4j.properties that directs the unit-test.log output to a path like `[module]/target/unit-test.log`, but this ends up creating `[module]/[module]/target/unit-test.log` instead.
      
The `work/` directory is not deleted by `mvn clean`, either in the parent or in the modules; neither is the `checkpoint/` directory created under the various external modules.
      
Many tests create a temp directory which is not usually deleted. This can be largely resolved by calling `deleteOnExit()` at creation and calling `Utils.deleteRecursively` consistently to clean up, sometimes in an `@After` method. (The pattern is sketched after this entry.)
      
      _If anyone seconds the motion, I can create a more significant change that introduces a new test trait along the lines of `LocalSparkContext`, which provides management of temp directories for subclasses to take advantage of._
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #732 from srowen/SPARK-1798 and squashes the following commits:
      
      5af578e [Sean Owen] Try to consistently delete test temp dirs and files, and set deleteOnExit() for each
      b21b356 [Sean Owen] Remove work/ and checkpoint/ dirs with mvn clean
      bdd0f41 [Sean Owen] Remove duplicate module dir in log4j.properties output path for tests
      7120a297
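A hedged sketch of the cleanup pattern the commit describes; `createTempDir` and `deleteRecursively` here are illustrative stand-ins, not Spark's actual Utils methods:

```
import java.io.File
import java.nio.file.Files

object TempDirHygiene {
  def createTempDir(prefix: String = "spark-test"): File = {
    val dir = Files.createTempDirectory(prefix).toFile
    dir.deleteOnExit()  // safety net in case a test never cleans up
    dir
  }

  def deleteRecursively(file: File): Unit = {
    if (file.isDirectory) file.listFiles().foreach(deleteRecursively)
    file.delete()  // children are gone by the time the directory is deleted
  }
}
```

In a test, the directory would be created in setup and handed to `deleteRecursively` from an `@After` method.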
• SPARK-1786: Reopening PR 724 · 0e2bde20
      Ankur Dave authored
Addressing an issue in MimaBuild.scala.
      
      Author: Ankur Dave <ankurdave@gmail.com>
      Author: Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com>
      
      Closes #742 from jegonzal/edge_partition_serialization and squashes the following commits:
      
      8ba6e0d [Ankur Dave] Add concatenation operators to MimaBuild.scala
      cb2ed3a [Joseph E. Gonzalez] addressing missing exclusion in MimaBuild.scala
      5d27824 [Ankur Dave] Disable reference tracking to fix serialization test
      c0a9ae5 [Ankur Dave] Add failing test for EdgePartition Kryo serialization
      a4a3faa [Joseph E. Gonzalez] Making EdgePartition serializable.
      0e2bde20
• SPARK-1806: Upgrade Mesos dependency to 0.18.1 · d9c97ba3
      Bernardo Gomez Palacio authored
Enabled the Mesos (0.18.1) dependency with shaded protobuf. (The coordinates are sketched after this entry.)
      
Why is this needed?
It avoids any protobuf version collision between Mesos and any other
Spark dependency, e.g. Hadoop HDFS 2.2+ or 1.0.4.
      
      Ticket: https://issues.apache.org/jira/browse/SPARK-1806
      
      * Should close https://issues.apache.org/jira/browse/SPARK-1433
      
      Author berngp
      
      Author: Bernardo Gomez Palacio <bernardo.gomezpalacio@gmail.com>
      
      Closes #741 from berngp/feature/SPARK-1806 and squashes the following commits:
      
      5d70646 [Bernardo Gomez Palacio] SPARK-1806: Upgrade Mesos dependency to 0.18.1
      d9c97ba3
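For reference, a hedged sbt sketch of depending on the shaded artifact; the classifier name is an assumption based on the description above, so verify it against the published pom:

```
// Mesos artifact that shades protobuf internally, so its protobuf cannot
// collide with the version Hadoop brings in.
libraryDependencies += "org.apache.mesos" % "mesos" % "0.18.1" classifier "shaded-protobuf"
```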
• SPARK-1772 Stop catching Throwable, let Executors die · 3af1f386
      Aaron Davidson authored
      The main issue this patch fixes is [SPARK-1772](https://issues.apache.org/jira/browse/SPARK-1772), in which Executors may not die when fatal exceptions (e.g., OOM) are thrown. This patch causes Executors to delegate to the ExecutorUncaughtExceptionHandler when a fatal exception is thrown.
      
This patch also continues the fight in the never-ending war against `case t: Throwable =>` by catching only Exceptions in many places, and adding a wrapper for Threads and Runnables to make sure any uncaught exceptions are at least printed to the logs. (The pattern is sketched after this entry.)
      
It also turns out that the IndestructibleActorSystem likely does not actually work, given testing ([here](https://gist.github.com/aarondav/ca1f0cdcd50727f89c0d)): the uncaughtExceptionHandler is not called from the places we expected it would be.
[SPARK-1620](https://issues.apache.org/jira/browse/SPARK-1620) deals with part of this issue, but refactoring our Actor Systems to ensure that exceptions are dealt with properly is a much bigger change, outside the scope of this PR.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #715 from aarondav/throwable and squashes the following commits:
      
      f9b9bfe [Aaron Davidson] Remove other redundant 'throw e'
      e937a0a [Aaron Davidson] Address Prashant and Matei's comments
      1867867 [Aaron Davidson] [RFC] SPARK-1772 Stop catching Throwable, let Executors die
      3af1f386
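A hedged sketch of the two patterns described above; the names are illustrative, not Spark's actual Executor or ExecutorUncaughtExceptionHandler code:

```
import scala.util.control.NonFatal

object ExceptionPolicy {
  // Wrap work so no exception can silently kill a thread: non-fatal
  // exceptions are logged and survived, fatal ones take the JVM down.
  def guarded(body: => Unit): Runnable = new Runnable {
    override def run(): Unit =
      try body
      catch {
        case NonFatal(e) =>
          System.err.println(s"Non-fatal exception: $e")  // recoverable: log and continue
        case t: Throwable =>
          System.err.println(s"Fatal error, halting: $t") // e.g. OutOfMemoryError
          Runtime.getRuntime.halt(1)                      // let the Executor die
      }
  }
}
```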
• Revert "SPARK-1786: Edge Partition Serialization" · af15c82b
      Patrick Wendell authored
      This reverts commit a6b02fb7.
      af15c82b
  2. May 11, 2014
• SPARK-1786: Edge Partition Serialization · a6b02fb7
      Ankur Dave authored
This appears to address the issue with edge partition serialization. The solution is simply registering the `PrimitiveKeyOpenHashMap`. However, I noticed that we have forked that code in GraphX while retaining the same name (which is confusing), so I also renamed our local copy to `GraphXPrimitiveKeyOpenHashMap`. We should consider dropping that and using the one in Spark if possible. (The registration is sketched after this entry.)
      
      Author: Ankur Dave <ankurdave@gmail.com>
      Author: Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com>
      
      Closes #724 from jegonzal/edge_partition_serialization and squashes the following commits:
      
      b0a525a [Ankur Dave] Disable reference tracking to fix serialization test
      bb7f548 [Ankur Dave] Add failing test for EdgePartition Kryo serialization
      67dac22 [Joseph E. Gonzalez] Making EdgePartition serializable.
      a6b02fb7
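A hedged sketch of the fixes described in this entry; the registration is shown generically, with the commit's renamed class left in a comment:

```
import com.esotericsoftware.kryo.Kryo

def configureKryo(kryo: Kryo): Unit = {
  // Reference tracking interfered with the serialization test, so disable it.
  kryo.setReferences(false)
  // The core of the fix was registering the previously unregistered map class:
  // kryo.register(classOf[GraphXPrimitiveKeyOpenHashMap[_, _]])
  kryo.register(classOf[Array[Long]])  // e.g. vertex id arrays
}
```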
• Fix error in 2d Graph Partitioner · f938a155
      Joseph E. Gonzalez authored
There was a minor bug in which negative partition ids could be generated when constructing a 2D partitioning of a graph. This could lead to an inefficient 2D partition for large vertex id values. (The arithmetic is sketched after this entry.)
      
      Author: Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com>
      
      Closes #709 from jegonzal/fix_2d_partitioning and squashes the following commits:
      
      937c562 [Joseph E. Gonzalez] fixing bug in 2d partitioning algorithm where negative partition ids could be generated.
      f938a155
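A hedged sketch of the arithmetic, simplified from GraphX's 2D partitioner: multiplying a vertex id by a large mixing prime can overflow Long and go negative, so the remainder must be forced non-negative before it becomes a partition id:

```
def partition2D(src: Long, dst: Long, numParts: Int): Int = {
  val ceilSqrt = math.ceil(math.sqrt(numParts)).toInt
  val mixingPrime = 1125899906842597L
  // (x % m + m) % m stays non-negative even when x overflowed to a negative value
  def nonNegMod(x: Long, m: Int): Int = ((x % m + m) % m).toInt
  val col = nonNegMod(src * mixingPrime, ceilSqrt)  // grid column from source id
  val row = nonNegMod(dst * mixingPrime, ceilSqrt)  // grid row from destination id
  (col * ceilSqrt + row) % numParts
}
```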
• SPARK-1652: Set driver memory correctly in spark-submit. · 05c9aa9e
      Patrick Wendell authored
      The previous check didn't account for the fact that the default
      deploy mode is "client" unless otherwise specified. Also, this
      sets the more narrowly defined SPARK_DRIVER_MEMORY instead of setting
      SPARK_MEM.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #730 from pwendell/spark-submit and squashes the following commits:
      
      430b98f [Patrick Wendell] Feedback from Aaron
      e788edf [Patrick Wendell] Changes based on Aaron's feedback
      f508146 [Patrick Wendell] SPARK-1652: Set driver memory correctly in spark-submit.
      05c9aa9e
• SPARK-1770: Load balance elements when repartitioning. · 7d9cc921
      Patrick Wendell authored
      This patch adds better balancing when performing a repartition of an
      RDD. Previously the elements in the RDD were hash partitioned, meaning
      if the RDD was skewed certain partitions would end up being very large.
      
This commit adds load balancing of elements across the repartitioned
RDD splits. The load balancing is not perfect: a given output partition
can have up to N more elements than the average if there are N input
partitions. However, some randomization is used to minimize the
probability that this happens. (The keying scheme is sketched after this entry.)
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #727 from pwendell/load-balance and squashes the following commits:
      
      f9da752 [Patrick Wendell] Response to Matei's feedback
      acfa46a [Patrick Wendell] SPARK-1770: Load balance elements when repartitioning.
      7d9cc921
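A hedged sketch of the keying scheme (illustrative, not Spark's internals): each input partition hands out round-robin keys starting from a random offset, so elements land almost evenly across output partitions regardless of skew:

```
import scala.util.Random

def balancedKeys[T](partIndex: Int, items: Iterator[T], numParts: Int): Iterator[(Int, T)] = {
  // Seeding with the partition index keeps runs deterministic; the random
  // start keeps all partitions from targeting the same outputs in lockstep.
  var position = new Random(partIndex).nextInt(numParts)
  items.map { item =>
    position += 1
    (position % numParts, item)  // hash-partitioning these keys spreads elements evenly
  }
}
```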
• Remove outdated Scala home runtime information · 6bee01dd
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #728 from witgo/scala_home and squashes the following commits:
      
      cdfd8be [witgo] Merge branch 'master' of https://github.com/apache/spark into scala_home
      fac094a [witgo] remove outdated runtime Information scala home
      6bee01dd
  3. May 10, 2014
• Enabled incremental build that comes with sbt 0.13.2 · 70bcdef4
      Prashant Sharma authored
More info at https://github.com/sbt/sbt/issues/1010. (The enabling setting is sketched after this entry.)
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #525 from ScrapCodes/sbt-inc-opt and squashes the following commits:
      
      ba8fa42 [Prashant Sharma] Enabled incremental build that comes with sbt 0.13.2
      70bcdef4
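The enabling setting, per the linked sbt issue; a one-line sketch of the build change:

```
// sbt 0.13.2's improved incremental compilation (name hashing)
incOptions := incOptions.value.withNameHashing(true)
```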
• [SPARK-1774] Respect SparkSubmit --jars on YARN (client) · 83e0424d
      Andrew Or authored
      SparkSubmit ignores `--jars` for YARN client. This is a bug.
      
This PR also automatically adds the application jar to `spark.jar`. Previously, when running as yarn-client, you had to specify the jar additionally through `--files` (because `--jars` didn't work). Now you don't have to explicitly specify it through either.
      
      Tested on a YARN cluster.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #710 from andrewor14/yarn-jars and squashes the following commits:
      
      35d1928 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
      c27bf6c [Andrew Or] For yarn-cluster and python, do not add primaryResource to spark.jar
      c92c5bf [Andrew Or] Minor cleanups
      269f9f3 [Andrew Or] Fix format
      013d840 [Andrew Or] Fix tests
      1407474 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
      3bb75e8 [Andrew Or] Allow SparkSubmit --jars to take effect in yarn-client mode
      83e0424d
• SPARK-1789. Multiple versions of Netty dependencies cause FlumeStreamSuite failure · 2b7bd29e
      Sean Owen authored
      TL;DR is there is a bit of JAR hell trouble with Netty, that can be mostly resolved and will resolve a test failure.
      
I hit the error described at http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-startup-time-out-td1753.html while running FlumeStreamSuite, and had been hitting it for a short while (is it just me?).
      
      velvia notes:
      "I have found a workaround.  If you add akka 2.2.4 to your dependencies, then everything works, probably because akka 2.2.4 brings in newer version of Jetty."
      
      There are at least 3 versions of Netty in play in the build:
      
      - the new Flume 1.4.0 dependency brings in io.netty:netty:3.4.0.Final, and that is the immediate problem
- the custom version of akka 2.2.3 depends on io.netty:netty:3.6.6.Final
      - but, Spark Core directly uses io.netty:netty-all:4.0.17.Final
      
      The POMs try to exclude other versions of netty, but are excluding org.jboss.netty:netty, when in fact older versions of io.netty:netty (not netty-all) are also an issue.
      
      The org.jboss.netty:netty excludes are largely unnecessary. I replaced many of them with io.netty:netty exclusions until everything agreed on io.netty:netty-all:4.0.17.Final.
      
But this didn't work, since Akka 2.2.3 doesn't work with Netty 4.x. Downgrading to 3.6.6.Final across the board made some Spark code not compile.
      
      If the build *keeps* io.netty:netty:3.6.6.Final as well, everything seems to work. Part of the reason seems to be that Netty 3.x used the old `org.jboss.netty` packages. This is less than ideal, but is no worse than the current situation.
      
      So this PR resolves the issue and improves the JAR hell, even if it leaves the existing theoretical Netty 3-vs-4 conflict:
      
      - Remove org.jboss.netty excludes where possible, for clarity; they're not needed except with Hadoop artifacts
      - Add io.netty:netty excludes where needed -- except, let akka keep its io.netty:netty
      - Change a bit of test code that actually depended on Netty 3.x, to use 4.x equivalent
      - Update SBT build accordingly
      
A better change would be to update Akka far enough that it agrees on Netty 4.x, but I don't know if that's feasible. (The exclusion strategy is sketched after this entry.)
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #723 from srowen/SPARK-1789 and squashes the following commits:
      
      43661b7 [Sean Owen] Update and add Netty excludes to prevent some JAR conflicts that cause test issues
      2b7bd29e
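A hedged sbt sketch of the exclusion strategy; the versions and artifact names are illustrative, not the exact build changes:

```
libraryDependencies ++= Seq(
  // Spark's own Netty 4.x stays.
  "io.netty" % "netty-all" % "4.0.17.Final",
  // Strip the old io.netty:netty 3.4.x that Flume drags in.
  ("org.apache.flume" % "flume-ng-sdk" % "1.4.0")
    .exclude("io.netty", "netty"),
  // The org.jboss.netty exclude is still needed for Hadoop artifacts.
  ("org.apache.hadoop" % "hadoop-client" % "2.2.0")
    .exclude("org.jboss.netty", "netty")
)
```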
• Unify GraphImpl RDDs + other graph load optimizations · 905173df
      Ankur Dave authored
      This PR makes the following changes, primarily in e4fbd329aef85fe2c38b0167255d2a712893d683:
      
      1. *Unify RDDs to avoid zipPartitions.* A graph used to be four RDDs: vertices, edges, routing table, and triplet view. This commit merges them down to two: vertices (with routing table), and edges (with replicated vertices).
      
      2. *Avoid duplicate shuffle in graph building.* We used to do two shuffles when building a graph: one to extract routing information from the edges and move it to the vertices, and another to find nonexistent vertices referred to by edges. With this commit, the latter is done as a side effect of the former.
      
      3. *Avoid no-op shuffle when joins are fully eliminated.* This is a side effect of unifying the edges and the triplet view.
      
      4. *Join elimination for mapTriplets.*
      
      5. *Ship only the needed vertex attributes when upgrading the triplet view.* If the triplet view already contains source attributes, and we now need both attributes, only ship destination attributes rather than re-shipping both. This is done in `ReplicatedVertexView#upgrade`.
      
      Author: Ankur Dave <ankurdave@gmail.com>
      
      Closes #497 from ankurdave/unify-rdds and squashes the following commits:
      
      332ab43 [Ankur Dave] Merge remote-tracking branch 'apache-spark/master' into unify-rdds
      4933e2e [Ankur Dave] Exclude RoutingTable from binary compatibility check
      5ba8789 [Ankur Dave] Add GraphX upgrade guide from Spark 0.9.1
      13ac845 [Ankur Dave] Merge remote-tracking branch 'apache-spark/master' into unify-rdds
      a04765c [Ankur Dave] Remove unnecessary toOps call
      57202e8 [Ankur Dave] Replace case with pair parameter
      75af062 [Ankur Dave] Add explicit return types
      04d3ae5 [Ankur Dave] Convert implicit parameter to context bound
      c88b269 [Ankur Dave] Revert upgradeIterator to if-in-a-loop
      0d3584c [Ankur Dave] EdgePartition.size should be val
      2a928b2 [Ankur Dave] Set locality wait
      10b3596 [Ankur Dave] Clean up public API
      ae36110 [Ankur Dave] Fix style errors
      e4fbd32 [Ankur Dave] Unify GraphImpl RDDs + other graph load optimizations
      d6d60e2 [Ankur Dave] In GraphLoader, coalesce to minEdgePartitions
      62c7b78 [Ankur Dave] In Analytics, take PageRank numIter
      d64e8d4 [Ankur Dave] Log current Pregel iteration
      905173df
• [SPARK-1690] Tolerating empty elements when saving Python RDD to text files · 6c2691d0
      Kan Zhang authored
      Tolerate empty strings in PythonRDD
      
      Author: Kan Zhang <kzhang@apache.org>
      
      Closes #644 from kanzhang/SPARK-1690 and squashes the following commits:
      
      c62ad33 [Kan Zhang] Adding Python doctest
      473ec4b [Kan Zhang] [SPARK-1690] Tolerating empty elements when saving Python RDD to text files
      6c2691d0
• Add Python includes to path before depickling broadcast values · 3776f2f2
      Bouke van der Bijl authored
      This fixes https://issues.apache.org/jira/browse/SPARK-1731 by adding the Python includes to the PYTHONPATH before depickling the broadcast values
      
      @airhorns
      
      Author: Bouke van der Bijl <boukevanderbijl@gmail.com>
      
      Closes #656 from bouk/python-includes-before-broadcast and squashes the following commits:
      
      7b0dfe4 [Bouke van der Bijl] Add Python includes to path before depickling broadcast values
      3776f2f2
• Fix broken link in Python docs · c05d11bb
      Andy Konwinski authored
      Author: Andy Konwinski <andykonwinski@gmail.com>
      
      Closes #650 from andyk/python-docs-link-fix and squashes the following commits:
      
      a1f9d51 [Andy Konwinski] fix broken in link in python docs
      c05d11bb
• SPARK-1708. Add a ClassTag on Serializer and things that depend on it · 7eefc9d2
      Matei Zaharia authored
      This pull request contains a rebased patch from @heathermiller (https://github.com/heathermiller/spark/pull/1) to add ClassTags on Serializer and types that depend on it (Broadcast and AccumulableCollection). Putting these in the public API signatures now will allow us to use Scala Pickling for serialization down the line without breaking binary compatibility.
      
One question remaining is whether we also want them on Accumulator. Accumulator is passed as part of a bigger Task or TaskResult object via the closure serializer, so it doesn't seem very useful to add the ClassTag there; Broadcast and AccumulableCollection, in contrast, were being serialized directly. (A minimal illustration follows this entry.)
      
      CC @rxin, @pwendell, @heathermiller
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #700 from mateiz/spark-1708 and squashes the following commits:
      
      1a3d8b0 [Matei Zaharia] Use fake ClassTag in Java
      3b449ed [Matei Zaharia] test fix
      2209a27 [Matei Zaharia] Code style fixes
      9d48830 [Matei Zaharia] Add a ClassTag on Serializer and things that depend on it
      7eefc9d2
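A minimal sketch of why a ClassTag belongs in these signatures (illustrative, not Spark's actual Serializer trait): erasure hides T's runtime class, and the ClassTag restores it, for example when allocating typed arrays during deserialization:

```
import scala.reflect.ClassTag

abstract class SerializerSketch {
  def deserialize[T: ClassTag](bytes: Array[Byte]): T
}

// Would not compile without the ClassTag: new Array[T] needs T's runtime class.
def bufferOf[T: ClassTag](n: Int): Array[T] = new Array[T](n)
```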
• [SPARK-1778] [SQL] Add 'limit' transformation to SchemaRDD. · 8e94d272
      Takuya UESHIN authored
Add a `limit` transformation to `SchemaRDD`. (A usage sketch follows this entry.)
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #711 from ueshin/issues/SPARK-1778 and squashes the following commits:
      
      33169df [Takuya UESHIN] Add 'limit' transformation to SchemaRDD.
      8e94d272
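A hedged usage sketch; the context setup and the "events" table are assumptions, not from the commit:

```
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc = new SparkContext("local", "limit-example")
val sqlContext = new SQLContext(sc)
// limit returns a new SchemaRDD containing at most the first n rows.
val firstTen = sqlContext.sql("SELECT * FROM events").limit(10)
```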
• [SQL] Upgrade parquet library. · 4d605532
      Michael Armbrust authored
      I think we are hitting this issue in some perf tests: https://github.com/Parquet/parquet-mr/commit/6aed5288fd4a1398063a5a219b2ae4a9f71b02cf
      
      Credit to @aarondav !
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #684 from marmbrus/upgradeParquet and squashes the following commits:
      
      e10a619 [Michael Armbrust] Upgrade parquet library.
      4d605532
• [SPARK-1644] The org.datanucleus:* should not be packaged into spark-assembly-*.jar · 56151086
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #688 from witgo/SPARK-1644 and squashes the following commits:
      
      56ad6ac [witgo] review commit
      87c03e4 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1644
      6ffa7e4 [witgo] review commit
      a597414 [witgo] The org.datanucleus:* should not be packaged into spark-assembly-*.jar
      56151086
  4. May 09, 2014
• SPARK-1686: keep schedule() calling in the main thread · 2f452cba
      CodingCat authored
      https://issues.apache.org/jira/browse/SPARK-1686
      
      moved from original JIRA (by @markhamstra):
      
      In deploy.master.Master, the completeRecovery method is the last thing to be called when a standalone Master is recovering from failure. It is responsible for resetting some state, relaunching drivers, and eventually resuming its scheduling duties.
      
There are currently four places in Master.scala where completeRecovery is called. Three of them are from within the actor's receive method and aren't problems. The last starts from within receive when the ElectedLeader message is received, but the actual completeRecovery() call is made from the Akka scheduler. That means it executes on a different thread, so Master itself ends up running schedule() from that Akka scheduler thread.
      
In this PR, I added a new master message, TriggerSchedule, to trigger the "local" call of schedule() on the actor's own thread. (The pattern is sketched after this entry.)
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #639 from CodingCat/SPARK-1686 and squashes the following commits:
      
      81bb4ca [CodingCat] rename variable
      69e0a2a [CodingCat] style fix
      36a2ac0 [CodingCat] address Aaron's comments
      ec9b7bb [CodingCat] address the comments
      02b37ca [CodingCat] keep schedule() calling in the main thread
      2f452cba
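A hedged sketch of the pattern; TriggerSchedule is named in the commit, while the actor body is illustrative. Sending a message, rather than calling schedule() from an Akka scheduler thread, guarantees the call runs on the actor's own dispatch thread:

```
import akka.actor.Actor

case object TriggerSchedule

class MasterSketch extends Actor {
  def receive = {
    case TriggerSchedule => schedule()  // runs on the actor's thread, like any message
  }
  private def schedule(): Unit = { /* resume scheduling duties */ }
}
```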
• SPARK-1770: Revert accidental(?) fix · 59577df1
      Aaron Davidson authored
      Looks like this change was accidentally committed here: https://github.com/apache/spark/commit/06b15baab25951d124bbe6b64906f4139e037deb
      but the change does not show up in the PR itself (#704).
      
Besides not being intended to go in with that PR, the change also broke the JavaAPISuite.repartition test.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #716 from aarondav/shufflerand and squashes the following commits:
      
      b1cf70b [Aaron Davidson] SPARK-1770: Revert accidental(?) fix
      59577df1
• [SPARK-1760]: fix building spark with maven documentation · bd67551e
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #712 from witgo/building-with-maven and squashes the following commits:
      
      215523b [witgo] fix building spark with maven documentation
      bd67551e
• Converted bang to ask to avoid scary warning when a block is removed · 32868f31
      Tathagata Das authored
Removing a block through the BlockManager produced scary warning messages in the driver:
      ```
      2014-05-08 20:16:19,172 WARN BlockManagerMasterActor: Got unknown message: true
      2014-05-08 20:16:19,172 WARN BlockManagerMasterActor: Got unknown message: true
      2014-05-08 20:16:19,172 WARN BlockManagerMasterActor: Got unknown message: true
      ```
      
This is because the [BlockManagerSlaveActor](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManagerSlaveActor.scala#L44) would send back an acknowledgement ("true"), but the BlockManagerMasterActor had sent the RemoveBlock message as a fire-and-forget send rather than an ask(), so it rejected the received "true" as an unknown message. (The fix is sketched after this entry.)
      @pwendell
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #708 from tdas/bm-fix and squashes the following commits:
      
      ed4ef15 [Tathagata Das] Converted bang to ask to avoid scary warning when a block is removed.
      32868f31
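A hedged sketch of the fix; the method shape is illustrative, while RemoveBlock is the message named above. With ask (?), the slave's acknowledgement resolves a Future instead of arriving as an unexpected message:

```
import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Future

case class RemoveBlock(blockId: String)

def removeBlock(slave: ActorRef, blockId: String)(implicit timeout: Timeout): Future[Boolean] =
  (slave ? RemoveBlock(blockId)).mapTo[Boolean]  // the "true" ack is consumed here
```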
• MINOR: Removing dead code. · 4c60fd1e
      Patrick Wendell authored
      Meant to do this when patching up the last merge.
      4c60fd1e
• SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo · 7db47c46
      Sandeep authored
This was used in the past to have a cache of deserialized ShuffleMapTasks, but that cache has been removed, so there's no need for a lock. It slows down Spark when task descriptions are large, e.g. due to large lineage graphs or local variables. (Sketched after this entry.)
      
      Author: Sandeep <sandeep@techaddict.me>
      
      Closes #707 from techaddict/SPARK-1775 and squashes the following commits:
      
      18d8ebf [Sandeep] SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo This was used in the past to have a cache of deserialized ShuffleMapTasks, but that's been removed, so there's no need for a lock. It slows down Spark when task descriptions are large, e.g. due to large lineage graphs or local variables.
      7db47c46
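A hedged sketch of the kind of change described (illustrative, not the actual ShuffleMapTask code): the deserialization touches no shared state, so the lock that once guarded a now-removed cache only forced large task descriptions to deserialize one at a time:

```
import java.io.{ByteArrayInputStream, ObjectInputStream}

def deserializeInfo(bytes: Array[Byte]): AnyRef = {
  // Before: deserializeLock.synchronized { ... }
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
  try in.readObject() finally in.close()
}
```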
• SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`. · 06b15baa
      Patrick Wendell authored
      Gives a nicely formatted message to the user when `run-example` is run to
      tell them to use `spark-submit`.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #704 from pwendell/examples and squashes the following commits:
      
      1996ee8 [Patrick Wendell] Feedback form Andrew
      3eb7803 [Patrick Wendell] Suggestions from TD
      2474668 [Patrick Wendell] SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`.
      06b15baa
  5. May 08, 2014
• [SPARK-1631] Correctly set the Yarn app name when launching the AM. · 3f779d87
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #539 from vanzin/yarn-app-name and squashes the following commits:
      
      7d1ca4f [Marcelo Vanzin] [SPARK-1631] Correctly set the Yarn app name when launching the AM.
      3f779d87
• [SPARK-1755] Respect SparkSubmit --name on YARN · 8b784129
      Andrew Or authored
      Right now, SparkSubmit ignores the `--name` flag for both yarn-client and yarn-cluster. This is a bug.
      
      In client mode, SparkSubmit treats `--name` as a [cluster config](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L170) and does not propagate this to SparkContext.
      
      In cluster mode, SparkSubmit passes this flag to `org.apache.spark.deploy.yarn.Client`, which only uses it for the [YARN ResourceManager](https://github.com/apache/spark/blob/master/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L80), but does not propagate this to SparkContext.
      
      This PR ensures that `spark.app.name` is always set if SparkSubmit receives the `--name` flag, which is what the usage promises. This makes it possible for applications to start a SparkContext with an empty conf `val sc = new SparkContext(new SparkConf)`, and inherit the app name from SparkSubmit.
      
      Tested both modes on a YARN cluster.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #699 from andrewor14/yarn-app-name and squashes the following commits:
      
      98f6a79 [Andrew Or] Fix tests
      dea932f [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-app-name
      c86d9ca [Andrew Or] Respect SparkSubmit --name on YARN
      8b784129
• Include the sbin/spark-config.sh in spark-executor · 2fd2752e
      Bouke van der Bijl authored
This is needed because broadcast values are broken in pyspark on Mesos: it tries to import pyspark but can't, as the PYTHONPATH is not set due to changes in ff5be9a4
      
      https://issues.apache.org/jira/browse/SPARK-1725
      
      Author: Bouke van der Bijl <boukevanderbijl@gmail.com>
      
      Closes #651 from bouk/include-spark-config-in-mesos-executor and squashes the following commits:
      
      b2f1295 [Bouke van der Bijl] Inline PYTHONPATH in spark-executor
      eedbbcc [Bouke van der Bijl] Include the sbin/spark-config.sh in spark-executor
      2fd2752e
• Bug fix of sparse vector conversion · 191279ce
      Funes authored
Fixed a small bug caused by an inconsistency between the index/data array size and the vector length. (Sketched after this entry.)
      
      Author: Funes <tianshaocun@gmail.com>
      Author: funes <tianshaocun@gmail.com>
      
      Closes #661 from funes/bugfix and squashes the following commits:
      
      edb2b9d [funes] remove unused import
      75dced3 [Funes] update test case
      d129a66 [Funes] Add test for sparse breeze by vector builder
      64e7198 [Funes] Copy data only when necessary
      b85806c [Funes] Bug fix of sparse vector conversion
      191279ce
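A hedged sketch of the pitfall (illustrative, not MLlib's actual converter): a Breeze SparseVector may over-allocate its index/data arrays, so only the first activeSize entries are valid, and they should be copied only when the sizes disagree:

```
import breeze.linalg.{SparseVector => BSV}

def toIndicesAndValues(sv: BSV[Double]): (Array[Int], Array[Double]) = {
  val n = sv.activeSize
  if (sv.index.length == n) (sv.index, sv.data)  // exactly sized: reuse without copying
  else (sv.index.take(n), sv.data.take(n))       // over-allocated: copy the valid prefix
}
```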
• [SPARK-1157][MLlib] Bug fix: lossHistory should exclude rejection steps, and remove miniBatch · 910a13b3
      DB Tsai authored
Getting the lossHistory from Breeze's API, which already excludes the rejected steps in line search. Also, remove the miniBatch in LBFGS, since quasi-Newton methods approximate the inverse of the Hessian and it doesn't make sense if the gradients are computed from a varying objective. (Sketched after this entry.)
      
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #582 from dbtsai/dbtsai-lbfgs-bug and squashes the following commits:
      
      9cc6cf9 [DB Tsai] Removed the miniBatch in LBFGS.
      1ba6a33 [DB Tsai] Formatting the code.
      d72c679 [DB Tsai] Using Breeze's states to get the loss.
      910a13b3
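A hedged sketch of reading the loss history from Breeze's optimizer states; the hyperparameters are illustrative. Each state yielded by `iterations` is an accepted step, so rejected line-search steps never appear:

```
import breeze.linalg.DenseVector
import breeze.optimize.{DiffFunction, LBFGS}

def lossHistory(f: DiffFunction[DenseVector[Double]],
                init: DenseVector[Double]): Array[Double] = {
  val lbfgs = new LBFGS[DenseVector[Double]](maxIter = 100, m = 10, tolerance = 1e-6)
  lbfgs.iterations(f, init).map(_.value).toArray  // one loss per accepted iteration
}
```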
• MLlib documentation fix · d38febee
      DB Tsai authored
Fixed the documentation: `loadLibSVMData` has been renamed to `loadLibSVMFile`.
      
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #703 from dbtsai/dbtsai-docfix and squashes the following commits:
      
      71dd508 [DB Tsai] loadLibSVMData is changed to loadLibSVMFile
      d38febee