  1. Sep 02, 2016
    • [SPARK-16711] YarnShuffleService doesn't re-init properly on YARN rolling upgrade · e79962f2
      Thomas Graves authored
      The Spark YARN shuffle service does not re-initialize application credentials early enough, which causes any other Spark executors trying to fetch from that node during a rolling upgrade to fail with "java.lang.NullPointerException: Password cannot be null if SASL is enabled". Right now the shuffle service relies on the YARN NodeManager to re-register the applications, but unfortunately that happens after we open the port for other executors to connect. If other executors connect before the re-registration, they get a NullPointerException, which is not a retryable exception, so they fail quickly. To solve this I added another LevelDB file so that the service can save and re-initialize all the applications before opening the port for other executors to connect to it. Adding another LevelDB was simpler from a code-structure point of view.
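
      To illustrate the recovery step, here is a minimal sketch (class and method names are hypothetical, not Spark's actual code) of reloading registered application secrets from a LevelDB file before the server port is opened, using the org.iq80.leveldb API:

      ```
      import java.io.IOException;
      import java.nio.charset.StandardCharsets;
      import java.util.Map;
      import org.iq80.leveldb.DB;
      import org.iq80.leveldb.DBIterator;

      // Hypothetical registry of per-application SASL secrets.
      interface SaslSecretRegistry {
        void registerApp(String appId, String secret);
      }

      // Sketch: reload saved application secrets *before* binding the port,
      // so early fetches during a rolling upgrade don't hit missing credentials.
      static void recoverApplications(DB db, SaslSecretRegistry registry) throws IOException {
        try (DBIterator it = db.iterator()) {
          for (it.seekToFirst(); it.hasNext(); it.next()) {
            Map.Entry<byte[], byte[]> entry = it.peekNext();
            String appId = new String(entry.getKey(), StandardCharsets.UTF_8);
            String secret = new String(entry.getValue(), StandardCharsets.UTF_8);
            registry.registerApp(appId, secret);  // re-init credentials first...
          }
        }
        // ...and only then open the port for other executors (not shown).
      }
      ```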
      
      Most of the code changes are moving things to common util class.
      
      The patch was tested manually on a YARN cluster while a rolling upgrade was happening and a Spark job was running. Without the patch I consistently get the NullPointerException; with the patch the job gets a few Connection refused exceptions, but the retries kick in and it succeeds.
      
      Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
      
      Closes #14718 from tgravescs/SPARK-16711.
  2. Jul 11, 2016
    • [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
  3. May 18, 2016
    • [SPARK-15263][CORE] Make shuffle service dir cleanup faster by using `rm -rf` · c1fd9cac
      Tejas Patil authored
      ## What changes were proposed in this pull request?
      
      Jira: https://issues.apache.org/jira/browse/SPARK-15263
      
      The current logic for directory cleanup is slow because it lists each directory, recurses over child directories, checks for symbolic links, deletes leaf files, and finally deletes the directories once they are empty. This involves repeated back-and-forth switching between kernel space and user space. Since most deployment backends are Unix systems, we can essentially just run `rm -rf` so that the entire deletion logic runs in kernel space.
      
      The current Java-based implementation in Spark is similar to what standard libraries like Guava and Commons IO do (e.g. http://svn.apache.org/viewvc/commons/proper/io/trunk/src/main/java/org/apache/commons/io/FileUtils.java?view=markup#l1540). However, Guava removed this method in favour of shelling out to an operating system command (as this PR does). See the `Deprecated` note in the older Guava javadocs for details: http://google.github.io/guava/releases/10.0.1/api/docs/com/google/common/io/Files.html#deleteRecursively(java.io.File)
      
      Ideally, Java itself would provide such APIs so that users would not have to write platform-specific code for this. Also, it's not just about speed: handling race conditions during filesystem deletions is tricky. I found this Java bug in a similar context: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=7148952
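
      As a minimal sketch of the approach (not Spark's actual implementation), the deletion can be handed to a child `rm -rf` process on Unix-like systems:

      ```
      import java.io.File;
      import java.io.IOException;

      // Sketch only: shell out to `rm -rf` so the traversal and unlinking
      // happen without the Java-side list/recurse/delete round trips.
      static void deleteRecursively(File dir) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("rm", "-rf", dir.getAbsolutePath())
            .inheritIO()   // surface any rm errors on our stderr
            .start();
        if (p.waitFor() != 0) {
          throw new IOException("rm -rf failed for " + dir.getAbsolutePath());
        }
      }
      ```

      A fallback to a plain Java walk-and-delete would still be needed on platforms where `rm` is not available.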
      
      ## How was this patch tested?
      
      I am relying on the existing test cases to exercise the method. Suggestions for additional testing are welcome.
      
      ## Performance gains
      
      *Input setup*: Created a nested directory structure of depth 3, with each entry having 50 sub-dirs. The input being cleaned up had ~125k dirs in total.
      
      Ran both approaches (in isolation) six times each to get average numbers:
      
      Native Java cleanup  | `rm -rf` as a separate process
      ------------ | -------------
      10.04 sec | 4.11 sec
      
      This change made deletion 2.4 times faster for the given test input.
      
      Author: Tejas Patil <tejasp@fb.com>
      
      Closes #13042 from tejasapatil/delete_recursive.
  4. Feb 28, 2016
    • [SPARK-13529][BUILD] Move network/* modules into common/network-* · 9e01dcc6
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top-level, non-user-facing folder.
      
      ## How was this patch tested?
      Compilation and existing tests. We should run both SBT and Maven.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11409 from rxin/SPARK-13529.
  5. Jan 30, 2016
    • [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version · 289373b2
      Josh Rosen authored
      This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
      
      The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
      
      After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10608 from JoshRosen/SPARK-6363.
  6. Sep 02, 2015
    • [SPARK-10004] [SHUFFLE] Perform auth checks when clients read shuffle data. · 2da3a9e9
      Marcelo Vanzin authored
      To correctly isolate applications, when requests to read shuffle data
      arrive at the shuffle service, proper authorization checks need to
      be performed. This change makes sure that only the application that
      created the shuffle data can read from it.
      
      Such checks are only enabled when "spark.authenticate" is enabled,
      otherwise there's no secure way to make sure that the client is really
      who it says it is.
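
      The shape of the check is roughly the following (a hedged sketch, not Spark's actual code): the application ID authenticated on the channel must match the application that registered the shuffle data.

      ```
      // Sketch: reject reads across application boundaries. Only meaningful
      // when spark.authenticate is on, since otherwise the client's claimed
      // identity was never verified.
      static void checkAuthorization(String channelAppId, String blockOwnerAppId) {
        if (!channelAppId.equals(blockOwnerAppId)) {
          throw new SecurityException(
              "App " + channelAppId + " is not allowed to read shuffle data"
              + " registered by app " + blockOwnerAppId);
        }
      }
      ```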
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #8218 from vanzin/SPARK-10004.
  7. Jun 28, 2015
    • [SPARK-8683] [BUILD] Depend on mockito-core instead of mockito-all · f5100451
      Josh Rosen authored
      Spark's tests currently depend on `mockito-all`, which bundles Hamcrest and Objenesis classes. Instead, they should depend on `mockito-core`, which declares those libraries as Maven dependencies. This is necessary in order to fix a dependency conflict that leads to a NoSuchMethodError when using certain Hamcrest matchers.
      
      See https://github.com/mockito/mockito/wiki/Declaring-mockito-dependency for more details.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7061 from JoshRosen/mockito-core-instead-of-all and squashes the following commits:
      
      70eccbe [Josh Rosen] Depend on mockito-core instead of mockito-all.
  8. Jun 03, 2015
    • [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 · 2c4d550e
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:
      
      2f42d02 [Patrick Wendell] A few more excludes
      4bebcf0 [Patrick Wendell] Update to RC4
      61aaf46 [Patrick Wendell] Using new release candidate
      55f1610 [Patrick Wendell] Another exclude
      04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
      36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
  9. Apr 28, 2015
    • [SPARK-7168] [BUILD] Update plugin versions in Maven build and centralize versions · 7f3b3b7e
      Sean Owen authored
      Update Maven build plugin versions and centralize plugin version management
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #5720 from srowen/SPARK-7168 and squashes the following commits:
      
      98a8947 [Sean Owen] Make install, deploy plugin versions explicit
      4ecf3b2 [Sean Owen] Update Maven build plugin versions and centralize plugin version management
  10. Apr 01, 2015
    • [SPARK-6578] [core] Fix thread-safety issue in outbound path of network library. · f084c5de
      Marcelo Vanzin authored
      While the inbound path of a Netty pipeline is thread-safe, the outbound
      path is not. That means multiple threads can compete to write messages
      to the next stage of the pipeline.
      
      The network library sometimes breaks a single RPC message into multiple
      buffers internally to avoid copying data (see MessageEncoder). This can
      result in the following scenario (where "FxBy" means "frame x, buffer y"):
      
                     T1         F1B1            F1B2
                                  \               \
                                   \               \
                     socket        F1B1   F2B1    F1B2  F2B2
                                           /             /
                                          /             /
                     T2                  F2B1         F2B2
      
      And the frames now cannot be rebuilt on the receiving side because the
      different messages have been mixed up on the wire.
      
      The fix wraps these multi-buffer messages into a `FileRegion` object
      so that these messages are written "atomically" to the next pipeline handler.
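
      A simplified sketch of the idea in plain Netty terms (Spark's actual fix is a custom `FileRegion`, `MessageWithHeader`, which avoids copying; see the commit list below):

      ```
      import io.netty.buffer.ByteBuf;
      import io.netty.buffer.Unpooled;
      import io.netty.channel.ChannelHandlerContext;

      // Sketch: hand the pipeline ONE outbound message instead of several,
      // so another thread's buffers cannot be interleaved between them.
      static void writeAtomically(ChannelHandlerContext ctx, ByteBuf header, ByteBuf body) {
        // Racy: ctx.write(header); ctx.write(body);  -- F2B1 can land in between.
        ctx.writeAndFlush(Unpooled.wrappedBuffer(header, body));
      }
      ```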
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5234 from vanzin/SPARK-6578 and squashes the following commits:
      
      16b2d70 [Marcelo Vanzin] Forgot to update a type.
      c9c2e4e [Marcelo Vanzin] Review comments: simplify some code.
      9c888ac [Marcelo Vanzin] Small style nits.
      8474bab [Marcelo Vanzin] Fix multiple calls to MessageWithHeader.transferTo().
      e26509f [Marcelo Vanzin] Merge branch 'master' into SPARK-6578
      c503f6c [Marcelo Vanzin] Implement a custom FileRegion instead of using locks.
      84aa7ce [Marcelo Vanzin] Rename handler to the correct name.
      432f3bd [Marcelo Vanzin] Remove unneeded method.
      8d70e60 [Marcelo Vanzin] Fix thread-safety issue in outbound path of network library.
  11. Mar 20, 2015
    • [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT. · a7456459
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5056 from vanzin/SPARK-6371 and squashes the following commits:
      
      63220df [Marcelo Vanzin] Merge branch 'master' into SPARK-6371
      6506f75 [Marcelo Vanzin] Use more fine-grained exclusion.
      178ba71 [Marcelo Vanzin] Oops.
      75b2375 [Marcelo Vanzin] Exclude VertexRDD in MiMA.
      a45a62c [Marcelo Vanzin] Work around MIMA warning.
      1d8a670 [Marcelo Vanzin] Re-group jetty exclusion.
      0e8e909 [Marcelo Vanzin] Ignore ml, don't ignore graphx.
      cef4603 [Marcelo Vanzin] Indentation.
      296cf82 [Marcelo Vanzin] [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT.
  12. Feb 01, 2015
    • [SPARK-3996]: Shade Jetty in Spark deliverables · a15f6e31
      Patrick Wendell authored
      (v2 of this patch with a fix that was only relevant for the maven build).
      
      This patch piggy-backs on vanzin's work to simplify the Guava shading,
      and adds Jetty as a shaded library in Spark. Other than adding Jetty,
      it consolidates the <artifactSet>s into the root pom. I found it a bit
      easier to follow that way, since you don't need to look into child poms
      to find out which specific artifact sets are included in the shading.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4285 from pwendell/jetty and squashes the following commits:
      
      d3e7f4e [Patrick Wendell] Fix for shaded deps causing compile errors
      19f0710 [Patrick Wendell] More code review feedback
      961452d [Patrick Wendell] Responding to feedback from Marcello
      6df25ca [Patrick Wendell] [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables
  13. Jan 29, 2015
    • Revert "[WIP] [SPARK-3996]: Shade Jetty in Spark deliverables" · d2071e8f
      Patrick Wendell authored
      This reverts commit f240fe39.
    • [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables · f240fe39
      Patrick Wendell authored
      This patch piggy-backs on vanzin's work to simplify the Guava shading,
      and adds Jetty as a shaded library in Spark. Other than adding Jetty,
      it consolidates the <artifactSet>s into the root pom. I found it a bit
      easier to follow that way, since you don't need to look into child poms
      to find out which specific artifact sets are included in the shading.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4252 from pwendell/jetty and squashes the following commits:
      
      19f0710 [Patrick Wendell] More code review feedback
      961452d [Patrick Wendell] Responding to feedback from Marcello
      6df25ca [Patrick Wendell] [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables
  14. Jan 28, 2015
    • [SPARK-4809] Rework Guava library shading. · 37a5e272
      Marcelo Vanzin authored
      The current way of shading Guava is a little problematic. Code that
      depends on "spark-core" does not see the transitive dependency, yet
      classes in "spark-core" actually depend on Guava. So it's tricky to
      run unit tests that use spark-core classes, since you need a
      compatible version of Guava in your dependencies when running the
      tests. That makes for a bad user experience.
      
      This change modifies the way Guava is shaded so that it's applied
      uniformly across the Spark build. This means Guava is shaded inside
      spark-core itself, so that the dependency issues above are solved.
      Aside from that, all Spark sub-modules have their Guava references
      relocated, so that they refer to the relocated classes now packaged
      inside spark-core. Before, this was only done by the time the assembly
      was built, so projects that did not end up inside the assembly (such
      as streaming backends) could still reference the original location
      of Guava classes.
      
      The Guava classes are added to the "first" artifact Spark generates
      (network-common), so that all downstream modules have the needed
      classes available. Since "network-common" is a dependency of spark-core,
      all Spark apps should get the relocated classes automatically.
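
      A small diagnostic sketch (illustration only) for checking which artifact actually supplies Guava on a given classpath, which is where the pre-patch confusion showed up:

      ```
      import com.google.common.base.Preconditions;

      public final class WhichGuava {
        public static void main(String[] args) {
          // Prints the jar providing Guava's Preconditions. Against a shaded
          // spark-core, user code resolves its own Guava here, because Spark's
          // internal references were relocated at build time.
          System.out.println(
              Preconditions.class.getProtectionDomain().getCodeSource().getLocation());
        }
      }
      ```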
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3658 from vanzin/SPARK-4809 and squashes the following commits:
      
      3c93e42 [Marcelo Vanzin] Shade Guava in the network-common artifact.
      5d69ec9 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      b3104fc [Marcelo Vanzin] Add comment.
      941848f [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      f78c48a [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      8053dd4 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      107d7da [Marcelo Vanzin] Add fix for SPARK-5052 (PR #3874).
      40b8723 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      4a4ed42 [Marcelo Vanzin] [SPARK-4809] Rework Guava library shading.
  15. Jan 06, 2015
    • SPARK-4159 [CORE] Maven build doesn't run JUnit test suites · 4cba6eb4
      Sean Owen authored
      This PR:
      
      - Reenables `surefire`, and copies config from `scalatest` (which is itself an old fork of `surefire`, so similar)
      - Tells `surefire` to test only Java tests
      - Enables `surefire` and `scalatest` for all children, and in turn eliminates some duplication.
      
      For me, this causes the Scala and Java tests each to be run once, as desired. It doesn't affect the SBT build, but it works for Maven. I still need to verify that all of the Scala and Java tests are being run.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3651 from srowen/SPARK-4159 and squashes the following commits:
      
      2e8a0af [Sean Owen] Remove specialized SPARK_HOME setting for REPL, YARN tests as it appears to be obsolete
      12e4558 [Sean Owen] Append to unit-test.log instead of overwriting, so that both surefire and scalatest output is preserved. Also standardize/correct comments a bit.
      e6f8601 [Sean Owen] Reenable Java tests by reenabling surefire with config cloned from scalatest; centralize test config in the parent
  16. Nov 18, 2014
    • Bumping version to 1.3.0-SNAPSHOT. · 397d3aae
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3277 from vanzin/version-1.3 and squashes the following commits:
      
      7c3c396 [Marcelo Vanzin] Added temp repo to sbt build.
      5f404ff [Marcelo Vanzin] Add another exclusion.
      19457e7 [Marcelo Vanzin] Update old version to 1.2, add temporary 1.2 repo.
      3c8d705 [Marcelo Vanzin] Workaround for MIMA checks.
      e940810 [Marcelo Vanzin] Bumping version to 1.3.0-SNAPSHOT.
  17. Nov 13, 2014
    • [SPARK-4326] fix unidoc · 4b0c1edf
      Xiangrui Meng authored
      There are two issues:
      
      1. specifying Guava 11.0.2 will cause hashInt not to be found in unidoc (any reason to force the version here?)
      2. unidoc doesn't recognize static classes defined in a base class
      
      aarondav srowen vanzin
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #3253 from mengxr/SPARK-4326 and squashes the following commits:
      
      53967bf [Xiangrui Meng] fix unidoc
  18. Nov 12, 2014
    • [SPARK-4281][Build] Package Yarn shuffle service into its own jar · aa43a8da
      Andrew Or authored
      This is another addendum to #3082, which added the Yarn shuffle service to run inside the NM. This PR makes the feature much more usable by packaging enough dependencies into the jar to run the service inside an NM. After these changes, the user can run `./make-distribution.sh` and find a `spark-network-yarn*.jar` in their `lib` directory. The equivalent change is done in SBT by making the `network-yarn` module an assembly project.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #3147 from andrewor14/yarn-shuffle-build and squashes the following commits:
      
      bda58d0 [Andrew Or] Fix line too long
      81e9705 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-build
      fb7f398 [Andrew Or] Rename jar to spark-{VERSION}-yarn-shuffle.jar
      65db822 [Andrew Or] Actually mark slf4j as provided
      abcefd1 [Andrew Or] Do the same for SBT
      c653028 [Andrew Or] Package network-yarn and its dependencies
  19. Nov 08, 2014
    • [SPARK-4291][Build] Rename network module projects · 7afc8564
      Andrew Or authored
      The names of the recently introduced network modules are inconsistent with those of the other modules in the project. We should just drop the "Code" suffix, since doing so doesn't sacrifice any meaning, especially before these names get into an official release.
      
      ```
      [INFO] Reactor Build Order:
      [INFO]
      [INFO] Spark Project Parent POM
      [INFO] Spark Project Common Network Code
      [INFO] Spark Project Shuffle Streaming Service Code
      [INFO] Spark Project Core
      [INFO] Spark Project Bagel
      [INFO] Spark Project GraphX
      [INFO] Spark Project Streaming
      [INFO] Spark Project Catalyst
      [INFO] Spark Project SQL
      [INFO] Spark Project ML Library
      [INFO] Spark Project Tools
      [INFO] Spark Project Hive
      [INFO] Spark Project REPL
      [INFO] Spark Project YARN Parent POM
      [INFO] Spark Project YARN Stable API
      [INFO] Spark Project Assembly
      [INFO] Spark Project External Twitter
      [INFO] Spark Project External Kafka
      [INFO] Spark Project External Flume Sink
      [INFO] Spark Project External Flume
      [INFO] Spark Project External ZeroMQ
      [INFO] Spark Project External MQTT
      [INFO] Spark Project Examples
      [INFO] Spark Project Yarn Shuffle Service Code
      ```
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #3148 from andrewor14/build-drop-code and squashes the following commits:
      
      eac839b [Andrew Or] Network -> Networking
      d01ad47 [Andrew Or] Rename network module project names
  20. Nov 05, 2014
    • [SPARK-4242] [Core] Add SASL to external shuffle service · 4c42986c
      Aaron Davidson authored
      Does three things: (1) adds SASL to ExternalShuffleClient, (2) puts SecurityManager in BlockManager's constructor, and (3) adds a unit test.
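
      For context, a minimal sketch of the configuration that turns these SASL checks on (`spark.authenticate` and `spark.authenticate.secret` are standard Spark settings):

      ```
      import org.apache.spark.SparkConf;

      // Sketch: with authentication enabled, the external shuffle client and
      // service perform a SASL handshake before any blocks are served.
      SparkConf conf = new SparkConf()
          .set("spark.authenticate", "true")
          .set("spark.authenticate.secret", "shared-secret");
      ```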
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #3108 from aarondav/sasl-client and squashes the following commits:
      
      48b622d [Aaron Davidson] Screw it, let's just get LimitedInputStream
      3543b70 [Aaron Davidson] Back out of pom change due to unknown test issue?
      b58518a [Aaron Davidson] ByteStreams.limit() not available :(
      cbe451a [Aaron Davidson] Address comments
      2bf2908 [Aaron Davidson] [SPARK-4242] [Core] Add SASL to external shuffle service
  21. Nov 01, 2014
    • [SPARK-3796] Create external service which can serve shuffle files · f55218ae
      Aaron Davidson authored
      This patch introduces the tooling necessary to construct an external shuffle service which is independent of Spark executors, and then use this service inside Spark. An example (just for the sake of this PR) of the service creation can be found in Worker, and the service itself is used by plugging in the StandaloneShuffleClient as Spark's ShuffleClient (setup in BlockManager).
      
      This PR continues the work from #2753, which extracted out the transport layer of Spark's block transfer into an independent package within Spark. A new package was created which contains the Spark business logic necessary to retrieve the actual shuffle data, which is completely independent of the transport layer introduced in the previous patch. Similar to the transport layer, this package must not depend on Spark as we anticipate plugging this service as a lightweight process within, say, the YARN NodeManager, and do not wish to include Spark's dependencies (including Scala itself).
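
      As a rough sketch of the client-side surface this package provides to executors (a hypothetical interface, not Spark's actual API, which has changed across versions):

      ```
      // Hypothetical sketch of what an executor needs from the shuffle service:
      // identify its application, then fetch blocks from the remote node that
      // hosts the service (e.g. a YARN NodeManager), not from an executor.
      interface ShuffleServiceClient {
        void init(String appId);                        // announce which app we are
        void fetchBlocks(String host, int port,         // node running the service
                         String execId,                 // executor that wrote the data
                         String[] blockIds,             // shuffle blocks to read
                         BlockFetchListener listener);  // async completion callbacks
      }

      interface BlockFetchListener {
        void onBlockFetchSuccess(String blockId, java.nio.ByteBuffer data);
        void onBlockFetchFailure(String blockId, Throwable cause);
      }
      ```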
      
      There are several outstanding tasks which must be completed before this PR can be merged:
      - [x] Complete unit testing of network/shuffle package.
      - [x] Performance and correctness testing on a real cluster.
      - [x] Remove example service instantiation from Worker.scala.
      
      There are even more shortcomings of this PR which should be addressed in followup patches:
      - Don't use Java serializer for RPC layer! It is not cross-version compatible.
      - Handle shuffle file cleanup for dead executors once the application terminates or the ContextCleaner triggers.
      - Documentation of the feature in the Spark docs.
      - Improve behavior if the shuffle service itself goes down (right now we don't blacklist it, and new executors cannot spawn on that machine).
      - SSL and SASL integration
      - Nice to have: Handle shuffle file consolidation (this would require changes to Spark's implementation).
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #3001 from aarondav/shuffle-service and squashes the following commits:
      
      4d1f8c1 [Aaron Davidson] Remove changes to Worker
      705748f [Aaron Davidson] Rename Standalone* to External*
      fd3928b [Aaron Davidson] Do not unregister executor outputs unduly
      9883918 [Aaron Davidson] Make suggested build changes
      3d62679 [Aaron Davidson] Add Spark integration test
      7fe51d5 [Aaron Davidson] Fix SBT integration
      56caa50 [Aaron Davidson] Address comments
      c8d1ac3 [Aaron Davidson] Add unit tests
      2f70c0c [Aaron Davidson] Fix unit tests
      5483e96 [Aaron Davidson] Fix unit tests
      46a70bf [Aaron Davidson] Whoops, bracket
      5ea4df6 [Aaron Davidson] [SPARK-3796] Create external service which can serve shuffle files
  22. Oct 30, 2014
    • HOTFIX: Clean up build in network module. · 0734d093
      Patrick Wendell authored
      This is currently breaking the package build for some people (including me).
      
      This patch does some general clean-up which also fixes the current issue.
      - Uses consistent artifact naming
      - Adds sbt support for this module
      - Changes tests to use scalatest (fixes the original issue[1])
      
      One thing to note: it turns out that scalatest, when invoked in the
      Maven build, doesn't successfully detect JUnit Java tests. This is
      a long-standing issue, and I noticed it applies to all of our current
      test suites as well. I've created SPARK-4159 to fix this.
      
      [1] The original issue is that we need to allocate extra memory
      for the tests, which happens by default in our scalatest configuration.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #3025 from pwendell/hotfix and squashes the following commits:
      
      faa9053 [Patrick Wendell] HOTFIX: Clean up build in network module.
  23. Oct 29, 2014
    • [SPARK-3453] Netty-based BlockTransferService, extracted from Spark core · dff01553
      Reynold Xin authored
      This PR encapsulates #2330, which is itself a continuation of #2240. The first goal of this PR is to provide an alternate, simpler implementation of the ConnectionManager which is based on Netty.
      
      In addition to this goal, however, we want to resolve [SPARK-3796](https://issues.apache.org/jira/browse/SPARK-3796), which calls for a standalone shuffle service which can be integrated into the YARN NodeManager, Standalone Worker, or on its own. This PR makes the first step in this direction by ensuring that the actual Netty service is as small as possible and extracted from Spark core. Given this, we should be able to construct this standalone jar which can be included in other JVMs without incurring significant dependency or runtime issues. The actual work to ensure that such a standalone shuffle service would work in Spark will be left for a future PR, however.
      
      In order to minimize dependencies and allow the service to be long-running (possibly much longer-running than Spark, and possibly having to support multiple versions of Spark simultaneously), the entire service has been ported to Java, where we have full control over the binary compatibility of the components and do not depend on the Scala runtime or version.
      
      The following issues have been addressed by folding in #2330:
      
      SPARK-3453: Refactor Netty module to use BlockTransferService interface
      SPARK-3018: Release all buffers upon task completion/failure
      SPARK-3002: Create a connection pool and reuse clients across different threads
      SPARK-3017: Integration tests and unit tests for connection failures
      SPARK-3049: Make sure client doesn't block when server/connection has error(s)
      SPARK-3502: SO_RCVBUF and SO_SNDBUF should be bootstrap childOption, not option
      SPARK-3503: Disable thread local cache in PooledByteBufAllocator
      
      TODO before mergeable:
      - [x] Implement uploadBlock()
      - [x] Unit tests for RPC side of code
      - [x] Performance testing (see comments [here](https://github.com/apache/spark/pull/2753#issuecomment-59475022))
      - [x] Turn OFF by default (currently on for unit testing)
      
      Author: Reynold Xin <rxin@apache.org>
      Author: Aaron Davidson <aaron@databricks.com>
      Author: cocoatomo <cocoatomo77@gmail.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Davies Liu <davies.liu@gmail.com>
      Author: Anand Avati <avati@redhat.com>
      
      Closes #2753 from aarondav/netty and squashes the following commits:
      
      cadfd28 [Aaron Davidson] Turn netty off by default
      d7be11b [Aaron Davidson] Turn netty on by default
      4a204b8 [Aaron Davidson] Fail block fetches if client connection fails
      2b0d1c0 [Aaron Davidson] 100ch
      0c5bca2 [Aaron Davidson] Merge branch 'master' of https://github.com/apache/spark into netty
      14e37f7 [Aaron Davidson] Address Reynold's comments
      8dfcceb [Aaron Davidson] Merge branch 'master' of https://github.com/apache/spark into netty
      322dfc1 [Aaron Davidson] Address Reynold's comments, including major rename
      e5675a4 [Aaron Davidson] Fail outstanding RPCs as well
      ccd4959 [Aaron Davidson] Don't throw exception if client immediately fails
      9da0bc1 [Aaron Davidson] Add RPC unit tests
      d236dfd [Aaron Davidson] Remove no-op serializer :)
      7b7a26c [Aaron Davidson] Fix Nio compile issue
      dd420fd [Aaron Davidson] Merge branch 'master' of https://github.com/apache/spark into netty-test
      939f276 [Aaron Davidson] Attempt to make comm. bidirectional
      aa58f67 [cocoatomo] [SPARK-3909][PySpark][Doc] A corrupted format in Sphinx documents and building warnings
      8dc1ded [cocoatomo] [SPARK-3867][PySpark] ./python/run-tests failed when it run with Python 2.6 and unittest2 is not installed
      5b5dbe6 [Prashant Sharma] [SPARK-2924] Required by scala 2.11, only one fun/ctor amongst overriden alternatives, can have default argument(s).
      2c5d9dc [Patrick Wendell] HOTFIX: Fix build issue with Akka 2.3.4 upgrade.
      020691e [Davies Liu] [SPARK-3886] [PySpark] use AutoBatchedSerializer by default
      ae4083a [Anand Avati] [SPARK-2805] Upgrade Akka to 2.3.4
      29c6dcf [Aaron Davidson] [SPARK-3453] Netty-based BlockTransferService, extracted from Spark core
      f7e7568 [Reynold Xin] Fixed spark.shuffle.io.receiveBuffer setting.
      5d98ce3 [Reynold Xin] Flip buffer.
      f6c220d [Reynold Xin] Merge with latest master.
      407e59a [Reynold Xin] Fix style violation.
      a0518c7 [Reynold Xin] Implemented block uploads.
      4b18db2 [Reynold Xin] Copy the buffer in fetchBlockSync.
      bec4ea2 [Reynold Xin] Removed OIO and added num threads settings.
      1bdd7ee [Reynold Xin] Fixed tests.
      d68f328 [Reynold Xin] Logging close() in case close() fails.
      f63fb4c [Reynold Xin] Add more debug message.
      6afc435 [Reynold Xin] Added logging.
      c066309 [Reynold Xin] Implement java.io.Closeable interface.
      519d64d [Reynold Xin] Mark private package visibility and MimaExcludes.
      f0a16e9 [Reynold Xin] Fixed test hanging.
      14323a5 [Reynold Xin] Removed BlockManager.getLocalShuffleFromDisk.
      b2f3281 [Reynold Xin] Added connection pooling.
      d23ed7b [Reynold Xin] Incorporated feedback from Norman: - use same pool for boss and worker - remove ioratio - disable caching of byte buf allocator - childoption sendbuf/receivebuf - fire exception through pipeline
      9e0cb87 [Reynold Xin] Fixed BlockClientHandlerSuite
      5cd33d7 [Reynold Xin] Fixed style violation.
      cb589ec [Reynold Xin] Added more test cases covering cleanup when fault happens in ShuffleBlockFetcherIteratorSuite
      1be4e8e [Reynold Xin] Shorten NioManagedBuffer and NettyManagedBuffer class names.
      108c9ed [Reynold Xin] Forgot to add TestSerializer to the commit list.
      b5c8d1f [Reynold Xin] Fixed ShuffleBlockFetcherIteratorSuite.
      064747b [Reynold Xin] Reference count buffers and clean them up properly.
      2b44cf1 [Reynold Xin] Added more documentation.
      1760d32 [Reynold Xin] Use Epoll.isAvailable in BlockServer as well.
      165eab1 [Reynold Xin] [SPARK-3453] Refactor Netty module to use BlockTransferService.