  1. Nov 23, 2015
    • [SPARK-11140][CORE] Transfer files using network lib when using NettyRpcEnv. · c2467dad
      Marcelo Vanzin authored
      Marcelo Vanzin authored
      This change abstracts the code that serves jars / files to executors so that
      each RpcEnv can have its own implementation; the akka version uses the existing
      HTTP-based file serving mechanism, while the netty version uses the new
      stream support added to the network lib, which makes file transfers benefit
      from the easier security configuration of the network library, and should also
      reduce overhead overall.
      
      The change includes a small fix to TransportChannelHandler so that it propagates
      user events to downstream handlers.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9530 from vanzin/SPARK-11140.
    • [SPARK-11865][NETWORK] Avoid returning inactive client in TransportClientFactory. · 7cfa4c6b
      Marcelo Vanzin authored
      Marcelo Vanzin authored
      There's a very narrow race here where it would be possible for the timeout handler
      to close a channel after the client factory verified that the channel was still
      active. This change makes sure the client is marked as being recently in use so
      that the timeout handler does not close it until a new timeout cycle elapses.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9853 from vanzin/SPARK-11865.
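The fix described above is a "touch on checkout" pattern: refresh a last-use timestamp whenever a client is handed out, so the idle-timeout sweeper never reaps a just-validated connection. A minimal sketch, with hypothetical names (PooledClient, checkout, sweepIdle are illustrative, not Spark's actual API):

```java
// Illustrative sketch of the "touch on checkout" pattern described above.
// Names here are hypothetical, not Spark's actual classes or methods.
import java.util.concurrent.atomic.AtomicLong;

class PooledClient {
    private final AtomicLong lastUsedNanos = new AtomicLong(System.nanoTime());
    private volatile boolean active = true;

    boolean isActive() { return active; }

    // Called by the factory when handing the client out: refresh the
    // timestamp *before* returning, so the timeout sweeper cannot close a
    // connection the factory just verified as active.
    PooledClient checkout() {
        lastUsedNanos.set(System.nanoTime());
        return this;
    }

    // Called by the timeout handler: close only if a full idle period has
    // elapsed since the last recorded use.
    boolean sweepIdle(long idleTimeoutNanos) {
        if (active && System.nanoTime() - lastUsedNanos.get() > idleTimeoutNanos) {
            active = false; // the real implementation would close the channel
            return true;
        }
        return false;
    }
}
```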
    • [SPARK-11762][NETWORK] Account for active streams when counting outstanding requests. · 5231cd5a
      Marcelo Vanzin authored
      Marcelo Vanzin authored
      This way the timeout handling code can correctly close "hung" channels that are
      processing streams.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9747 from vanzin/SPARK-11762.
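The accounting change amounts to treating an active stream as outstanding work, so a channel in the middle of a long stream no longer looks idle to the timeout handler. A rough sketch with illustrative names:

```java
// Sketch of "outstanding work" accounting that counts active streams as
// well as in-flight RPCs. Names are illustrative, not Spark's classes.
import java.util.concurrent.atomic.AtomicInteger;

class OutstandingWork {
    private final AtomicInteger outstandingRpcs = new AtomicInteger();
    private final AtomicInteger activeStreams = new AtomicInteger();

    void rpcStarted()     { outstandingRpcs.incrementAndGet(); }
    void rpcFinished()    { outstandingRpcs.decrementAndGet(); }
    void streamStarted()  { activeStreams.incrementAndGet(); }
    void streamFinished() { activeStreams.decrementAndGet(); }

    // The timeout handler may only treat the channel as idle when *neither*
    // RPCs nor streams are in flight; before the fix, streams were ignored,
    // so a channel processing a stream could be closed as "hung".
    boolean hasOutstandingWork() {
        return outstandingRpcs.get() > 0 || activeStreams.get() > 0;
    }
}
```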
  2. Nov 19, 2015
    • [SPARK-11830][CORE] Make NettyRpcEnv bind to the specified host · 72d150c2
      zsxwing authored
      zsxwing authored
      This PR includes the following change:
      
      1. Bind NettyRpcEnv to the specified host
      2. Fix the port information in the log for NettyRpcEnv.
      3. Fix the service name of NettyRpcEnv.
      
      Author: zsxwing <zsxwing@gmail.com>
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #9821 from zsxwing/SPARK-11830.
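The core of item 1 is binding the listener to a specific address rather than the wildcard address. A minimal sketch using plain java.net in place of netty (BoundServer is a stand-in name):

```java
// Minimal sketch of binding a listener to a *specific* host (here the
// loopback address) instead of the wildcard address, as the fix does for
// NettyRpcEnv. Plain java.net stands in for netty.
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class BoundServer {
    static InetSocketAddress bindTo(String host) {
        try (ServerSocket server = new ServerSocket()) {
            // Port 0 asks the OS for any free port; the host is what matters.
            server.bind(new InetSocketAddress(InetAddress.getByName(host), 0));
            return (InetSocketAddress) server.getLocalSocketAddress();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```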
  3. Nov 18, 2015
  4. Nov 16, 2015
    • [SPARK-11617][NETWORK] Fix leak in TransportFrameDecoder. · 540bf58f
      Marcelo Vanzin authored
      Marcelo Vanzin authored
      The code was using the wrong API to add data to the internal composite
      buffer, causing buffers to leak in certain situations. Use the right
      API and enhance the tests to catch memory leaks.
      
      Also, avoid reusing the composite buffers when downstream handlers keep
      references to them; this seems to cause a few different issues even though
      the ref counting code seems to be correct, so instead pay the cost of copying
      a few bytes when that situation happens.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9619 from vanzin/SPARK-11617.
  5. Nov 10, 2015
    • [SPARK-11252][NETWORK] ShuffleClient should release connection after fetching blocks has completed for external shuffle · 6e5fc378
      Lianhui Wang authored
      
      With YARN's external shuffle service, each executor's ExternalShuffleClient keeps its connections to YARN's NodeManager open until the application completes, so the NodeManager and the executors end up holding many socket connections.
      To reduce the network pressure on the NodeManager's shuffle service, the connection to the NM's shuffle service should be closed once registerWithShuffleServer or fetchBlocks has completed in ExternalShuffleClient. cc andrewor14 rxin vanzin
      
      Author: Lianhui Wang <lianhuiwang09@gmail.com>
      
      Closes #9227 from lianhuiwang/spark-11252.
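The change boils down to a close-after-use pattern: open the connection, perform the registration or fetch, and release the socket immediately instead of holding it for the application's lifetime. A sketch with a stand-in class (ShuffleServiceConnection is hypothetical, not Spark's API):

```java
// Sketch of the close-after-use pattern the change introduces.
// ShuffleServiceConnection is a stand-in, not Spark's real class.
class ShuffleServiceConnection implements AutoCloseable {
    private boolean open = true;
    boolean isOpen() { return open; }
    void fetchBlocks(String... blockIds) {
        if (!open) throw new IllegalStateException("connection closed");
        // ... transfer block data ...
    }
    @Override public void close() { open = false; }
}

class ShuffleClientSketch {
    static boolean fetchAndRelease(ShuffleServiceConnection conn, String... ids) {
        // try-with-resources releases the socket as soon as the fetch
        // completes, rather than when the application shuts down.
        try (ShuffleServiceConnection c = conn) {
            c.fetchBlocks(ids);
        }
        return !conn.isOpen();
    }
}
```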
  6. Nov 04, 2015
    • [SPARK-11235][NETWORK] Add ability to stream data using network lib. · 27feafcc
      Marcelo Vanzin authored
      Marcelo Vanzin authored
      The current interface used to fetch shuffle data is not very efficient for
      large buffers; it requires the receiver to buffer the entirety of the
      contents being downloaded in memory before processing the data.
      
      To use the network library to transfer large files (such as those that
      can be added using SparkContext addJar / addFile), this change adds a
      more efficient way of downloading data, by streaming the data and feeding
      it to a callback as data arrives.
      
      This is achieved by a custom frame decoder that replaces the current netty
      one; this decoder allows entering a mode where framing is skipped and data
      is instead provided directly to a callback. The existing netty classes
      (ByteToMessageDecoder and LengthFieldBasedFrameDecoder) could not be reused
      since their semantics do not allow for the interception approach the new
      decoder uses.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9206 from vanzin/SPARK-11235.
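The interception idea above can be illustrated with a toy decoder: it normally splits length-prefixed frames, but once asked to intercept a stream it bypasses framing and feeds the next N raw bytes straight to a callback as they arrive. This is a simplification over byte buffers (it assumes whole frames arrive in one buffer), not the actual netty handler:

```java
// Toy version of the frame decoder described above: decode 4-byte
// length-prefixed frames, but in "stream mode" divert raw bytes to a
// callback without framing. Assumes whole frames arrive in one buffer,
// unlike the real decoder, and is illustrative only.
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class StreamingFrameDecoder {
    interface StreamCallback { void onData(byte[] chunk); }

    private final List<byte[]> frames = new ArrayList<>();
    private long streamBytesLeft = 0;
    private StreamCallback callback;

    List<byte[]> frames() { return frames; }

    // Divert the next `n` incoming bytes to `cb` instead of framing them.
    void interceptStream(long n, StreamCallback cb) {
        streamBytesLeft = n;
        callback = cb;
    }

    void feed(ByteBuffer in) {
        while (in.hasRemaining()) {
            if (streamBytesLeft > 0) {
                int take = (int) Math.min(streamBytesLeft, in.remaining());
                byte[] chunk = new byte[take];
                in.get(chunk);
                callback.onData(chunk);      // delivered as data arrives
                streamBytesLeft -= take;
            } else {
                int len = in.getInt();       // 4-byte length prefix
                byte[] frame = new byte[len];
                in.get(frame);
                frames.add(frame);
            }
        }
    }
}
```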
  7. Nov 02, 2015
    • [SPARK-10997][CORE] Add "client mode" to netty rpc env. · 71d1c907
      Marcelo Vanzin authored
      Marcelo Vanzin authored
      "Client mode" means the RPC env will not listen for incoming connections.
      This allows certain processes in the Spark stack (such as Executors or
      the YARN client-mode AM) to act as pure clients when using the netty-based
      RPC backend, reducing the number of sockets needed by the app and also the
      number of open ports.
      
      Client connections are also preferred when endpoints that actually have
      a listening socket are involved; so, for example, if a Worker connects
      to a Master and the Master needs to send a message to a Worker endpoint,
      that client connection will be used, even though the Worker is also
      listening for incoming connections.
      
      With this change, the workaround for SPARK-10987 isn't necessary anymore, and
      is removed. The AM connects to the driver in "client mode", and that connection
      is used for all driver <-> AM communication, and so the AM is properly notified
      when the connection goes down.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9210 from vanzin/SPARK-10997.
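Structurally, "client mode" just means the env never opens a listening socket. A sketch with a hypothetical class (RpcEnvSketch is illustrative; plain ServerSocket stands in for the netty server):

```java
// Sketch of "client mode": the env opens a listening socket only when it is
// not a pure client. RpcEnvSketch is illustrative, not Spark's class.
import java.io.IOException;
import java.net.ServerSocket;

class RpcEnvSketch implements AutoCloseable {
    private final ServerSocket listener; // null in client mode

    RpcEnvSketch(boolean clientMode) {
        if (clientMode) {
            listener = null;                    // no socket, no open port
        } else {
            try {
                listener = new ServerSocket(0); // ephemeral port
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }

    // -1 signals "not listening": a pure client has no address to share.
    int listeningPort() { return listener == null ? -1 : listener.getLocalPort(); }

    @Override public void close() {
        try {
            if (listener != null) listener.close();
        } catch (IOException ignored) { }
    }
}
```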
  8. Oct 14, 2015
  9. Oct 07, 2015
  10. Oct 03, 2015
  11. Sep 28, 2015
    • [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE · bf4199e2
      Sean Owen authored
      Sean Owen authored
      In the course of https://issues.apache.org/jira/browse/LEGAL-226 it came to light that the guidance at http://www.apache.org/dev/licensing-howto.html#permissive-deps gives permissively-licensed dependencies a different treatment than we (er, I) had been operating under: "pointer ... to the license within the source tree" specifically means a copy of the license within Spark's distribution, whereas at the moment Spark's LICENSE points to the license in the other project's source tree.
      
      The remedy is simply to inline all such license references (i.e. BSD/MIT licenses), or to include their text in a "licenses" subdirectory and point to that.
      
      Along the way, we can also treat other BSD/MIT licenses, whose text has been inlined into LICENSE, in the same way.
      
      The LICENSE file can continue to provide a helpful list of BSD/MIT licensed projects and a pointer to their sites. This would be over and above including license text in the distro, which is the essential thing.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #8919 from srowen/SPARK-10833.
  12. Sep 24, 2015
  13. Sep 23, 2015
  14. Sep 18, 2015
  15. Sep 17, 2015
    • [SPARK-10674] [TESTS] Increase timeouts in SaslIntegrationSuite. · 0f5ef6df
      Marcelo Vanzin authored
      Marcelo Vanzin authored
      The 1s timeout seems to trigger too many times on the Jenkins build boxes, so
      increase the timeout and cross fingers.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #8802 from vanzin/SPARK-10674 and squashes the following commits:
      
      3c93117 [Marcelo Vanzin] Use java 7 syntax.
      d667d1b [Marcelo Vanzin] [SPARK-10674] [tests] Increase timeouts in SaslIntegrationSuite.
  16. Sep 15, 2015
  17. Sep 02, 2015
    • [SPARK-10004] [SHUFFLE] Perform auth checks when clients read shuffle data. · 2da3a9e9
      Marcelo Vanzin authored
      Marcelo Vanzin authored
      To correctly isolate applications, when requests to read shuffle data
      arrive at the shuffle service, proper authorization checks need to
      be performed. This change makes sure that only the application that
      created the shuffle data can read from it.
      
      Such checks are only enabled when "spark.authenticate" is enabled,
      otherwise there's no secure way to make sure that the client is really
      who it says it is.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #8218 from vanzin/SPARK-10004.
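The check described above amounts to tagging each registered block with the application that created it and comparing that against the client identity established by SASL authentication. A sketch with illustrative names (ShuffleAuthorizer is hypothetical, not Spark's class):

```java
// Sketch of the per-application authorization check: a block is tagged with
// the app that registered it, and a read is allowed only when the
// authenticated client's app id matches. Names are illustrative.
import java.util.HashMap;
import java.util.Map;

class ShuffleAuthorizer {
    private final Map<String, String> blockOwner = new HashMap<>(); // blockId -> appId

    void register(String blockId, String appId) { blockOwner.put(blockId, appId); }

    // `authenticatedAppId` is the identity established by SASL; without
    // spark.authenticate there is no trustworthy client identity, which is
    // why the commit only enforces the check when authentication is on.
    boolean mayRead(String authenticatedAppId, String blockId) {
        return authenticatedAppId != null
            && authenticatedAppId.equals(blockOwner.get(blockId));
    }
}
```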
  18. Aug 28, 2015
    • typo in comment · 71a077f6
      Dharmesh Kakadia authored
      Dharmesh Kakadia authored
      Author: Dharmesh Kakadia <dharmeshkakadia@users.noreply.github.com>
      
      Closes #8497 from dharmeshkakadia/patch-2.
  19. Aug 21, 2015
    • [SPARK-9439] [YARN] External shuffle service robust to NM restarts using leveldb · 708036c1
      Imran Rashid authored
      Imran Rashid authored
      https://issues.apache.org/jira/browse/SPARK-9439
      
      In general, Yarn apps should be robust to NodeManager restarts.  However, if you run spark with the external shuffle service on, after a NM restart all shuffles fail, b/c the shuffle service has lost some state with info on each executor.  (Note the shuffle data is perfectly fine on disk across a NM restart, the problem is we've lost the small bit of state that lets us *find* those files.)
      
      The solution proposed here is that the external shuffle service can write out its state to leveldb (backed by a local file) every time an executor is added.  When running with yarn, that file is in the NM's local dir.  Whenever the service is started, it looks for that file, and if it exists, it reads the file and re-registers all executors there.
      
      Nothing is changed in non-yarn modes with this patch.  The service is not given a place to save the state to, so it operates the same as before.  This should make it easy to update other cluster managers as well, by just supplying the right file & the equivalent of yarn's `initializeApplication` -- I'm not familiar enough with those modes to know how to do that.
      
      Author: Imran Rashid <irashid@cloudera.com>
      
      Closes #7943 from squito/leveldb_external_shuffle_service_NM_restart and squashes the following commits:
      
      0d285d3 [Imran Rashid] review feedback
      70951d6 [Imran Rashid] Merge branch 'master' into leveldb_external_shuffle_service_NM_restart
      5c71c8c [Imran Rashid] save executor to db before registering; style
      2499c8c [Imran Rashid] explicit dependency on jackson-annotations
      795d28f [Imran Rashid] review feedback
      81f80e2 [Imran Rashid] Merge branch 'master' into leveldb_external_shuffle_service_NM_restart
      594d520 [Imran Rashid] use json to serialize application executor info
      1a7980b [Imran Rashid] version
      8267d2a [Imran Rashid] style
      e9f99e8 [Imran Rashid] cleanup the handling of bad dbs a little
      9378ba3 [Imran Rashid] fail gracefully on corrupt leveldb files
      acedb62 [Imran Rashid] switch to writing out one record per executor
      79922b7 [Imran Rashid] rely on yarn to call stopApplication; assorted cleanup
      12b6a35 [Imran Rashid] save registered executors when apps are removed; add tests
      c878fbe [Imran Rashid] better explanation of shuffle service port handling
      694934c [Imran Rashid] only open leveldb connection once per service
      d596410 [Imran Rashid] store executor data in leveldb
      59800b7 [Imran Rashid] Files.move in case renaming is unsupported
      32fe5ae [Imran Rashid] Merge branch 'master' into external_shuffle_service_NM_restart
      d7450f0 [Imran Rashid] style
      f729e2b [Imran Rashid] debugging
      4492835 [Imran Rashid] lol, dont use a PrintWriter b/c of scalastyle checks
      0a39b98 [Imran Rashid] Merge branch 'master' into external_shuffle_service_NM_restart
      55f49fc [Imran Rashid] make sure the service doesnt die if the registered executor file is corrupt; add tests
      245db19 [Imran Rashid] style
      62586a6 [Imran Rashid] just serialize the whole executors map
      bdbbf0d [Imran Rashid] comments, remove some unnecessary changes
      857331a [Imran Rashid] better tests & comments
      bb9d1e6 [Imran Rashid] formatting
      bdc4b32 [Imran Rashid] rename
      86e0cb9 [Imran Rashid] for tests, shuffle service finds an open port
      23994ff [Imran Rashid] style
      7504de8 [Imran Rashid] style
      a36729c [Imran Rashid] cleanup
      efb6195 [Imran Rashid] proper unit test, and no longer leak if apps stop during NM restart
      dd93dc0 [Imran Rashid] test for shuffle service w/ NM restarts
      d596969 [Imran Rashid] cleanup imports
      0e9d69b [Imran Rashid] better names
      9eae119 [Imran Rashid] cleanup lots of duplication
      1136f44 [Imran Rashid] test needs to have an actual shuffle
      0b588bd [Imran Rashid] more fixes ...
      ad122ef [Imran Rashid] more fixes
      5e5a7c3 [Imran Rashid] fix build
      c69f46b [Imran Rashid] maybe working version, needs tests & cleanup ...
      bb3ba49 [Imran Rashid] minor cleanup
      36127d3 [Imran Rashid] wip
      b9d2ced [Imran Rashid] incomplete setup for external shuffle service tests
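The recovery idea above (write each executor registration to a local file as it happens, and re-load the registry when the service starts) can be sketched in a greatly simplified form, with a java.util.Properties file standing in for leveldb and all names illustrative:

```java
// Greatly simplified sketch of the recovery scheme described above: persist
// each executor registration as it happens and re-load on startup. A
// Properties file stands in for leveldb; names are illustrative.
import java.io.*;
import java.util.Properties;

class ExecutorRegistry {
    private final File db;
    private final Properties executors = new Properties(); // execId -> localDirs

    ExecutorRegistry(File db) {
        this.db = db;
        if (db.exists()) {
            try (InputStream in = new FileInputStream(db)) {
                executors.load(in); // re-register executors from a previous run
            } catch (IOException e) { throw new RuntimeException(e); }
        }
    }

    void registerExecutor(String execId, String localDirs) {
        executors.setProperty(execId, localDirs);
        try (OutputStream out = new FileOutputStream(db)) {
            executors.store(out, null); // persist before acknowledging
        } catch (IOException e) { throw new RuntimeException(e); }
    }

    String localDirsOf(String execId) { return executors.getProperty(execId); }
}
```

After a simulated restart (a new ExecutorRegistry over the same file), the earlier registrations are visible again, which is the property that lets shuffle files be *found* after an NM restart.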
  20. Aug 11, 2015
    • [SPARK-7726] Add import so Scaladoc doesn't fail. · 2a3be4dd
      Patrick Wendell authored
      Patrick Wendell authored
      This is another import needed so Scala 2.11 doc generation doesn't fail.
      See SPARK-7726 for more detail. I tested this locally and the 2.11
      install goes from failing to succeeding with this patch.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #8095 from pwendell/scaladoc.
  21. Aug 04, 2015
    • [SPARK-9534] [BUILD] Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition · 76d74090
      Sean Owen authored
      
      Enable most javac lint warnings; fix a lot of build warnings. In a few cases, touch up surrounding code in the process.
      
      I'll explain several of the changes inline in comments.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #7862 from srowen/SPARK-9534 and squashes the following commits:
      
      ea51618 [Sean Owen] Enable most javac lint warnings; fix a lot of build warnings. In a few cases, touch up surrounding code in the process.
  22. Aug 03, 2015
    • [SPARK-8873] [MESOS] Clean up shuffle files if external shuffle service is used · 95dccc63
      Timothy Chen authored
      Timothy Chen authored
      This patch builds directly on #7820, which is largely written by tnachen. The only addition is one commit for cleaning up the code. There should be no functional differences between this and #7820.
      
      Author: Timothy Chen <tnachen@gmail.com>
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #7881 from andrewor14/tim-cleanup-mesos-shuffle and squashes the following commits:
      
      8894f7d [Andrew Or] Clean up code
      2a5fa10 [Andrew Or] Merge branch 'mesos_shuffle_clean' of github.com:tnachen/spark into tim-cleanup-mesos-shuffle
      fadff89 [Timothy Chen] Address comments.
      e4d0f1d [Timothy Chen] Clean up external shuffle data on driver exit with Mesos.
  23. Jul 02, 2015
    • [SPARK-3071] Increase default driver memory · 3697232b
      Ilya Ganelin authored
      Ilya Ganelin authored
      I've updated default values in comments, documentation, and in the command line builder to be 1g based on comments in the JIRA. I've also updated most usages to point at a single variable defined in the Utils.scala and JavaUtils.java files. This wasn't possible in all cases (R, shell scripts etc.) but usage in most code is now pointing at the same place.
      
      Please let me know if I've missed anything.
      
      Will the spark-shell use the value within the command line builder during instantiation?
      
      Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
      
      Closes #7132 from ilganeli/SPARK-3071 and squashes the following commits:
      
      4074164 [Ilya Ganelin] String fix
      271610b [Ilya Ganelin] Merge branch 'SPARK-3071' of github.com:ilganeli/spark into SPARK-3071
      273b6e9 [Ilya Ganelin] Test fix
      fd67721 [Ilya Ganelin] Update JavaUtils.java
      26cc177 [Ilya Ganelin] test fix
      e5db35d [Ilya Ganelin] Fixed test failure
      39732a1 [Ilya Ganelin] merge fix
      a6f7deb [Ilya Ganelin] Created default value for DRIVER MEM in Utils that's now used in almost all locations instead of setting manually in each
      09ad698 [Ilya Ganelin] Update SubmitRestProtocolSuite.scala
      19b6f25 [Ilya Ganelin] Missed one doc update
      2698a3d [Ilya Ganelin] Updated default value for driver memory
  24. Jun 28, 2015
    • [SPARK-8683] [BUILD] Depend on mockito-core instead of mockito-all · f5100451
      Josh Rosen authored
      Josh Rosen authored
      Spark's tests currently depend on `mockito-all`, which bundles Hamcrest and Objenesis classes. Instead, it should depend on `mockito-core`, which declares those libraries as Maven dependencies. This is necessary in order to fix a dependency conflict that leads to a NoSuchMethodError when using certain Hamcrest matchers.
      
      See https://github.com/mockito/mockito/wiki/Declaring-mockito-dependency for more details.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7061 from JoshRosen/mockito-core-instead-of-all and squashes the following commits:
      
      70eccbe [Josh Rosen] Depend on mockito-core instead of mockito-all.
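The swap described above amounts to a pom.xml change along these lines; the version shown is an example, not necessarily what Spark pinned at the time:

```xml
<!-- Illustrative pom.xml fragment for the dependency swap described above. -->
<dependency>
  <groupId>org.mockito</groupId>
  <artifactId>mockito-core</artifactId>
  <version>1.10.19</version>
  <scope>test</scope>
</dependency>
<!-- mockito-core declares hamcrest and objenesis as ordinary Maven
     dependencies instead of bundling their classes, so Maven's normal
     version conflict resolution applies and the NoSuchMethodError from
     shaded Hamcrest classes goes away. -->
```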
  25. Jun 19, 2015
  26. Jun 03, 2015
    • [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 · 2c4d550e
      Patrick Wendell authored
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:
      
      2f42d02 [Patrick Wendell] A few more excludes
      4bebcf0 [Patrick Wendell] Update to RC4
      61aaf46 [Patrick Wendell] Using new release candidate
      55f1610 [Patrick Wendell] Another exclude
      04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
      36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
  27. May 19, 2015
    • [SPARK-7726] Fix Scaladoc false errors · 3c4c1f96
      Iulian Dragos authored
      Iulian Dragos authored
      Visibility rules for static members are different in Scala and Java, and this case requires an explicit static import. Even though these are Java files, they are run through scaladoc, which enforces Scala rules.
      
      Also reverted the commit that reverts the upgrade to 2.11.6
      
      Author: Iulian Dragos <jaguarul@gmail.com>
      
      Closes #6260 from dragos/issue/scaladoc-false-error and squashes the following commits:
      
      f2e998e [Iulian Dragos] Revert "[HOTFIX] Revert "[SPARK-7092] Update spark scala version to 2.11.6""
      0bad052 [Iulian Dragos] Fix scaladoc faux-error.
  28. May 08, 2015
    • [SPARK-6955] Perform port retries at NettyBlockTransferService level · ffdc40ce
      Aaron Davidson authored
      Aaron Davidson authored
      Currently we're doing port retries in the TransportServer level, but this is not specified by the TransportContext API and it has other further-reaching impacts like causing undesirable behavior for the Yarn and Standalone shuffle services.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #5575 from aarondav/port-bind and squashes the following commits:
      
      3c2d6ed [Aaron Davidson] Oops, never do it.
      a5d9432 [Aaron Davidson] Remove shouldHostShuffleServiceIfEnabled
      e901eb2 [Aaron Davidson] fix local-cluster mode for ExternalShuffleServiceSuite
      59e5e38 [Aaron Davidson] [SPARK-6955] Perform port retries at NettyBlockTransferService level
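Port retry at the caller's level, rather than inside the transport server, looks roughly like the following sketch; plain ServerSocket stands in for the netty server and the names are illustrative:

```java
// Sketch of port retry performed by the caller (the block transfer service
// level) rather than inside the transport server: try successive ports
// until one binds. Plain ServerSocket stands in for the netty server.
import java.io.IOException;
import java.net.ServerSocket;

class PortRetry {
    // Returns the first port in [startPort, startPort + maxRetries] that
    // could be bound (closing the probe socket again), or -1 on failure.
    static int bindWithRetries(int startPort, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            int port = startPort + attempt;
            try (ServerSocket s = new ServerSocket(port)) {
                return s.getLocalPort();
            } catch (IOException e) {
                // port in use; try the next one
            }
        }
        return -1;
    }
}
```

Keeping the retry loop out of TransportServer means services that must bind a fixed, well-known port (like the YARN and Standalone shuffle services) are not silently moved to a different port.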
    • [SPARK-6627] Finished rename to ShuffleBlockResolver · 4b3bb0e4
      Kay Ousterhout authored
      Kay Ousterhout authored
      The previous cleanup-commit for SPARK-6627 renamed ShuffleBlockManager
      to ShuffleBlockResolver, but didn't rename the associated subclasses and
      variables; this commit does that.
      
      I'm unsure whether it's ok to rename ExternalShuffleBlockManager, since that's technically a public class?
      
      cc pwendell
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #5764 from kayousterhout/SPARK-6627 and squashes the following commits:
      
      43add1e [Kay Ousterhout] Spacing fix
      96080bf [Kay Ousterhout] Test fixes
      d8a5d36 [Kay Ousterhout] [SPARK-6627] Finished rename to ShuffleBlockResolver
  29. May 01, 2015
    • [SPARK-6229] Add SASL encryption to network library. · 38d4e9e4
      Marcelo Vanzin authored
      Marcelo Vanzin authored
      There are two main parts of this change:
      
      - Extending the bootstrap mechanism in the network library to add a server-side
        bootstrap (which works a little bit differently than the client-side bootstrap), and
        to allow the  bootstraps to modify the underlying channel.
      
      - Use SASL to encrypt data going through the RPC channel.
      
      The second item requires some non-optimal code to be able to work around the
      fact that the outbound path in netty is not thread-safe, and ordering is very important
      when encryption is in the picture.
      
      A lot of the changes outside the network/common library are just to adjust to the
      changed API for initializing the RPC server.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5377 from vanzin/SPARK-6229 and squashes the following commits:
      
      ff01966 [Marcelo Vanzin] Use fancy new size config style.
      be53f32 [Marcelo Vanzin] Merge branch 'master' into SPARK-6229
      47d4aff [Marcelo Vanzin] Merge branch 'master' into SPARK-6229
      7a2a805 [Marcelo Vanzin] Clean up some unneeded changes.
      2f92237 [Marcelo Vanzin] Add comment.
      67bb0c6 [Marcelo Vanzin] Revert "Avoid exposing ByteArrayWritableChannel outside of test code."
      065f684 [Marcelo Vanzin] Add test to verify chunking.
      3d1695d [Marcelo Vanzin] Minor cleanups.
      73cff0e [Marcelo Vanzin] Skip bytes in decode path too.
      318ad23 [Marcelo Vanzin] Avoid exposing ByteArrayWritableChannel outside of test code.
      346f829 [Marcelo Vanzin] Avoid trip through channel selector by not reporting 0 bytes written.
      a4a5938 [Marcelo Vanzin] Review feedback.
      4797519 [Marcelo Vanzin] Remove unused import.
      9908ada [Marcelo Vanzin] Fix test, SASL backend disposal.
      7fe1489 [Marcelo Vanzin] Add a test that makes sure encryption is actually enabled.
      adb6f9d [Marcelo Vanzin] Review feedback.
      cf2a605 [Marcelo Vanzin] Clean up some code.
      8584323 [Marcelo Vanzin] Fix a comment.
      e98bc55 [Marcelo Vanzin] Add option to only allow encrypted connections to the server.
      dad42fc [Marcelo Vanzin] Make encryption thread-safe, less memory-intensive.
      b00999a [Marcelo Vanzin] Consolidate ByteArrayWritableChannel, fix SASL code to match master changes.
      b923cae [Marcelo Vanzin] Make SASL encryption handler thread-safe, handle FileRegion messages.
      39539a7 [Marcelo Vanzin] Add config option to enable SASL encryption.
      351a86f [Marcelo Vanzin] Add SASL encryption to network library.
      fbe6ccb [Marcelo Vanzin] Add TransportServerBootstrap, make SASL code use it.
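The server-side bootstrap idea in the first bullet can be sketched abstractly: each accepted channel is passed through a chain of bootstraps, any of which may replace it with a wrapped (for example, encrypting) channel. The interfaces below are illustrative stand-ins, not Spark's actual TransportServerBootstrap API:

```java
// Sketch of the server-side bootstrap idea: each accepted "channel" passes
// through a chain of bootstraps, any of which may return a wrapping
// (e.g. encrypting) channel. Interfaces are illustrative, not Spark's.
import java.util.List;

interface Channel { byte[] write(byte[] msg); }

interface ServerBootstrap {
    Channel doBootstrap(Channel channel); // may return a wrapping channel
}

class BootstrapChain {
    static Channel apply(Channel accepted, List<ServerBootstrap> bootstraps) {
        Channel ch = accepted;
        for (ServerBootstrap b : bootstraps) {
            ch = b.doBootstrap(ch); // later bootstraps see the wrapped channel
        }
        return ch;
    }
}
```

In the real change the wrapping step is where SASL negotiation happens and where the encrypting handler is installed on the underlying netty channel.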
    • [SPARK-7183] [NETWORK] Fix memory leak of TransportRequestHandler.streamIds · 16860327
      Liang-Chi Hsieh authored
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7183
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #5743 from viirya/fix_requesthandler_memory_leak and squashes the following commits:
      
      cf2c086 [Liang-Chi Hsieh] For comments.
      97e205c [Liang-Chi Hsieh] Remove unused import.
      d35f19a [Liang-Chi Hsieh] For comments.
      f9a0c37 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_requesthandler_memory_leak
      45908b7 [Liang-Chi Hsieh] for style.
      17f020f [Liang-Chi Hsieh] Remove unused import.
      37a4b6c [Liang-Chi Hsieh] Remove streamIds from TransportRequestHandler.
      3b3f38a [Liang-Chi Hsieh] Fix memory leak of TransportRequestHandler.streamIds.
  30. Apr 28, 2015
    • [SPARK-5932] [CORE] Use consistent naming for size properties · 2d222fb3
      Ilya Ganelin authored
      Ilya Ganelin authored
      I've added an interface to JavaUtils to do byte conversion and added hooks within Utils.scala to handle conversion within Spark code (like for time strings). I've added matching tests for size conversion, and then updated all deprecated configs and documentation as per SPARK-5933.
      
      Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
      
      Closes #5574 from ilganeli/SPARK-5932 and squashes the following commits:
      
      11f6999 [Ilya Ganelin] Nit fixes
      49a8720 [Ilya Ganelin] Whitespace fix
      2ab886b [Ilya Ganelin] Scala style
      fc85733 [Ilya Ganelin] Got rid of floating point math
      852a407 [Ilya Ganelin] [SPARK-5932] Added much improved overflow handling. Can now handle sizes up to Long.MAX_VALUE Petabytes instead of being capped at Long.MAX_VALUE Bytes
      9ee779c [Ilya Ganelin] Simplified fraction matches
      22413b1 [Ilya Ganelin] Made MAX private
      3dfae96 [Ilya Ganelin] Fixed some nits. Added automatic conversion of old parameter for kryoserializer.mb to new values.
      e428049 [Ilya Ganelin] resolving merge conflict
      8b43748 [Ilya Ganelin] Fixed error in pattern matching for doubles
      84a2581 [Ilya Ganelin] Added smoother handling of fractional values for size parameters. This now throws an exception and added a warning for old spark.kryoserializer.buffer
      d3d09b6 [Ilya Ganelin] [SPARK-5932] Fixing error in KryoSerializer
      fe286b4 [Ilya Ganelin] Resolved merge conflict
      c7803cd [Ilya Ganelin] Empty lines
      54b78b4 [Ilya Ganelin] Simplified byteUnit class
      69e2f20 [Ilya Ganelin] Updates to code
      f32bc01 [Ilya Ganelin] [SPARK-5932] Fixed error in API in SparkConf.scala where Kb conversion wasn't being done properly (was Mb). Added test cases for both timeUnit and ByteUnit conversion
      f15f209 [Ilya Ganelin] Fixed conversion of kryo buffer size
      0f4443e [Ilya Ganelin]     Merge remote-tracking branch 'upstream/master' into SPARK-5932
      35a7fa7 [Ilya Ganelin] Minor formatting
      928469e [Ilya Ganelin] [SPARK-5932] Converted some longs to ints
      5d29f90 [Ilya Ganelin] [SPARK-5932] Finished documentation updates
      7a6c847 [Ilya Ganelin] [SPARK-5932] Updated spark.shuffle.file.buffer
      afc9a38 [Ilya Ganelin] [SPARK-5932] Updated spark.broadcast.blockSize and spark.storage.memoryMapThreshold
      ae7e9f6 [Ilya Ganelin] [SPARK-5932] Updated spark.io.compression.snappy.block.size
      2d15681 [Ilya Ganelin] [SPARK-5932] Updated spark.executor.logs.rolling.size.maxBytes
      1fbd435 [Ilya Ganelin] [SPARK-5932] Updated spark.broadcast.blockSize
      eba4de6 [Ilya Ganelin] [SPARK-5932] Updated spark.shuffle.file.buffer.kb
      b809a78 [Ilya Ganelin] [SPARK-5932] Updated spark.kryoserializer.buffer.max
      0cdff35 [Ilya Ganelin] [SPARK-5932] Updated to use bibibytes in method names. Updated spark.kryoserializer.buffer.mb and spark.reducer.maxMbInFlight
      475370a [Ilya Ganelin] [SPARK-5932] Simplified ByteUnit code, switched to using longs. Updated docs to clarify that we use kibi, mebi etc instead of kilo, mega
      851d691 [Ilya Ganelin] [SPARK-5932] Updated memoryStringToMb to use new interfaces
      a9f4fcf [Ilya Ganelin] [SPARK-5932] Added unit tests for unit conversion
      747393a [Ilya Ganelin] [SPARK-5932] Added unit tests for ByteString conversion
      09ea450 [Ilya Ganelin] [SPARK-5932] Added byte string conversion to JavaUtils
      5390fd9 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-5932
      db9a963 [Ilya Ganelin] Closing second spark context
      1dc0444 [Ilya Ganelin] Added ref equality check
      8c884fa [Ilya Ganelin] Made getOrCreate synchronized
      cb0c6b7 [Ilya Ganelin] Doc updates and code cleanup
      270cfe3 [Ilya Ganelin] [SPARK-6703] Documentation fixes
      15e8dea [Ilya Ganelin] Updated comments and added MiMa Exclude
      0e1567c [Ilya Ganelin] Got rid of unecessary option for AtomicReference
      dfec4da [Ilya Ganelin] Changed activeContext to AtomicReference
      733ec9f [Ilya Ganelin] Fixed some bugs in test code
      8be2f83 [Ilya Ganelin] Replaced match with if
      e92caf7 [Ilya Ganelin] [SPARK-6703] Added test to ensure that getOrCreate both allows creation, retrieval, and a second context if desired
      a99032f [Ilya Ganelin] Spacing fix
      d7a06b8 [Ilya Ganelin] Updated SparkConf class to add getOrCreate method. Started test suite implementation
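The byte-string conversion added here can be sketched as follows; per the commit's own notes, suffixes are interpreted as binary units (kibi, mebi, and so on, where k = 1024). This is a simplified illustration, not the actual JavaUtils implementation:

```java
// Simplified sketch of size-string parsing in the spirit of the change
// above: suffixes are binary units (k = 1024, m = 1024^2, ...), matching
// the commit's kibi/mebi note. Not the actual JavaUtils code.
import java.util.Locale;

class SizeParser {
    static long byteStringAsBytes(String str) {
        String s = str.trim().toLowerCase(Locale.ROOT);
        long multiplier = 1L;
        char last = s.charAt(s.length() - 1);
        if (!Character.isDigit(last)) {
            switch (last) {
                case 'k': multiplier = 1L << 10; break;
                case 'm': multiplier = 1L << 20; break;
                case 'g': multiplier = 1L << 30; break;
                case 't': multiplier = 1L << 40; break;
                default: throw new NumberFormatException("bad suffix: " + str);
            }
            s = s.substring(0, s.length() - 1);
        }
        return Long.parseLong(s) * multiplier;
    }
}
```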
    • [SPARK-7168] [BUILD] Update plugin versions in Maven build and centralize versions · 7f3b3b7e
      Sean Owen authored
      Sean Owen authored
      Update Maven build plugin versions and centralize plugin version management
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #5720 from srowen/SPARK-7168 and squashes the following commits:
      
      98a8947 [Sean Owen] Make install, deploy plugin versions explicit
      4ecf3b2 [Sean Owen] Update Maven build plugin versions and centralize plugin version management
  31. Apr 20, 2015
    • [SPARK-7003] Improve reliability of connection failure detection between Netty block transfer service endpoints · 968ad972
      Aaron Davidson authored
      
      Currently we rely on the assumption that an exception will be raised and the channel closed if two endpoints cannot communicate over a Netty TCP channel. However, this guarantee does not hold in all network environments, and [SPARK-6962](https://issues.apache.org/jira/browse/SPARK-6962) seems to point to a case where only the server side of the connection detected a fault.
      
      This patch improves robustness of fetch/rpc requests by having an explicit timeout in the transport layer which closes the connection if there is a period of inactivity while there are outstanding requests.
      
      NB: This patch is actually only around 50 lines added if you exclude the testing-related code.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #5584 from aarondav/timeout and squashes the following commits:
      
      8699680 [Aaron Davidson] Address Reynold's comments
      37ce656 [Aaron Davidson] [SPARK-7003] Improve reliability of connection failure detection between Netty block transfer service endpoints
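The decision rule the patch adds can be sketched simply: close the channel only when it has been silent for a full timeout period *while requests are still outstanding*; a quiet channel with nothing pending is left alone. Illustrative names, not Spark's actual handler:

```java
// Sketch of the explicit transport-level inactivity timeout described
// above. Names are illustrative, not Spark's actual classes.
class InactivityMonitor {
    private long lastActivityNanos;
    private int outstandingRequests;

    InactivityMonitor(long nowNanos) { this.lastActivityNanos = nowNanos; }

    void recordActivity(long nowNanos) { lastActivityNanos = nowNanos; }
    void requestSent()      { outstandingRequests++; }
    void responseReceived() { outstandingRequests--; }

    // Decision made by the periodic timeout check: a silent channel is only
    // considered failed if a request is still waiting on it.
    boolean shouldClose(long nowNanos, long timeoutNanos) {
        boolean timedOut = nowNanos - lastActivityNanos > timeoutNanos;
        return timedOut && outstandingRequests > 0;
    }
}
```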