Commits · 8e097890a7b39cd8320ed2f2b98dc2b520a87cbf · cs525-sp18-g07 / spark

May 02, 2017

[SPARK-20558][CORE] clear InheritableThreadLocal variables in SparkContext when stopping it · d10b0f65

Wenchen Fan authored 7 years ago

## What changes were proposed in this pull request?

To better understand this problem, let's take a look at an example first:
```
object Main {
  def main(args: Array[String]): Unit = {
    var t = new Test
    new Thread(new Runnable {
      override def run() = {}
    }).start()
    println("first thread finished")

    t.a = null
    t = new Test
    new Thread(new Runnable {
      override def run() = {}
    }).start()
  }

}

class Test {
  var a = new InheritableThreadLocal[String] {
    override protected def childValue(parent: String): String = {
      println("parent value is: " + parent)
      parent
    }
  }
  a.set("hello")
}
```
The result is:
```
parent value is: hello
first thread finished
parent value is: hello
parent value is: hello
```

Once a value has been set on an `InheritableThreadLocal`, child threads will inherit that value for as long as the `InheritableThreadLocal` has not been GCed, so setting the variable that holds the `InheritableThreadLocal` to `null` doesn't work as we expected.

In `SparkContext`, we have an `InheritableThreadLocal` for local properties. We should clear it when stopping `SparkContext`; otherwise all future child threads will still inherit it, copy the properties, and waste memory.

This is the root cause of https://issues.apache.org/jira/browse/SPARK-20548, which creates and stops `SparkContext` many times, ends up with a lot of `InheritableThreadLocal` instances alive, and causes an OOM when new threads are started in the internal thread pools.
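The effect of clearing the value can be sketched as follows. This is a minimal illustration, not the actual `SparkContext` change: the `Holder` class, its `stop()` method and the counter are hypothetical, and it assumes the clearing is done with `InheritableThreadLocal.remove()`.

```scala
import java.util.concurrent.atomic.AtomicInteger

object ClearInheritableThreadLocal {
  // Hypothetical stand-in for an object that owns an InheritableThreadLocal.
  class Holder {
    // Counts how often a child thread inherited the value.
    val inheritedCount = new AtomicInteger(0)

    val props = new InheritableThreadLocal[String] {
      override protected def childValue(parent: String): String = {
        inheritedCount.incrementAndGet()
        parent
      }
    }
    props.set("hello")

    // Assumption: clearing means removing the value from the current thread.
    def stop(): Unit = props.remove()
  }

  private def startAndJoin(): Unit = {
    val t = new Thread(new Runnable { override def run(): Unit = {} })
    t.start()
    t.join()
  }

  // Returns the inheritance count after one thread before stop()
  // and after one more thread following stop().
  def demo(): (Int, Int) = {
    val h = new Holder
    startAndJoin()
    val before = h.inheritedCount.get()
    h.stop()
    startAndJoin()
    (before, h.inheritedCount.get())
  }
}
```

In this sketch the second thread no longer triggers `childValue`, because the entry was removed from the parent thread's map rather than merely dereferenced.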

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes #17833 from cloud-fan/core.

(cherry picked from commit b946f316)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

May 01, 2017

[SPARK-20540][CORE] Fix unstable executor requests. · 5915588a

Ryan Blue authored 7 years ago


There are two problems fixed in this commit. First, the
ExecutorAllocationManager sets a timeout to avoid requesting executors
too often. However, the timeout is always updated based on its previous
value plus the timeout interval, not the current time. If the call is
delayed by locking for longer than the ongoing scheduler timeout, the
manager will request more executors on every run. This seems to be the
main cause of SPARK-20540.

The second problem is that the total number of requested executors is
not tracked by the CoarseGrainedSchedulerBackend. Instead, it calculates
the value based on the current status of 3 variables: the number of
known executors, the number of executors that have been killed, and the
number of pending executors. But, the number of pending executors is
never less than 0, even though there may be more known than requested.
When executors are killed and not replaced, this can cause the request
sent to YARN to be incorrect because there were too many executors due
to the scheduler's state being slightly out of date. This is fixed by tracking
the currently requested size explicitly.
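The first problem can be illustrated with a small model. This is a sketch under stated assumptions, not the ExecutorAllocationManager code: the object name, the 1000 ms interval and the function names are all hypothetical.

```scala
object RequestDeadlineSketch {
  // Hypothetical interval between executor requests, in milliseconds.
  val intervalMs: Long = 1000L

  // Behaviour described as the bug: next deadline derived from the old deadline.
  def fromPrevious(deadlineMs: Long, nowMs: Long): Long = deadlineMs + intervalMs

  // Behaviour described as the fix: next deadline derived from the current time.
  def fromNow(deadlineMs: Long, nowMs: Long): Long = nowMs + intervalMs

  // Counts how many of the given runs would issue a request.
  def countRequests(
      runTimesMs: Seq[Long],
      firstDeadlineMs: Long,
      next: (Long, Long) => Long): Int = {
    var deadline = firstDeadlineMs
    var requests = 0
    for (now <- runTimesMs) {
      if (now >= deadline) {
        requests += 1
        deadline = next(deadline, now)
      }
    }
    requests
  }
}
```

With a first deadline at 1000 ms and runs at 5000, 5100, 5200 and 5300 ms (that is, after a long delay), the deadline-based rule fires on every run because the deadline never catches up, while the time-based rule fires once.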

## How was this patch tested?

Existing tests.

Author: Ryan Blue <blue@apache.org>

Closes #17813 from rdblue/SPARK-20540-fix-dynamic-allocation.

(cherry picked from commit 2b2dd08e)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>

[SPARK-20517][UI] Fix broken history UI download link · 868b4a1a

jerryshao authored 7 years ago


The download link in the history server UI is generated from this template:

```
 <td><a href="{{uiroot}}/api/v1/applications/{{id}}/{{num}}/logs" class="btn btn-info btn-mini">Download</a></td>
```

Here the `num` field is the number of attempts, which does not match what the REST API expects. In the REST API, if the attempt id does not exist the URL should be `api/v1/applications/<id>/logs`; otherwise the URL should be `api/v1/applications/<id>/<attemptId>/logs`. Using `<num>` in place of `<attemptId>` leads to a "no such app" error.
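The two URL shapes described above can be sketched as a small helper. The object and function names are hypothetical, and the real fix lives in the UI template rather than in Scala; this only shows the rule.

```scala
object LogsUrlSketch {
  // Builds the event-log download path for an application.
  // attemptId is None when the application has no attempt id.
  def logsPath(appId: String, attemptId: Option[String]): String =
    attemptId match {
      case Some(attempt) => s"api/v1/applications/$appId/$attempt/logs"
      case None          => s"api/v1/applications/$appId/logs"
    }
}
```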

Manual verification.

CC ajbozarth, can you please review this change, since you added this feature? Thanks!

Author: jerryshao <sshao@hortonworks.com>

Closes #17795 from jerryshao/SPARK-20517.

(cherry picked from commit ab30590f)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>

Apr 25, 2017

Preparing development version 2.1.2-SNAPSHOT · 8460b090
Patrick Wendell authored 7 years ago

Preparing Spark release v2.1.1-rc4 · 267aca5b
Patrick Wendell authored 7 years ago

View commits for tag v2.1.1 v2.1.1

[SPARK-20239][CORE][2.1-BACKPORT] Improve HistoryServer's ACL mechanism · 359382c0

jerryshao authored 7 years ago

The current SHS (Spark History Server) has two different ACLs:

* ACL of the base URL. It is controlled by "spark.acls.enable" or "spark.ui.acls.enable". With this enabled, only users configured in "spark.admin.acls" (or group) or "spark.ui.view.acls" (or group), or the user who started the SHS, can list all the applications; otherwise none of them can be listed. This also affects the REST APIs that list the summary of all apps and of one app.
* Per-application ACL. This is controlled by "spark.history.ui.acls.enable". With this enabled, only the history admin user and the user/group who ran the app can access the details of that app.

With these two ACLs, we may encounter several unexpected behaviors:

1. If the base URL's ACL (`spark.acls.enable`) is enabled but user "A" has no view permission, user "A" cannot see the app list but can still access the details of its own app.
2. If the base URL's ACL (`spark.acls.enable`) is disabled, user "A" can download any application's event log, even if it was not run by user "A".
3. Changes to the Live UI's ACL affect the History UI's ACL, because they share the same conf file.

These unexpected behaviors arise mainly because we have two different ACLs; ideally we should have only one to manage everything.

So to improve SHS's ACL mechanism, this PR proposes to:

1. Disable "spark.acls.enable" and only use "spark.history.ui.acls.enable" for history server.
2. Check permission for event-log download REST API.

With this PR:

1. Admin user could see/download the list of all applications, as well as application details.
2. Normal user could see the list of all applications, but can only download and check the details of applications accessible to them.

New UTs are added; this was also verified on a real cluster.

CC tgravescs vanzin please help to review, this PR changes the semantics you did previously. Thanks a lot.

Author: jerryshao <sshao@hortonworks.com>

Closes #17755 from jerryshao/SPARK-20239-2.1-backport.

[SPARK-20404][CORE] Using Option(name) instead of Some(name) · 2d47e1aa

Sergey Zhemzhitsky authored 7 years ago


Use Option(name) instead of Some(name) to prevent runtime failures when using accumulators created like the following:
```
sparkContext.accumulator(0, null)
```
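The difference between the two wrappers is standard Scala behavior and can be shown without Spark. The object and method names below are hypothetical; only `Option(...)` and `Some(...)` come from the commit message.

```scala
object NullNameSketch {
  // Some(null) is a defined Option that contains null.
  def wrapWithSome(name: String): Option[String] = Some(name)

  // Option(null) is None, so later map/foreach calls skip the null safely.
  def wrapWithOption(name: String): Option[String] = Option(name)
}
```

Calling `wrapWithSome(null).map(_.length)` throws a `NullPointerException`, while `wrapWithOption(null).map(_.length)` simply yields `None`.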

Author: Sergey Zhemzhitsky <szhemzhitski@gmail.com>

Closes #17740 from szhem/SPARK-20404-null-acc-names.

(cherry picked from commit 0bc7a902)
Signed-off-by: Sean Owen <sowen@cloudera.com>

Apr 14, 2017
- Preparing development version 2.1.2-SNAPSHOT · 2a3e50e2
  Patrick Wendell authored 7 years ago
  
- Preparing Spark release v2.1.1-rc3 · 2ed19cff
  Patrick Wendell authored 7 years ago
  
- [SPARK-20243][TESTS] DebugFilesystem.assertNoOpenStreams thread race · 6f715c01
  Bogdan Raducanu authored 7 years ago
  
  ## What changes were proposed in this pull request?

  Synchronize access to openStreams map.

  ## How was this patch tested?

  Existing tests.

  Author: Bogdan Raducanu <bogdan@databricks.com>

  Closes #17592 from bogdanrdc/SPARK-20243.
Apr 13, 2017

[SPARK-19946][TESTS][BACKPORT-2.1] DebugFilesystem.assertNoOpenStreams should report the open streams to help debugging · bca7ce28

Bogdan Raducanu authored 7 years ago

## What changes were proposed in this pull request?
Backport for PR #17292
DebugFilesystem.assertNoOpenStreams throws an exception with a cause exception that actually shows the code line which leaked the stream.

## How was this patch tested?
New test in SparkContextSuite to check there is a cause exception.

Author: Bogdan Raducanu <bogdan@databricks.com>

Closes #17632 from bogdanrdc/SPARK-19946-BRANCH2.1.

Apr 12, 2017

[SPARK-20131][CORE] Don't use `this` lock in StandaloneSchedulerBackend.stop · be36c2f1

Shixiong Zhu authored 7 years ago


## What changes were proposed in this pull request?

`o.a.s.streaming.StreamingContextSuite.SPARK-18560 Receiver data should be deserialized properly` is flaky because there is a potential deadlock in StandaloneSchedulerBackend which causes an `await` timeout. Here is the related stack trace:
```
"Thread-31" #211 daemon prio=5 os_prio=31 tid=0x00007fedd4808000 nid=0x16403 waiting on condition [0x00007000239b7000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000079b49ca10> (a scala.concurrent.impl.Promise$CompletionLatch)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
	at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
	at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stop(CoarseGrainedSchedulerBackend.scala:402)
	at org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.org$apache$spark$scheduler$cluster$StandaloneSchedulerBackend$$stop(StandaloneSchedulerBackend.scala:213)
	- locked <0x00000007066fca38> (a org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend)
	at org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.stop(StandaloneSchedulerBackend.scala:116)
	- locked <0x00000007066fca38> (a org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend)
	at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:517)
	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1657)
	at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1921)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1302)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1920)
	at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:708)
	at org.apache.spark.streaming.StreamingContextSuite$$anonfun$43$$anonfun$apply$mcV$sp$66$$anon$3.run(StreamingContextSuite.scala:827)

"dispatcher-event-loop-3" #18 daemon prio=5 os_prio=31 tid=0x00007fedd603a000 nid=0x6203 waiting for monitor entry [0x0000700003be4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:253)
	- waiting to lock <0x00000007066fca38> (a org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:124)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
```

This PR removes `synchronized` and changes `stopping` to an AtomicBoolean to keep `stop` idempotent, which fixes the deadlock.
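The pattern can be sketched as follows. This is a simplified illustration rather than the StandaloneSchedulerBackend code: the `Backend` class and the `cleanups` counter are hypothetical, and only the use of an `AtomicBoolean` named `stopping` comes from the description above.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

object IdempotentStopSketch {
  class Backend {
    private val stopping = new AtomicBoolean(false)

    // Hypothetical counter standing in for the real shutdown work.
    val cleanups = new AtomicInteger(0)

    def stop(): Unit = {
      // compareAndSet succeeds for exactly one caller, and no monitor on
      // `this` is held while the shutdown work runs.
      if (stopping.compareAndSet(false, true)) {
        cleanups.incrementAndGet()
      }
    }
  }
}
```

Because no lock on the backend object is held during `stop`, another thread that needs that monitor is not blocked while `stop` waits on an RPC reply.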

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #17610 from zsxwing/SPARK-20131.

(cherry picked from commit c5f1cc37)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>

Apr 09, 2017

[SPARK-20260][MLLIB] String interpolation required for error message · 43a7fcad

Vijay Ramesh authored 7 years ago

## What changes were proposed in this pull request?
This error message doesn't get properly formatted because of a missing `s`.  Currently the error looks like:

```
Caused by: java.lang.IllegalArgumentException: requirement failed: indices should be one-based and in ascending order; found current=$current, previous=$previous; line="$line"
```
(note the literal `$current` instead of the interpolated value)
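The effect of the missing `s` prefix can be reproduced in isolation. The object and method names are hypothetical and the message is shortened; only the interpolation behavior is the point.

```scala
object InterpolationSketch {
  // Plain string literal: $current and $previous stay as literal text.
  def withoutPrefix(current: Int, previous: Int): String =
    "found current=$current, previous=$previous"

  // s-interpolated string: the values are substituted.
  def withPrefix(current: Int, previous: Int): String =
    s"found current=$current, previous=$previous"
}
```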

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Vijay Ramesh <vramesh@demandbase.com>

Closes #17572 from vijaykramesh/master.

(cherry picked from commit 261eaf51)
Signed-off-by: Sean Owen <sowen@cloudera.com>

Apr 05, 2017

[SPARK-20042][WEB UI] Fix log page buttons for reverse proxy mode · efc72dcc

Oliver Köth authored 7 years ago


With spark.ui.reverseProxy=true, full path URLs like /log will point to
the master web endpoint, which serves the worker UI as a reverse proxy.
To access a REST endpoint in the worker in reverse proxy mode, the
leading /proxy/"target"/ part of the base URI must be retained.

Added logic to log-view.js to handle this, similar to executorspage.js

Patch was tested manually

Author: Oliver Köth <okoeth@de.ibm.com>

Closes #17370 from okoethibm/master.

(cherry picked from commit 6f09dc70)
Signed-off-by: Sean Owen <sowen@cloudera.com>

Mar 31, 2017

[SPARK-20084][CORE] Remove internal.metrics.updatedBlockStatuses from history files. · e3cec18e

Ryan Blue authored 8 years ago


## What changes were proposed in this pull request?

Remove accumulator updates for internal.metrics.updatedBlockStatuses from SparkListenerTaskEnd entries in the history file. These can cause history files to grow to hundreds of GB because the value of the accumulator contains all tracked blocks.

## How was this patch tested?

Current History UI tests cover use of the history file.

Author: Ryan Blue <blue@apache.org>

Closes #17412 from rdblue/SPARK-20084-remove-block-accumulator-info.

(cherry picked from commit c4c03eed)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>

Mar 29, 2017

[SPARK-20059][YARN] Use the correct classloader for HBaseCredentialProvider · 103ff54d

jerryshao authored 8 years ago


## What changes were proposed in this pull request?

Currently we use the system classloader to find HBase jars. If they are specified via `--jars`, the lookup fails with a ClassNotFound error, so this change uses the child classloader instead.

This change also puts the added jars and the main jar on the classpath of the submitted application in yarn cluster mode; otherwise HBase jars specified with `--jars` are never honored in cluster mode, and fetching tokens on the client side always fails.

## How was this patch tested?

Unit test and local verification.

Author: jerryshao <sshao@hortonworks.com>

Closes #17388 from jerryshao/SPARK-20059.

(cherry picked from commit c622a87c)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>

Mar 28, 2017
- Preparing development version 2.1.2-SNAPSHOT · 4964dbed
  Patrick Wendell authored 8 years ago
  
- Preparing Spark release v2.1.1-rc2 · 02b165dc
  Patrick Wendell authored 8 years ago
  
Mar 22, 2017

[SPARK-19925][SPARKR] Fix SparkR spark.getSparkFiles fails when it was called on executors. · 277ed375

Yanbo Liang authored 8 years ago

## What changes were proposed in this pull request?
SparkR `spark.getSparkFiles` fails when it is called on executors; see details at [SPARK-19925](https://issues.apache.org/jira/browse/SPARK-19925).

## How was this patch tested?
Add unit tests, and verify this fix at standalone and yarn cluster.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #17274 from yanboliang/spark-19925.

(cherry picked from commit 478fbc86)
Signed-off-by: Yanbo Liang <ybliang8@gmail.com>

Mar 21, 2017
- Preparing development version 2.1.2-SNAPSHOT · c4d2b833
  Patrick Wendell authored 8 years ago
  
- Preparing Spark release v2.1.1-rc1 · 30abb95c
  Patrick Wendell authored 8 years ago
  
Mar 20, 2017

[SPARK-17204][CORE] Fix replicated off heap storage · d205d40a

Michael Allman authored 8 years ago

(Jira: https://issues.apache.org/jira/browse/SPARK-17204)

## What changes were proposed in this pull request?

There are a couple of bugs in the `BlockManager` with respect to support for replicated off-heap storage. First, the locally-stored off-heap byte buffer is disposed of when it is replicated. It should not be. Second, the replica byte buffers are stored as heap byte buffers instead of direct byte buffers even when the storage level memory mode is off-heap. This PR addresses both of these problems.

## How was this patch tested?

`BlockManagerReplicationSuite` was enhanced to fill in the coverage gaps. It now fails if either of the bugs in this PR exist.

Author: Michael Allman <michael@videoamp.com>

Closes #16499 from mallman/spark-17204-replicated_off_heap_storage.

(cherry picked from commit 7fa116f8)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

Mar 19, 2017

[SPARK-18817][SPARKR][SQL] change derby log output to temp dir · b60f6902

Felix Cheung authored 8 years ago


## What changes were proposed in this pull request?

Passes R `tempdir()` (the R session temp dir, shared with other temp files/dirs) to the JVM and sets the system property for the derby home dir so that derby.log is moved there.

## How was this patch tested?

Manually, unit tests

With this, the derby files are relocated to a directory under /tmp:
```
# ls /tmp/RtmpG2M0cB/
derby.log
```
And they are removed automatically when the R session is ended.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16330 from felixcheung/rderby.

(cherry picked from commit 422aa67d)
Signed-off-by: Felix Cheung <felixcheung@apache.org>

Mar 02, 2017

[SPARK-19750][UI][BRANCH-2.1] Fix redirect issue from http to https · 3a7591ad

jerryshao authored 8 years ago

## What changes were proposed in this pull request?

If the Spark UI port (4040) is not set, port number 0 is chosen, which makes the https port also 0. The Spark 2.1 code then uses this https port (0) for the redirect, so when a redirect is triggered it points to a wrong URL, like:

```
/tmp/temp$ wget http://172.27.25.134:55015
--2017-02-23 12:13:54--  http://172.27.25.134:55015/
Connecting to 172.27.25.134:55015... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://172.27.25.134:0/ [following]
--2017-02-23 12:13:54--  https://172.27.25.134:0/
Connecting to 172.27.25.134:0... failed: Can't assign requested address.
Retrying.

--2017-02-23 12:13:55--  (try: 2)  https://172.27.25.134:0/
Connecting to 172.27.25.134:0... failed: Can't assign requested address.
Retrying.

--2017-02-23 12:13:57--  (try: 3)  https://172.27.25.134:0/
Connecting to 172.27.25.134:0... failed: Can't assign requested address.
Retrying.

--2017-02-23 12:14:00--  (try: 4)  https://172.27.25.134:0/
Connecting to 172.27.25.134:0... failed: Can't assign requested address.
Retrying.

```

So instead of using 0 to do redirect, we should pick a bound port instead.

This issue only exists in Spark 2.1-, and can be reproduced in yarn cluster mode.

## How was this patch tested?

Current redirect UT doesn't verify this issue, so extend current UT to do correct verification.

Author: jerryshao <sshao@hortonworks.com>

Closes #17083 from jerryshao/SPARK-19750.

Feb 26, 2017

[SPARK-19594][STRUCTURED STREAMING] StreamingQueryListener fails to handle QueryTerminatedEvent if more then one listeners exists · 04fbb9e0

Eyal Zituny authored 8 years ago

## What changes were proposed in this pull request?

Currently, if multiple streaming query listeners exist, when a QueryTerminatedEvent is triggered only one of the listeners is invoked, while the rest ignore the event.
This happens because the streaming query listener bus holds a set of running query ids, and when a termination event is triggered, the terminated query id is removed from the set as soon as the first listener has handled the event.
In this PR, the query id is removed from the set only after all the listeners have handled the event.
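The ordering problem can be modeled in a few lines. This is a simplified, hypothetical model of a listener bus, not the Spark implementation: the object, the function names and the `Listener` type are assumptions made for illustration.

```scala
import scala.collection.mutable

object TerminationDispatchSketch {
  type Listener = String => Unit

  // Removes the id as soon as one listener has handled the event,
  // so later listeners no longer see the query as active.
  def dispatchRemovingEarly(
      active: mutable.Set[String],
      queryId: String,
      listeners: Seq[Listener]): Int = {
    var handled = 0
    for (listener <- listeners) {
      if (active.contains(queryId)) {
        listener(queryId)
        handled += 1
        active -= queryId
      }
    }
    handled
  }

  // Removes the id only after every listener has handled the event.
  def dispatchRemovingLast(
      active: mutable.Set[String],
      queryId: String,
      listeners: Seq[Listener]): Int = {
    var handled = 0
    if (active.contains(queryId)) {
      for (listener <- listeners) {
        listener(queryId)
        handled += 1
      }
      active -= queryId
    }
    handled
  }
}
```

Both functions return the number of listeners that handled the event, which makes the difference easy to observe with three listeners and one active query.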

## How was this patch tested?

a test with multiple listeners has been added to StreamingQueryListenerSuite

Author: Eyal Zituny <eyal.zituny@equalum.io>

Closes #16991 from eyalzit/master.

(cherry picked from commit 9f8e3921)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>

Feb 24, 2017

[SPARK-19707][CORE] Improve the invalid path check for sc.addJar · 6da6a27f

jerryshao authored 8 years ago


## What changes were proposed in this pull request?

Currently in Spark there are two issues when we add jars with an invalid path:

* If the jar path is an empty string {--jar ",dummy.jar"}, Spark resolves it to the current directory path and adds it to the classpath / file server, which is unwanted. This happened in our programmatic way of submitting Spark applications. From my understanding, Spark should defensively filter out such empty paths.
* If the jar path is invalid (the file doesn't exist), `addJar` doesn't check it and still adds it to the file server; the exception is delayed until the job runs. This local path could be checked beforehand, with no need to wait until a task runs. We have a similar check in `addFile`, but lack a similar mechanism in `addJar`.
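The two checks described in the list above can be sketched with plain JDK calls. The object and function names are hypothetical and this is not the code added to `SparkContext.addJar`; it only shows the kind of filtering and up-front validation being proposed.

```scala
import java.io.File

object JarPathSketch {
  // Splits a comma-separated jar list and drops empty entries,
  // so ",dummy.jar" yields only "dummy.jar".
  def nonEmptyPaths(jars: String): List[String] =
    jars.split(",").map(_.trim).filter(_.nonEmpty).toList

  // Checks a local path up front instead of failing later at task run time.
  def localFileExists(path: String): Boolean = new File(path).isFile
}
```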

## How was this patch tested?

Add unit test and local manual verification.

Author: jerryshao <sshao@hortonworks.com>

Closes #17038 from jerryshao/SPARK-19707.

(cherry picked from commit b0a8c16f)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>

Feb 22, 2017

[SPARK-19652][UI] Do auth checks for REST API access (branch-2.1). · 21afc453

Marcelo Vanzin authored 8 years ago

The REST API has a security filter that performs auth checks
based on the UI root's security manager. That works fine when
the UI root is the app's UI, but not when it's the history server.

In the SHS case, all users would be allowed to see all applications
through the REST API, even if the UI itself wouldn't be available
to them.

This change adds auth checks for each app access through the API
too, so that only authorized users can see the app's data.

The change also modifies the existing security filter to use
`HttpServletRequest.getRemoteUser()`, which is used in other
places. That is not necessarily the same as the principal's
name; for example, when using Hadoop's SPNEGO auth filter,
the remote user strips the realm information, which then matches
the user name registered as the owner of the application.

I also renamed the UIRootFromServletContext trait to a more generic
name since I'm using it to store more context information now.

Tested manually with an authentication filter enabled.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #17019 from vanzin/SPARK-19652_2.1.

Feb 20, 2017

[SPARK-19646][BUILD][HOTFIX] Fix compile error from cherry-pick of SPARK-19646 into branch 2.1 · c3316743

Sean Owen authored 8 years ago

## What changes were proposed in this pull request?

Fix compile error from cherry-pick of SPARK-19646 into branch 2.1

## How was this patch tested?

Jenkins tests

Author: Sean Owen <sowen@cloudera.com>

Closes #17003 from srowen/SPARK-19646.2.

[SPARK-19646][CORE][STREAMING] binaryRecords replicates records in scala API · 7c371dec

Sean Owen authored 8 years ago


## What changes were proposed in this pull request?

Use `BytesWritable.copyBytes`, not `getBytes`, because `getBytes` returns the underlying array, which may be reused when repeated reads don't need a different size, as is the case with binaryRecords APIs
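The aliasing problem can be reproduced without Hadoop. In the sketch below, `ReusingReader` is a hypothetical stand-in for a reader whose backing array is reused between records; it is not the `BytesWritable` API, and the method names are assumptions.

```scala
object ReusedBufferSketch {
  // Hypothetical reader that fills one shared buffer per record,
  // standing in for a Writable whose backing array is reused.
  class ReusingReader(records: List[Array[Byte]]) {
    private val buffer =
      new Array[Byte](records.map(_.length).foldLeft(0)(_ max _))

    private def fill(record: Array[Byte]): Unit =
      System.arraycopy(record, 0, buffer, 0, record.length)

    // Analogous to keeping the shared array: every element is the same object.
    def readAliased(): List[Array[Byte]] =
      records.map { r => fill(r); buffer }

    // Analogous to copying: each element is an independent copy of the valid bytes.
    def readCopied(): List[Array[Byte]] =
      records.map { r => fill(r); java.util.Arrays.copyOf(buffer, r.length) }
  }
}
```

With two equally sized records, `readAliased` returns the last record twice, which matches the "replicates records" symptom in the commit title, while `readCopied` returns both records intact.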

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #16974 from srowen/SPARK-19646.

(cherry picked from commit d0ecca60)
Signed-off-by: Sean Owen <sowen@cloudera.com>

Feb 17, 2017

[SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMap · 6e3abed8

Davies Liu authored 8 years ago


## What changes were proposed in this pull request?

Radix sort requires half of the array to be free (as temporary space), so we use 0.5 as the scale factor to make sure that BytesToBytesMap will not hold more items than 1/2 of its capacity. It turned out this does not hold: the current implementation of append() can leave 1 more item than the threshold (1/2 of capacity) in the array, which breaks the requirement of radix sort (failing the assert in 2.2, or failing to insert into InMemorySorter in 2.1).

This PR fixes the off-by-one bug in BytesToBytesMap.

This PR also fixes a bug, introduced by #15722, where the array would never grow again if it failed to grow once (staying at its initial capacity).

## How was this patch tested?

Added regression test.

Author: Davies Liu <davies@databricks.com>

Closes #16844 from davies/off_by_one.

(cherry picked from commit 3d0c3af0)
Signed-off-by: Davies Liu <davies.liu@gmail.com>

[SPARK-19622][WEBUI] Fix a http error in a paged table when using a `Go` button to search. · 55958bcd

Stan Zhai authored 8 years ago

## What changes were proposed in this pull request?

The search function of the paged table is not available because we don't skip the hash data of the request path.

![](https://issues.apache.org/jira/secure/attachment/12852996/screenshot-1.png)

## How was this patch tested?

Tested manually with my browser.

Author: Stan Zhai <zhaishidan@haizhi.com>

Closes #16953 from stanzhai/fix-webui-paged-table.

(cherry picked from commit 021062af)
Signed-off-by: Sean Owen <sowen@cloudera.com>

Feb 15, 2017

[SPARK-19399][SPARKR] Add R coalesce API for DataFrame and Column · 6c353990

Felix Cheung authored 8 years ago


Add coalesce on DataFrame for down partitioning without shuffle and coalesce on Column

manual, unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16739 from felixcheung/rcoalesce.

(cherry picked from commit 671bc08e)
Signed-off-by: Felix Cheung <felixcheung@apache.org>

Feb 13, 2017

[SPARK-19520][STREAMING] Do not encrypt data written to the WAL. · 7fe3543f

Marcelo Vanzin authored 8 years ago


Spark's I/O encryption uses an ephemeral key for each driver instance.
So driver B cannot decrypt data written by driver A since it doesn't
have the correct key.

The write ahead log is used for recovery, thus needs to be readable by
a different driver. So it cannot be encrypted by Spark's I/O encryption
code.

The BlockManager APIs used by the WAL code to write the data automatically
encrypt data, so changes are needed so that callers can opt out of
encryption.

Aside from that, the "putBytes" API in the BlockManager does not do
encryption, so a separate situation arose where the WAL would write
unencrypted data to the BM and, when those blocks were read, decryption
would fail. So the WAL code needs to ask the BM to encrypt that data
when encryption is enabled; this code is not optimal since it results
in a (temporary) second copy of the data block in memory, but should be
OK for now until a more performant solution is added. The non-encryption
case should not be affected.

Tested with new unit tests, and by running streaming apps that do
recovery using the WAL data with I/O encryption turned on.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #16862 from vanzin/SPARK-19520.

(cherry picked from commit 0169360e)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>

[SPARK-17714][CORE][TEST-MAVEN][TEST-HADOOP2.6] Avoid using ExecutorClassLoader to load Netty generated classes · 328b2298

Shixiong Zhu authored 8 years ago

## What changes were proposed in this pull request?

Netty's `MessageToMessageEncoder` uses [Javassist](https://github.com/netty/netty/blob/91a0bdc17a8298437d6de08a8958d753799bd4a6/common/src/main/java/io/netty/util/internal/JavassistTypeParameterMatcherGenerator.java#L62) to generate a matcher class and the implementation calls `Class.forName` to check if this class is already generated. If `MessageEncoder` or `MessageDecoder` is created in `ExecutorClassLoader.findClass`, it will cause `ClassCircularityError`. This is because loading this Netty-generated class calls `ExecutorClassLoader.findClass` to search for the class, and `ExecutorClassLoader` then tries to use RPC to load it, which triggers loading the non-existent matcher class again. The JVM reports `ClassCircularityError` to prevent such infinite recursion.

##### Why it only happens in Maven builds

It's because Maven and SBT have different class loader trees. The Maven build sets a URLClassLoader as the current context class loader to run the tests, which exposes this issue. The class loader tree is as follows:

```
bootstrap class loader ------ ... ----- REPL class loader ---- ExecutorClassLoader
|
|
URLClassLoader
```

The SBT build uses the bootstrap class loader directly and `ReplSuite.test("propagation of local properties")` is the first test in ReplSuite, which happens to load `io/netty/util/internal/__matchers__/org/apache/spark/network/protocol/MessageMatcher` into the bootstrap class loader (note: in the Maven build, it's loaded into URLClassLoader so it cannot be found in ExecutorClassLoader). This issue can be reproduced in SBT as well. Here are the reproduction steps:
- Enable `hadoop.caller.context.enabled`.
- Replace `Class.forName` with `Utils.classForName` in `object CallerContext`.
- Ignore `ReplSuite.test("propagation of local properties")`.
- Run `ReplSuite` using SBT.

This PR just creates a singleton MessageEncoder and MessageDecoder and makes sure they are created before switching to ExecutorClassLoader. TransportContext will be created when creating RpcEnv and that happens before creating ExecutorClassLoader.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #16859 from zsxwing/SPARK-17714.

(cherry picked from commit 905fdf0c)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>

Feb 09, 2017

[SPARK-19481] [REPL] [MAVEN] Avoid to leak SparkContext in Signaling.cancelOnInterrupt · b3fd36a1

Shixiong Zhu authored 8 years ago


## What changes were proposed in this pull request?

`Signaling.cancelOnInterrupt` leaks a SparkContext per call and it makes ReplSuite unstable.

This PR adds `SparkContext.getActive` to allow `Signaling.cancelOnInterrupt` to get the active `SparkContext` to avoid the leak.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #16825 from zsxwing/SPARK-19481.

(cherry picked from commit 303f00a4)
Signed-off-by: Davies Liu <davies.liu@gmail.com>

Feb 01, 2017

[SPARK-19432][CORE] Fix an unexpected failure when connecting timeout · 7c23bd49

Shixiong Zhu authored 8 years ago


## What changes were proposed in this pull request?

When a connection times out, `ask` may fail with a confusing message:

```
17/02/01 23:15:19 INFO Worker: Connecting to master ...
java.lang.IllegalArgumentException: requirement failed: TransportClient has not yet been set.
        at scala.Predef$.require(Predef.scala:224)
        at org.apache.spark.rpc.netty.RpcOutboxMessage.onTimeout(Outbox.scala:70)
        at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$ask$1.applyOrElse(NettyRpcEnv.scala:232)
        at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$ask$1.applyOrElse(NettyRpcEnv.scala:231)
        at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:138)
        at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:136)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
```

It's better to provide a meaningful message.
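The difference can be illustrated with a small sketch. `MiniOutboxMessage` and its methods are hypothetical and only mimic the situation in the stack trace above, where a timeout fires before any client has been set; the message text in the second method is an assumption, not the wording used by the patch.

```scala
// Hypothetical stand-in for an outbox message; not Spark's RpcOutboxMessage.
final class MiniOutboxMessage(val requestId: Long) {
  private var client: Option[String] = None

  def connected(clientName: String): Unit = {
    client = Some(clientName)
  }

  // Strict variant: a timeout before connecting trips the requirement and
  // surfaces as an IllegalArgumentException unrelated to the real cause.
  def onTimeoutStrict(): String = {
    require(client.isDefined, "TransportClient has not yet been set.")
    s"removed request $requestId from ${client.get}"
  }

  // Descriptive variant: the not-yet-connected case is handled explicitly
  // and reported in terms of what actually happened.
  def onTimeoutDescriptive(): String = client match {
    case Some(c) => s"removed request $requestId from $c"
    case None => s"Ask $requestId timed out before the connection was established"
  }
}
```

Calling `onTimeoutStrict()` on an unconnected message throws, while `onTimeoutDescriptive()` returns a message that names the timeout.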

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #16773 from zsxwing/connect-timeout.

(cherry picked from commit 8303e20c)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>

7c23bd49

[SPARK-19377][WEBUI][CORE] Killed tasks should have the status as KILLED · f9464641

Devaraj K authored 8 years ago


## What changes were proposed in this pull request?

The killed status was not copied when building the `newTaskInfo` object, which drops unnecessary details to reduce memory usage. This patch copies the killed status to the `newTaskInfo` object, so the Web UI shows KILLED instead of an incorrect status.
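A rough sketch of the kind of bug described, using a hypothetical `MiniTaskInfo` rather than Spark's `TaskInfo`. The field names and the order of checks in `status` are assumptions made for illustration.

```scala
// Hypothetical stand-in for a task record; not Spark's TaskInfo.
final class MiniTaskInfo(val taskId: Long, val details: Seq[String]) {
  var finished = false
  var failed = false
  var killed = false

  def status: String =
    if (!finished) "RUNNING"
    else if (failed) "FAILED"
    else if (killed) "KILLED"
    else "SUCCESS"
}

object TaskInfoTrimming {
  // Buggy trim: drops the bulky details but forgets the killed flag,
  // so a killed task is later reported as SUCCESS.
  def trimWithoutKilled(info: MiniTaskInfo): MiniTaskInfo = {
    val slim = new MiniTaskInfo(info.taskId, Nil)
    slim.finished = info.finished
    slim.failed = info.failed
    slim
  }

  // Corrected trim: also carries the killed flag over to the slim copy.
  def trimWithKilled(info: MiniTaskInfo): MiniTaskInfo = {
    val slim = trimWithoutKilled(info)
    slim.killed = info.killed
    slim
  }
}
```

For a finished task with `killed = true`, the first trim yields a copy whose status is SUCCESS and the second yields KILLED, which mirrors the before and after tables below.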

## How was this patch tested?

Current display of tasks on the stage UI page:

| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 143 | 10 | 0 | SUCCESS | NODE_LOCAL | 6 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
| 156 | 11 | 0 | SUCCESS | NODE_LOCAL | 5 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |

Web UI display after applying the patch:

| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 143 | 10 | 0 | KILLED | NODE_LOCAL | 6 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
| 156 | 11 | 0 | KILLED | NODE_LOCAL | 5 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |

Author: Devaraj K <devaraj@apache.org>

Closes #16725 from devaraj-kavali/SPARK-19377.

(cherry picked from commit df4a27cc)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>

f9464641

Jan 26, 2017

[SPARK-19220][UI] Make redirection to HTTPS apply to all URIs. (branch-2.1) · 59502bbc

Marcelo Vanzin authored 8 years ago

The redirect handler was installed only for the root of the server;
any other context ended up being served directly through the HTTP
port. Since every sub page (e.g. application UIs in the history
server) is a separate servlet context, this meant that everything
but the root was accessible via HTTP still.

The change adds separate names to each connector, and binds contexts
to specific connectors so that content is only served through the
HTTPS connector when it's enabled. In that case, the only thing that
binds to the HTTP connector is the redirect handler.

Tested with new unit tests and by checking a live history server.

(cherry picked from commit d3dcb63b)

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #16711 from vanzin/SPARK-19220_2.1.

59502bbc

Jan 25, 2017

[SPARK-14804][SPARK][GRAPHX] Fix checkpointing of VertexRDD/EdgeRDD · 0d7e3852

Tathagata Das authored 8 years ago


## What changes were proposed in this pull request?

EdgeRDD/VertexRDD override checkpoint() and isCheckpointed() to forward these calls to the internal partitionsRDD. So when checkpoint() is called on them, it is the partitionsRDD that actually gets checkpointed. However, since isCheckpointed() is also overridden to call partitionsRDD.isCheckpointed, EdgeRDD/VertexRDD.isCheckpointed returns true even though this RDD is not actually checkpointed.

This would have been fine except that the RDD's internal logic for computing the RDD depends on isCheckpointed(). So for VertexRDD/EdgeRDD, since isCheckpointed is true, when computing, Spark tries to read checkpoint data of VertexRDD/EdgeRDD even though they are not actually checkpointed. Through a crazy sequence of call forwarding, it reads the checkpoint data of partitionsRDD and tries to cast it to types in Vertex/EdgeRDD. This leads to a ClassCastException.
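The failure mode can be reproduced in miniature. The sketch below is not Spark code: `MiniRDD` and `ForwardingRDD` are hypothetical classes, and strings stand in for real checkpoint data, so it shows only the wrong branch being taken, not the ClassCastException itself.

```scala
// Hypothetical stand-in for RDD; not Spark's real class.
class MiniRDD(val name: String) {
  private var ownCheckpoint: Option[String] = None

  def checkpoint(): Unit = {
    ownCheckpoint = Some(s"checkpoint-of-$name")
  }

  // Public and overridable, like the method described above.
  def isCheckpointed: Boolean = ownCheckpoint.isDefined

  // Internal logic that trusts the overridable public method.
  final def computeOrReadCheckpoint(): String =
    if (isCheckpointed) s"read checkpoint for $name" else s"computed $name"
}

// Hypothetical wrapper that forwards both calls to an inner RDD,
// mirroring how the text describes VertexRDD/EdgeRDD.
class ForwardingRDD(inner: MiniRDD) extends MiniRDD("wrapper") {
  override def checkpoint(): Unit = inner.checkpoint()
  override def isCheckpointed: Boolean = inner.isCheckpointed
}

object CheckpointForwardingDemo {
  def main(args: Array[String]): Unit = {
    val wrapper = new ForwardingRDD(new MiniRDD("partitions"))
    wrapper.checkpoint() // only the inner RDD is checkpointed
    // The wrapper has no checkpoint of its own, yet the base-class logic
    // takes the "read checkpoint" branch because of the override.
    println(wrapper.computeOrReadCheckpoint())
  }
}
```

Here the wrapper reports itself as checkpointed and the base class tries to read a checkpoint for `wrapper` that was never written.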

The minimal fix that does not change any public behavior is to modify the RDD internals so that they do not use the public, overridable API for internal logic.

## How was this patch tested?

New unit tests.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #15396 from tdas/SPARK-14804.

(cherry picked from commit 47d5d0dd)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

0d7e3852

Jan 23, 2017

[SPARK-19306][CORE] Fix inconsistent state in DiskBlockObject when exception occurred · ed5d1e72

jerryshao authored 8 years ago


## What changes were proposed in this pull request?

In `DiskBlockObjectWriter`, when an error occurs during writing, `revertPartialWritesAndClose` is called. If this method itself fails, for example because the disk is full, it throws an exception without resetting the writer's state and also skips the revert. This patch fixes that, giving the user a chance to recover from such a failure.
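One common way to guarantee the reset is a `try`/`finally` around the failing step. The sketch below assumes that approach and uses a hypothetical `MiniWriter`; it is not the actual change to `DiskBlockObjectWriter`.

```scala
import java.io.IOException

// Hypothetical stand-in for a disk writer; not Spark's DiskBlockObjectWriter.
final class MiniWriter(failOnRevert: Boolean) {
  private var streamOpen = false
  private var bytesPending = 0L

  def open(): Unit = {
    streamOpen = true
  }

  def write(bytes: Long): Unit = {
    require(streamOpen, "writer is not open")
    bytesPending += bytes
  }

  def isOpen: Boolean = streamOpen
  def pending: Long = bytesPending

  // Reverts uncommitted writes. Even if the revert step throws (simulated
  // here as a full disk), the finally block resets the writer's state so
  // the object is not left half-open.
  def revertPartialWritesAndClose(): Unit = {
    try {
      if (failOnRevert) throw new IOException("No space left on device")
    } finally {
      bytesPending = 0L
      streamOpen = false
    }
  }
}
```

In this sketch the exception still propagates to the caller, but the writer ends up closed with no pending bytes either way.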

## How was this patch tested?

Existing test.

Author: jerryshao <sshao@hortonworks.com>

Closes #16657 from jerryshao/SPARK-19306.

(cherry picked from commit e4974721)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>

ed5d1e72