Commits · 73f065569d352081b7d64c254af70ce996860c53 · cs525-sp18-g07 / spark

Feb 09, 2017

[SPARK-19481] [REPL] [MAVEN] Avoid to leak SparkContext in Signaling.cancelOnInterrupt · 303f00a4

Shixiong Zhu authored 8 years ago

## What changes were proposed in this pull request?

`Signaling.cancelOnInterrupt` leaks a SparkContext per call and it makes ReplSuite unstable.

This PR adds `SparkContext.getActive` to allow `Signaling.cancelOnInterrupt` to get the active `SparkContext` to avoid the leak.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #16825 from zsxwing/SPARK-19481.

303f00a4

Dec 03, 2016

[SPARK-18685][TESTS] Fix URI and release resources after opening in tests at... · d1312fb7

hyukjinkwon authored 8 years ago

[SPARK-18685][TESTS] Fix URI and release resources after opening in tests at ExecutorClassLoaderSuite

## What changes were proposed in this pull request?

This PR fixes two problems as below:

- Close `BufferedSource` after `Source.fromInputStream(...)` to release resource and make the tests pass on Windows in `ExecutorClassLoaderSuite`

  ```
  [info] Exception encountered when attempting to run a suite with class name: org.apache.spark.repl.ExecutorClassLoaderSuite *** ABORTED *** (7 seconds, 333 milliseconds)
  [info]   java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-77b2f37b-6405-47c4-af1c-4a6a206511f2
  [info]   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
  [info]   at org.apache.spark.repl.ExecutorClassLoaderSuite.afterAll(ExecutorClassLoaderSuite.scala:76)
  [info]   at org.scalatest.BeforeAndAfterAll$class.afterAll(BeforeAndAfterAll.scala:213)
  ...
  ```

- Fix URI correctly so that related tests can be passed on Windows.

  ```
  [info] - child first *** FAILED *** (78 milliseconds)
  [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
  [info]   at java.net.URI$Parser.fail(URI.java:2848)
  [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
  ...
  [info] - parent first *** FAILED *** (15 milliseconds)
  [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
  [info]   at java.net.URI$Parser.fail(URI.java:2848)
  [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
  ...
  [info] - child first can fall back *** FAILED *** (0 milliseconds)
  [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
  [info]   at java.net.URI$Parser.fail(URI.java:2848)
  [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
  ...
  [info] - child first can fail *** FAILED *** (0 milliseconds)
  [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
  [info]   at java.net.URI$Parser.fail(URI.java:2848)
  [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
  ...
  [info] - resource from parent *** FAILED *** (0 milliseconds)
  [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
  [info]   at java.net.URI$Parser.fail(URI.java:2848)
  [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
  ...
  [info] - resources from parent *** FAILED *** (0 milliseconds)
  [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
  [info]   at java.net.URI$Parser.fail(URI.java:2848)
  [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
  ```

## How was this patch tested?

Manually tested via AppVeyor.

**Before**
https://ci.appveyor.com/project/spark-test/spark/build/102-rpel-ExecutorClassLoaderSuite

**After**
https://ci.appveyor.com/project/spark-test/spark/build/108-rpel-ExecutorClassLoaderSuite

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #16116 from HyukjinKwon/close-after-open.

Unverified

d1312fb7

Aug 17, 2016

[SPARK-16736][CORE][SQL] purge superfluous fs calls · cc97ea18

Steve Loughran authored 8 years ago

A review of the code, working back from Hadoop's `FileSystem.exists()` and `FileSystem.isDirectory()` code, then removing uses of the calls when superfluous.

1. delete is harmless if called on a nonexistent path, so don't do any checks before deletes
1. any `FileSystem.exists()` check before `getFileStatus()` or `open()` is superfluous as the operation itself does the check. Instead the `FileNotFoundException` is caught and triggers the downgraded path. When a `FileNotFoundException` was thrown before, the code still creates a new FNFE with the error messages. Though now the inner exceptions are nested, for easier diagnostics.

Initially, relying on Jenkins test runs.

One troublespot here is that some of the codepaths are clearly error situations; it's not clear that they have coverage anyway. Trying to create the failure conditions in tests would be ideal, but it will also be hard.

Author: Steve Loughran <stevel@apache.org>

Closes #14371 from steveloughran/cloud/SPARK-16736-superfluous-fs-calls.

cc97ea18

Aug 08, 2016

[SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add... · 9216901d

Holden Karau authored 8 years ago

[SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add much and remove whitelisting

## What changes were proposed in this pull request?

Avoid using postfix operation for command execution in SQLQuerySuite where it wasn't whitelisted and audit existing whitelistings removing postfix operators from most places. Some notable places where postfix operation remains is in the XML parsing & time units (seconds, millis, etc.) where it arguably can improve readability.

## How was this patch tested?

Existing tests.

Author: Holden Karau <holden@us.ibm.com>

Closes #14407 from holdenk/SPARK-16779.

9216901d

May 31, 2016

[SPARK-15622][SQL] Wrap the parent classloader of Janino's classloader in the ParentClassLoader. · c6de5832

Yin Huai authored 8 years ago

## What changes were proposed in this pull request?
At https://github.com/aunkrig/janino/blob/janino_2.7.8/janino/src/org/codehaus/janino/ClassLoaderIClassLoader.java#L80-L85, Janino's classloader throws the exception when its parent throws a ClassNotFoundException with a cause set. However, it does not throw the exception when there is no cause set. Seems we need to use a special ClassLoader to wrap the actual parent classloader set to Janino handle this behavior.

## How was this patch tested?
I have reverted the workaround made by https://issues.apache.org/jira/browse/SPARK-11636 ( https://github.com/apache/spark/compare/master...yhuai:SPARK-15622?expand=1#diff-bb538fda94224dd0af01d0fd7e1b4ea0R81) and `test-only *ReplSuite -- -z "SPARK-2576 importing implicits"` still passes the test (without the change in `CodeGenerator`, this test does not pass with the change in `ExecutorClassLoader `).

Author: Yin Huai <yhuai@databricks.com>

Closes #13366 from yhuai/SPARK-15622.

c6de5832

Apr 22, 2016

[SPARK-10001] Consolidate Signaling and SignalLogger. · c089c6f4

Reynold Xin authored 8 years ago

## What changes were proposed in this pull request?
This is a follow-up to #12557, with the following changes:

1. Fixes some of the style issues.
2. Merges Signaling and SignalLogger into a new class called SignalUtils. It was pretty confusing to have Signaling and Signal in one file, and it was also confusing to have two classes named Signaling and one called the other.
3. Made logging registration idempotent.

## How was this patch tested?
N/A.

Author: Reynold Xin <rxin@databricks.com>

Closes #12605 from rxin/SPARK-10001.

c089c6f4

[SPARK-10001] [CORE] Interrupt tasks in repl with Ctrl+C · 80127935

Jakob Odersky authored 8 years ago

## What changes were proposed in this pull request?

Improve signal handling to allow interrupting running tasks from the REPL (with Ctrl+C).
If no tasks are running or Ctrl+C is pressed twice, the signal is forwarded to the default handler resulting in the usual termination of the application.

This PR is a rewrite of -- and therefore closes #8216 -- as per piaozhexiu's request

## How was this patch tested?
Signal handling is not easily testable therefore no unit tests were added. Nevertheless, the new functionality is implemented in a best-effort approach, soft-failing in case signals aren't available on a specific OS.

Author: Jakob Odersky <jakob@odersky.com>

Closes #12557 from jodersky/SPARK-10001-sigint.

80127935

Apr 20, 2016

[SPARK-14725][CORE] Remove HttpServer class · 90cbc82f

jerryshao authored 8 years ago

## What changes were proposed in this pull request?

This proposal removes the class `HttpServer`, with the changing of internal file/jar/class transmission to RPC layer, currently there's no code using this `HttpServer`, so here propose to remove it.

## How was this patch tested?

Unit test is verified locally.

Author: jerryshao <sshao@hortonworks.com>

Closes #12526 from jerryshao/SPARK-14725.

90cbc82f

Apr 12, 2016

[SPARK-14508][BUILD] Add a new ScalaStyle Rule `OmitBracesInCase` · b0f5497e

Dongjoon Hyun authored 8 years ago

## What changes were proposed in this pull request?

According to the [Spark Code Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) and [Scala Style Guide](http://docs.scala-lang.org/style/control-structures.html#curlybraces), we had better enforce the following rule.
  ```
  case: Always omit braces in case clauses.
  ```
This PR makes a new ScalaStyle rule, 'OmitBracesInCase', and enforces it to the code.

## How was this patch tested?

Pass the Jenkins tests (including Scala style checking)

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #12280 from dongjoon-hyun/SPARK-14508.

b0f5497e

Apr 06, 2016

[SPARK-14134][CORE] Change the package name used for shading classes. · 21d5ca12

Marcelo Vanzin authored 8 years ago

The current package name uses a dash, which is a little weird but seemed
to work. That is, until a new test tried to mock a class that references
one of those shaded types, and then things started failing.

Most changes are just noise to fix the logging configs.

For reference, SPARK-8815 also raised this issue, although at the time it
did not cause any issues in Spark, so it was not addressed.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #11941 from vanzin/SPARK-14134.

21d5ca12

Mar 17, 2016

[SPARK-13928] Move org.apache.spark.Logging into org.apache.spark.internal.Logging · 8ef3399a

Wenchen Fan authored 9 years ago

## What changes were proposed in this pull request?

Logging was made private in Spark 2.0. If we move it, then users would be able to create a Logging trait themselves to avoid changing their own code.

## How was this patch tested?

existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #11764 from cloud-fan/logger.

8ef3399a

Jan 05, 2016
- [SPARK-3873][TESTS] Import ordering fixes. · b3ba1be3
  Marcelo Vanzin authored 9 years ago
  
  Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10582 from vanzin/SPARK-3873-tests.
  b3ba1be3
Dec 31, 2015

[SPARK-3873][STREAMING] Import order fixes for streaming. · efb10cc9

Marcelo Vanzin authored 9 years ago

Also included a few miscelaneous other modules that had very few violations.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #10532 from vanzin/SPARK-3873-streaming.

efb10cc9

Dec 24, 2015

[SPARK-12311][CORE] Restore previous value of "os.arch" property in test... · 39204661

Kazuaki Ishizaki authored 9 years ago

[SPARK-12311][CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property

Restore the original value of os.arch property after each test

Since some of tests forced to set the specific value to os.arch property, we need to set the original value.

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes #10289 from kiszk/SPARK-12311.

39204661

Dec 18, 2015

[SPARK-12350][CORE] Don't log errors when requested stream is not found. · 27828182

Marcelo Vanzin authored 9 years ago

If a client requests a non-existent stream, just send a failure message
back, without logging any error on the server side (since it's not a
server error).

On the executor side, avoid error logs by translating any errors during
transfer to a `ClassNotFoundException`, so that loading the class is
retried on a the parent class loader. This can mask IO errors during
transmission, but the most common cause is that the class is not
served by the remote end.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #10337 from vanzin/SPARK-12350.

27828182

Dec 10, 2015

[SPARK-11563][CORE][REPL] Use RpcEnv to transfer REPL-generated classes. · 4a46b885

Marcelo Vanzin authored 9 years ago

This avoids bringing up yet another HTTP server on the driver, and
instead reuses the file server already managed by the driver's
RpcEnv. As a bonus, the repl now inherits the security features of
the network library.

There's also a small change to create the directory for storing classes
under the root temp dir for the application (instead of directly
under java.io.tmpdir).

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #9923 from vanzin/SPARK-11563.

4a46b885

Nov 24, 2015

[SPARK-11818][REPL] Fix ExecutorClassLoader to lookup resources from … · be9dd155

Jungtaek Lim authored 9 years ago

…parent class loader

Without patch, two additional tests of ExecutorClassLoaderSuite fails.

- "resource from parent"
- "resources from parent"

Detailed explanation is here, https://issues.apache.org/jira/browse/SPARK-11818?focusedCommentId=15011202&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15011202

Author: Jungtaek Lim <kabhwan@gmail.com>

Closes #9812 from HeartSaVioR/SPARK-11818.

be9dd155

Nov 20, 2015

[SPARK-11636][SQL] Support classes defined in the REPL with Encoders · 4b84c72d

Michael Armbrust authored 9 years ago

#theScaryParts (i.e. changes to the repl, executor classloaders and codegen)...

Author: Michael Armbrust <michael@databricks.com>
Author: Yin Huai <yhuai@databricks.com>

Closes #9825 from marmbrus/dataset-replClasses2.

4b84c72d

Nov 11, 2015

[SPARK-6152] Use shaded ASM5 to support closure cleaning of Java 8 compiled classes · 529a1d33

Josh Rosen authored 9 years ago

This patch modifies Spark's closure cleaner (and a few other places) to use ASM 5, which is necessary in order to support cleaning of closures that were compiled by Java 8.

In order to avoid ASM dependency conflicts, Spark excludes ASM from all of its dependencies and uses a shaded version of ASM 4 that comes from `reflectasm` (see [SPARK-782](https://issues.apache.org/jira/browse/SPARK-782) and #232). This patch updates Spark to use a shaded version of ASM 5.0.4 that was published by the Apache XBean project; the POM used to create the shaded artifact can be found at https://github.com/apache/geronimo-xbean/blob/xbean-4.4/xbean-asm5-shaded/pom.xml.

http://movingfulcrum.tumblr.com/post/80826553604/asm-framework-50-the-missing-migration-guide was a useful resource while upgrading the code to use the new ASM5 opcodes.

I also added a new regression tests in the `java8-tests` subproject; the existing tests were insufficient to catch this bug, which only affected Scala 2.11 user code which was compiled targeting Java 8.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #9512 from JoshRosen/SPARK-6152.

529a1d33

May 29, 2015

[SPARK-7558] Demarcate tests in unit-tests.log · 9eb222c1

Andrew Or authored 9 years ago

Right now `unit-tests.log` are not of much value because we can't tell where the test boundaries are easily. This patch adds log statements before and after each test to outline the test boundaries, e.g.:

```
===== TEST OUTPUT FOR o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' =====

15/05/27 12:36:39.596 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO SparkContext: Starting job: count at KryoSerializerSuite.scala:230
15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Got job 3 (count at KryoSerializerSuite.scala:230) with 4 output partitions (allowLocal=false)
15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Final stage: ResultStage 3(count at KryoSerializerSuite.scala:230)
15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Parents of final stage: List()
15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Missing parents: List()
15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Submitting ResultStage 3 (ParallelCollectionRDD[5] at parallelize at KryoSerializerSuite.scala:230), which has no missing parents

...

15/05/27 12:36:39.624 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO DAGScheduler: Job 3 finished: count at KryoSerializerSuite.scala:230, took 0.028563 s
15/05/27 12:36:39.625 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO KryoSerializerSuite:

***** FINISHED o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' *****

...
```

Author: Andrew Or <andrew@databricks.com>

Closes #6441 from andrewor14/demarcate-tests and squashes the following commits:

879b060 [Andrew Or] Fix compile after rebase
d622af7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
017c8ba [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
7790b6c [Andrew Or] Fix tests after logical merge conflict
c7460c0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
c43ffc4 [Andrew Or] Fix tests?
8882581 [Andrew Or] Fix tests
ee22cda [Andrew Or] Fix log message
fa9450e [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
12d1e1b [Andrew Or] Various whitespace changes (minor)
69cbb24 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
bbce12e [Andrew Or] Fix manual things that cannot be covered through automation
da0b12f [Andrew Or] Add core tests as dependencies in all modules
f7d29ce [Andrew Or] Introduce base abstract class for all test suites

9eb222c1

Apr 09, 2015

[SPARK-6758]block the right jetty package in log · 7d92db34

WangTaoTheTonic authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-6758

I am not sure if it is ok to block them in test resources too (as we shade jetty in assembly?).

Author: WangTaoTheTonic <wangtao111@huawei.com>

Closes #5406 from WangTaoTheTonic/SPARK-6758 and squashes the following commits:

e09605b [WangTaoTheTonic] block the right jetty package

7d92db34

Mar 24, 2015

[SPARK-6209] Clean up connections in ExecutorClassLoader after failing to load... · 7215aa74

Josh Rosen authored 10 years ago

[SPARK-6209] Clean up connections in ExecutorClassLoader after failing to load classes (master branch PR)

ExecutorClassLoader does not ensure proper cleanup of network connections that it opens. If it fails to load a class, it may leak partially-consumed InputStreams that are connected to the REPL's HTTP class server, causing that server to exhaust its thread pool, which can cause the entire job to hang.  See [SPARK-6209](https://issues.apache.org/jira/browse/SPARK-6209) for more details, including a bug reproduction.

This patch fixes this issue by ensuring proper cleanup of these resources.  It also adds logging for unexpected error cases.

This PR is an extended version of #4935 and adds a regression test.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #4944 from JoshRosen/executorclassloader-leak-master-branch and squashes the following commits:

e0e3c25 [Josh Rosen] Wrap try block around getReponseCode; re-enable keep-alive by closing error stream
961c284 [Josh Rosen] Roll back changes that were added to get the regression test to fail
7ee2261 [Josh Rosen] Add a failing regression test
e2d70a3 [Josh Rosen] Properly clean up after errors in ExecutorClassLoader

7215aa74

Feb 02, 2015

Spark 3883: SSL support for HttpServer and Akka · cfea3003

Jacek Lewandowski authored 10 years ago

SPARK-3883: SSL support for Akka connections and Jetty based file servers.

This story introduced the following changes:
- Introduced SSLOptions object which holds the SSL configuration and can build the appropriate configuration for Akka or Jetty. SSLOptions can be created by parsing SparkConf entries at a specified namespace.
- SSLOptions is created and kept by SecurityManager
- All Akka actor address creation snippets based on interpolated strings were replaced by a dedicated methods from AkkaUtils. Those methods select the proper Akka protocol - whether akka.tcp or akka.ssl.tcp
- Added tests cases for AkkaUtils, FileServer, SSLOptions and SecurityManager
- Added a way to use node local SSL configuration by executors and driver in standalone mode. It can be done by specifying spark.ssl.useNodeLocalConf in SparkConf.
- Made CoarseGrainedExecutorBackend not overwrite the settings which are executor startup configuration - they are passed anyway from Worker

Refer to https://github.com/apache/spark/pull/3571 for discussion and details

Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
Author: Jacek Lewandowski <jacek.lewandowski@datastax.com>

Closes #3571 from jacek-lewandowski/SPARK-3883-master and squashes the following commits:

9ef4ed1 [Jacek Lewandowski] Merge pull request #2 from jacek-lewandowski/SPARK-3883-docs2
fb31b49 [Jacek Lewandowski] SPARK-3883: Added SSL setup documentation
2532668 [Jacek Lewandowski] SPARK-3883: Refactored AkkaUtils.protocol method to not use Try
90a8762 [Jacek Lewandowski] SPARK-3883: Refactored methods to resolve Akka address and made it possible to easily configure multiple communication layers for SSL
72b2541 [Jacek Lewandowski] SPARK-3883: A reference to the fallback SSLOptions can be provided when constructing SSLOptions
93050f4 [Jacek Lewandowski] SPARK-3883: SSL support for HttpServer and Akka

cfea3003

Feb 01, 2015

[SPARK-5353] Log failures in REPL class loading · 9f0a6e18

Tobias Schlatter authored 10 years ago

Author: Tobias Schlatter <tobias@meisch.ch>

Closes #4130 from gzm0/log-repl-loading and squashes the following commits:

4fa0582 [Tobias Schlatter] Log failures in REPL class loading

9f0a6e18

Jan 06, 2015

SPARK-4159 [CORE] Maven build doesn't run JUnit test suites · 4cba6eb4

Sean Owen authored 10 years ago

This PR:

- Reenables `surefire`, and copies config from `scalatest` (which is itself an old fork of `surefire`, so similar)
- Tells `surefire` to test only Java tests
- Enables `surefire` and `scalatest` for all children, and in turn eliminates some duplication.

For me this causes the Scala and Java tests to be run once each, it seems, as desired. It doesn't affect the SBT build but works for Maven. I still need to verify that all of the Scala tests and Java tests are being run.

Author: Sean Owen <sowen@cloudera.com>

Closes #3651 from srowen/SPARK-4159 and squashes the following commits:

2e8a0af [Sean Owen] Remove specialized SPARK_HOME setting for REPL, YARN tests as it appears to be obsolete
12e4558 [Sean Owen] Append to unit-test.log instead of overwriting, so that both surefire and scalatest output is preserved. Also standardize/correct comments a bit.
e6f8601 [Sean Owen] Reenable Java tests by reenabling surefire with config cloned from scalatest; centralize test config in the parent

4cba6eb4

Nov 11, 2014

Support cross building for Scala 2.11 · daaca14c

Prashant Sharma authored 10 years ago

Let's give this another go using a version of Hive that shades its JLine dependency.

Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Patrick Wendell <pwendell@gmail.com>

Closes #3159 from pwendell/scala-2.11-prashant and squashes the following commits:

e93aa3e [Patrick Wendell] Restoring -Phive-thriftserver profile and cleaning up build script.
f65d17d [Patrick Wendell] Fixing build issue due to merge conflict
a8c41eb [Patrick Wendell] Reverting dev/run-tests back to master state.
7a6eb18 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into scala-2.11-prashant
583aa07 [Prashant Sharma] REVERT ME: removed hive thirftserver
3680e58 [Prashant Sharma] Revert "REVERT ME: Temporarily removing some Cli tests."
935fb47 [Prashant Sharma] Revert "Fixed by disabling a few tests temporarily."
925e90f [Prashant Sharma] Fixed by disabling a few tests temporarily.
2fffed3 [Prashant Sharma] Exclude groovy from sbt build, and also provide a way for such instances in future.
8bd4e40 [Prashant Sharma] Switched to gmaven plus, it fixes random failures observer with its predecessor gmaven.
5272ce5 [Prashant Sharma] SPARK_SCALA_VERSION related bugs.
2121071 [Patrick Wendell] Migrating version detection to PySpark
b1ed44d [Patrick Wendell] REVERT ME: Temporarily removing some Cli tests.
1743a73 [Patrick Wendell] Removing decimal test that doesn't work with Scala 2.11
f5cad4e [Patrick Wendell] Add Scala 2.11 docs
210d7e1 [Patrick Wendell] Revert "Testing new Hive version with shaded jline"
48518ce [Patrick Wendell] Remove association of Hive and Thriftserver profiles.
e9d0a06 [Patrick Wendell] Revert "Enable thritfserver for Scala 2.10 only"
67ec364 [Patrick Wendell] Guard building of thriftserver around Scala 2.10 check
8502c23 [Patrick Wendell] Enable thritfserver for Scala 2.10 only
e22b104 [Patrick Wendell] Small fix in pom file
ec402ab [Patrick Wendell] Various fixes
0be5a9d [Patrick Wendell] Testing new Hive version with shaded jline
4eaec65 [Prashant Sharma] Changed scripts to ignore target.
5167bea [Prashant Sharma] small correction
a4fcac6 [Prashant Sharma] Run against scala 2.11 on jenkins.
80285f4 [Prashant Sharma] MAven equivalent of setting spark.executor.extraClasspath during tests.
034b369 [Prashant Sharma] Setting test jars on executor classpath during tests from sbt.
d4874cb [Prashant Sharma] Fixed Python Runner suite. null check should be first case in scala 2.11.
6f50f13 [Prashant Sharma] Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10
e56ca9d [Prashant Sharma] Print an error if build for 2.10 and 2.11 is spotted.
937c0b8 [Prashant Sharma] SCALA_VERSION -> SPARK_SCALA_VERSION
cb059b0 [Prashant Sharma] Code review
0476e5e [Prashant Sharma] Scala 2.11 support with repl and all build changes.

daaca14c

Oct 09, 2014

SPARK-3811 [CORE] More robust / standard Utils.deleteRecursively, Utils.createTempDir · 363baaca

Sean Owen authored 10 years ago

I noticed a few issues with how temp directories are created and deleted:

*Minor*

* Guava's `Files.createTempDir()` plus `File.deleteOnExit()` is used in many tests to make a temp dir, but `Utils.createTempDir()` seems to be the standard Spark mechanism
* Call to `File.deleteOnExit()` could be pushed into `Utils.createTempDir()` as well, along with this replacement
* _I messed up the message in an exception in `Utils` in SPARK-3794; fixed here_

*Bit Less Minor*

* `Utils.deleteRecursively()` fails immediately if any `IOException` occurs, instead of trying to delete any remaining files and subdirectories. I've observed this leave temp dirs around. I suggest changing it to continue in the face of an exception and throw one of the possibly several exceptions that occur at the end.
* `Utils.createTempDir()` will add a JVM shutdown hook every time the method is called. Even if the subdir is the parent of another parent dir, since this check is inside the hook. However `Utils` manages a set of all dirs to delete on shutdown already, called `shutdownDeletePaths`. A single hook can be registered to delete all of these on exit. This is how Tachyon temp paths are cleaned up in `TachyonBlockManager`.

I noticed a few other things that might be changed but wanted to ask first:

* Shouldn't the set of dirs to delete be `File`, not just `String` paths?
* `Utils` manages the set of `TachyonFile` that have been registered for deletion, but the shutdown hook is managed in `TachyonBlockManager`. Should this logic not live together, and not in `Utils`? it's more specific to Tachyon, and looks a slight bit odd to import in such a generic place.

Author: Sean Owen <sowen@cloudera.com>

Closes #2670 from srowen/SPARK-3811 and squashes the following commits:

071ae60 [Sean Owen] Update per @vanzin's review
da0146d [Sean Owen] Make Utils.deleteRecursively try to delete all paths even when an exception occurs; use one shutdown hook instead of one per method call to delete temp dirs
3a0faa4 [Sean Owen] Standardize on Utils.createTempDir instead of Files.createTempDir

363baaca

Oct 08, 2014

[SPARK-3836] [REPL] Spark REPL optionally propagate internal exceptions · c7818434

Ahir Reddy authored 10 years ago

Optionally have the repl throw exceptions generated by interpreted code, instead of swallowing the exception and returning it as text output. This is useful when embedding the repl, otherwise it's not possible to know when user code threw an exception.

Author: Ahir Reddy <ahirreddy@gmail.com>

Closes #2695 from ahirreddy/repl-throw-exceptions and squashes the following commits:

bad25ee [Ahir Reddy] Style Fixes
f0e5b44 [Ahir Reddy] Fixed style
0d4413d [Ahir Reddy] propogate excetions from repl

c7818434

Oct 01, 2014

[SPARK-3748] Log thread name in unit test logs · 3888ee2f

Reynold Xin authored 10 years ago

Thread names are useful for correlating failures.

Author: Reynold Xin <rxin@apache.org>

Closes #2600 from rxin/log4j and squashes the following commits:

83ffe88 [Reynold Xin] [SPARK-3748] Log thread name in unit test logs

3888ee2f

Sep 11, 2014

SPARK-2482: Resolve sbt warnings during build · 33c7a738

witgo authored 10 years ago

At the same time, import the `scala.language.postfixOps` and ` org.scalatest.time.SpanSugar._` cause `scala.language.postfixOps` doesn't work

Author: witgo <witgo@qq.com>

Closes #1330 from witgo/sbt_warnings3 and squashes the following commits:

179ba61 [witgo] Resolve sbt warnings during build

33c7a738

Sep 06, 2014

[SPARK-3273][SPARK-3301]We should read the version information from the same place · 21a1e1bb

GuoQiang Li authored 10 years ago

Author: GuoQiang Li <witgo@qq.com>

Closes #2175 from witgo/SPARK-3273 and squashes the following commits:

cf9c65a [GuoQiang Li] We should read the version information from the same place
2a44e2f [GuoQiang Li] The spark version in the welcome message of pyspark is not correct

21a1e1bb

Sep 02, 2014

[SPARK-1919] Fix Windows spark-shell --jars · 8f1f9aaf

Andrew Or authored 10 years ago

We were trying to add `file:/C:/path/to/my.jar` to the class path. We should add `C:/path/to/my.jar` instead. Tested on Windows 8.1.

Author: Andrew Or <andrewor14@gmail.com>

Closes #2211 from andrewor14/windows-shell-jars and squashes the following commits:

262c6a2 [Andrew Or] Oops... Add the new code to the correct place
0d5a0c1 [Andrew Or] Format jar path only for adding to shell classpath
42bd626 [Andrew Or] Remove unnecessary code
0049f1b [Andrew Or] Remove embarrassing log messages
b1755a0 [Andrew Or] Format jar paths properly before adding them to the classpath

8f1f9aaf

Aug 30, 2014

[SPARK-2889] Create Hadoop config objects consistently. · b6cf1348

Marcelo Vanzin authored 10 years ago

Different places in the code were instantiating Configuration / YarnConfiguration objects in different ways. This could lead to confusion for people who actually expected "spark.hadoop.*" options to end up in the configs used by Spark code, since that would only happen for the SparkContext's config.

This change modifies most places to use SparkHadoopUtil to initialize configs, and make that method do the translation that previously was only done inside SparkContext.

The places that were not changed fall in one of the following categories:
- Test code where this doesn't really matter
- Places deep in the code where plumbing SparkConf would be too difficult for very little gain
- Default values for arguments - since the caller can provide their own config in that case

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #1843 from vanzin/SPARK-2889 and squashes the following commits:

52daf35 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
f179013 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
51e71cf [Marcelo Vanzin] Add test to ensure that overriding Yarn configs works.
53f9506 [Marcelo Vanzin] Add DeveloperApi annotation.
3d345cb [Marcelo Vanzin] Restore old method for backwards compat.
fc45067 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
0ac3fdf [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
3f26760 [Marcelo Vanzin] Compilation fix.
f16cadd [Marcelo Vanzin] Initialize config in SparkHadoopUtil.
b8ab173 [Marcelo Vanzin] Update Utils API to take a Configuration argument.
1e7003f [Marcelo Vanzin] Replace explicit Configuration instantiation with SparkHadoopUtil.

b6cf1348

Aug 27, 2014

[SPARK-3256] Added support for :cp <jar> that was broken in Scala 2.10.x for REPL · 191d7cf2

Chip Senkbeil authored 10 years ago

As seen with [SI-6502](https://issues.scala-lang.org/browse/SI-6502) of Scala, the _:cp_ command was broken in Scala 2.10.x. As the Spark shell is a friendly wrapper on top of the Scala REPL, it is also affected by this problem.

My solution was to alter the internal classpath and invalidate any new entries. I also had to add the ability to add new entries to the parent classloader of the interpreter (SparkIMain's global).

The advantage of this versus wiping the interpreter and replaying all of the commands is that you don't have to worry about rerunning heavy Spark-related commands (going to the cluster) or potentially reloading data that might have changed. Instead, you get to work from where you left off.

Until this is fixed upstream for 2.10.x, I had to use reflection to alter the internal compiler classpath.

The solution now looks like this:
![screen shot 2014-08-13 at 3 46 02 pm](https://cloud.githubusercontent.com/assets/2481802/3912625/f02b1440-232c-11e4-9bf6-bafb3e352d14.png)

Author: Chip Senkbeil <rcsenkbe@us.ibm.com>

Closes #1929 from rcsenkbeil/FixReplClasspathSupport and squashes the following commits:

f420cbf [Chip Senkbeil] Added SparkContext.addJar calls to support executing code on remote clusters
a826795 [Chip Senkbeil] Updated AddUrlsToClasspath to use 'new Run' suggestion over hackish compiler error
2ff1d86 [Chip Senkbeil] Added compilation failure on symbols hack to get Scala classes to load correctly
a220639 [Chip Senkbeil] Added support for :cp <jar> that was broken in Scala 2.10.x for REPL

191d7cf2

Aug 06, 2014

[SPARK-2157] Enable tight firewall rules for Spark · 09f7e458

Andrew Or authored 10 years ago

The goal of this PR is to allow users of Spark to write tight firewall rules for their clusters. This is currently not possible because Spark uses random ports in many places, notably the communication between executors and drivers. The changes in this PR are based on top of ash211's changes in #1107.

The list covered here may or may not be the complete set of port needed for Spark to operate perfectly. However, as of the latest commit there are no known sources of random ports (except in tests). I have not documented a few of the more obscure configs.

My spark-env.sh looks like this:
```
export SPARK_MASTER_PORT=6060
export SPARK_WORKER_PORT=7070
export SPARK_MASTER_WEBUI_PORT=9090
export SPARK_WORKER_WEBUI_PORT=9091
```
and my spark-defaults.conf looks like this:
```
spark.master spark://andrews-mbp:6060
spark.driver.port 5001
spark.fileserver.port 5011
spark.broadcast.port 5021
spark.replClassServer.port 5031
spark.blockManager.port 5041
spark.executor.port 5051
```

Author: Andrew Or <andrewor14@gmail.com>
Author: Andrew Ash <andrew@andrewash.com>

Closes #1777 from andrewor14/configure-ports and squashes the following commits:

621267b [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
8a6b820 [Andrew Or] Use a random UI port during tests
7da0493 [Andrew Or] Fix tests
523c30e [Andrew Or] Add test for isBindCollision
b97b02a [Andrew Or] Minor fixes
c22ad00 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
93d359f [Andrew Or] Executors connect to wrong port when collision occurs
d502e5f [Andrew Or] Handle port collisions when creating Akka systems
a2dd05c [Andrew Or] Patrick's comment nit
86461e2 [Andrew Or] Remove spark.executor.env.port and spark.standalone.client.port
1d2d5c6 [Andrew Or] Fix ports for standalone cluster mode
cb3be88 [Andrew Or] Various doc fixes (broken link, format etc.)
e837cde [Andrew Or] Remove outdated TODOs
bfbab28 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
de1b207 [Andrew Or] Update docs to reflect new ports
b565079 [Andrew Or] Add spark.ports.maxRetries
2551eb2 [Andrew Or] Remove spark.worker.watcher.port
151327a [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
9868358 [Andrew Or] Add a few miscellaneous ports
6016e77 [Andrew Or] Add spark.executor.port
8d836e6 [Andrew Or] Also document SPARK_{MASTER/WORKER}_WEBUI_PORT
4d9e6f3 [Andrew Or] Fix super subtle bug
3f8e51b [Andrew Or] Correct erroneous docs...
e111d08 [Andrew Or] Add names for UI services
470f38c [Andrew Or] Special case non-"Address already in use" exceptions
1d7e408 [Andrew Or] Treat 0 ports specially + return correct ConnectionManager port
ba32280 [Andrew Or] Minor fixes
6b550b0 [Andrew Or] Assorted fixes
73fbe89 [Andrew Or] Move start service logic to Utils
ec676f4 [Andrew Or] Merge branch 'SPARK-2157' of github.com:ash211/spark into configure-ports
038a579 [Andrew Ash] Trust the server start function to report the port the service started on
7c5bdc4 [Andrew Ash] Fix style issue
0347aef [Andrew Ash] Unify port fallback logic to a single place
24a4c32 [Andrew Ash] Remove type on val to match surrounding style
9e4ad96 [Andrew Ash] Reformat for style checker
5d84e0e [Andrew Ash] Document new port configuration options
066dc7a [Andrew Ash] Fix up HttpServer port increments
cad16da [Andrew Ash] Add fallover increment logic for HttpServer
c5a0568 [Andrew Ash] Fix ConnectionManager to retry with increment
b80d2fd [Andrew Ash] Make Spark's block manager port configurable
17c79bb [Andrew Ash] Add a configuration option for spark-shell's class server
f34115d [Andrew Ash] SPARK-1176 Add port configuration for HttpBroadcast
49ee29b [Andrew Ash] SPARK-1174 Add port configuration for HttpFileServer
1c0981a [Andrew Ash] Make port in HttpServer configurable

09f7e458

Aug 02, 2014

[SPARK-2454] Do not ship spark home to Workers · 148af608

Andrew Or authored 10 years ago

When standalone Workers launch executors, they inherit the Spark home set by the driver. This means if the worker machines do not share the same directory structure as the driver node, the Workers will attempt to run scripts (e.g. bin/compute-classpath.sh) that do not exist locally and fail. This is a common scenario if the driver is launched from outside of the cluster.

The solution is to simply not pass the driver's Spark home to the Workers. This PR further makes an attempt to avoid overloading the usages of `spark.home`, which is now only used for setting executor Spark home on Mesos and in python.

This is based on top of #1392 and originally reported by YanTangZhai. Tested on standalone cluster.

Author: Andrew Or <andrewor14@gmail.com>

Closes #1734 from andrewor14/spark-home-reprise and squashes the following commits:

f71f391 [Andrew Or] Revert changes in python
1c2532c [Andrew Or] Merge branch 'master' of github.com:apache/spark into spark-home-reprise
188fc5d [Andrew Or] Avoid using spark.home where possible
09272b7 [Andrew Or] Always use Worker's working directory as spark home

148af608

Aug 01, 2014

SPARK-2632, SPARK-2576. Fixed by only importing what is necessary during class definition. · 14991011

Prashant Sharma authored 10 years ago

Without this patch, it imports everything available in the scope.

```scala

scala> val a = 10l
val a = 10l
a: Long = 10

scala> import a._
import a._
import a._

scala> case class A(a: Int) // show
case class A(a: Int) // show
class $read extends Serializable {
  def <init>() = {
    super.<init>;
    ()
  };
  class $iwC extends Serializable {
    def <init>() = {
      super.<init>;
      ()
    };
    class $iwC extends Serializable {
      def <init>() = {
        super.<init>;
        ()
      };
      import org.apache.spark.SparkContext._;
      class $iwC extends Serializable {
        def <init>() = {
          super.<init>;
          ()
        };
        val $VAL5 = $line5.$read.INSTANCE;
        import $VAL5.$iw.$iw.$iw.$iw.a;
        class $iwC extends Serializable {
          def <init>() = {
            super.<init>;
            ()
          };
          import a._;
          class $iwC extends Serializable {
            def <init>() = {
              super.<init>;
              ()
            };
            class $iwC extends Serializable {
              def <init>() = {
                super.<init>;
                ()
              };
              case class A extends scala.Product with scala.Serializable {
                <caseaccessor> <paramaccessor> val a: Int = _;
                def <init>(a: Int) = {
                  super.<init>;
                  ()
                }
              }
            };
            val $iw = new $iwC.<init>
          };
          val $iw = new $iwC.<init>
        };
        val $iw = new $iwC.<init>
      };
      val $iw = new $iwC.<init>
    };
    val $iw = new $iwC.<init>
  };
  val $iw = new $iwC.<init>
}
object $read extends scala.AnyRef {
  def <init>() = {
    super.<init>;
    ()
  };
  val INSTANCE = new $read.<init>
}
defined class A
```

With this patch, it just imports  only the necessary.

```scala

scala> val a = 10l
val a = 10l
a: Long = 10

scala> import a._
import a._
import a._

scala> case class A(a: Int) // show
case class A(a: Int) // show
class $read extends Serializable {
  def <init>() = {
    super.<init>;
    ()
  };
  class $iwC extends Serializable {
    def <init>() = {
      super.<init>;
      ()
    };
    class $iwC extends Serializable {
      def <init>() = {
        super.<init>;
        ()
      };
      case class A extends scala.Product with scala.Serializable {
        <caseaccessor> <paramaccessor> val a: Int = _;
        def <init>(a: Int) = {
          super.<init>;
          ()
        }
      }
    };
    val $iw = new $iwC.<init>
  };
  val $iw = new $iwC.<init>
}
object $read extends scala.AnyRef {
  def <init>() = {
    super.<init>;
    ()
  };
  val INSTANCE = new $read.<init>
}
defined class A

scala>

```

This patch also adds a `:fallback` mode on being enabled it will restore the spark-shell's 1.0.0 behaviour.

Author: Prashant Sharma <scrapcodes@gmail.com>
Author: Yin Huai <huai@cse.ohio-state.edu>
Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #1635 from ScrapCodes/repl-fix-necessary-imports and squashes the following commits:

b1968d2 [Prashant Sharma] Added toschemaRDD to test case.
0b712bb [Yin Huai] Add a REPL test to test importing a method.
02ad8ff [Yin Huai] Add a REPL test for importing SQLContext.createSchemaRDD.
ed6d0c7 [Prashant Sharma] Added a fallback mode, incase users run into issues while using repl.
b63d3b2 [Prashant Sharma] SPARK-2632, SPARK-2576. Fixed by only importing what is necessary during class definition.

14991011

Jul 31, 2014

[SPARK-2762] SparkILoop leaks memory in multi-repl configurations · 92ca910e

Timothy Hunter authored 10 years ago

This pull request is a small refactor so that a partial function (hence a closure) is not created. Instead, a regular function is used. The behavior of the code is not changed.

Author: Timothy Hunter <timhunter@databricks.com>

Closes #1674 from thunterdb/closure_issue and squashes the following commits:

e1e664d [Timothy Hunter] simplify closure

92ca910e

Jul 22, 2014

[SPARK-2452] Create a new valid for each instead of using lineId. · 81fec992

Prashant Sharma authored 10 years ago

Author: Prashant Sharma <prashant@apache.org>

Closes #1441 from ScrapCodes/SPARK-2452/multi-statement and squashes the following commits:

26c5c72 [Prashant Sharma] Added a test case.
7e8d28d [Prashant Sharma] SPARK-2452, create a new valid for each instead of using lineId, because Line ids can be same sometimes.

81fec992

Jul 04, 2014

[SPARK-1199][REPL] Remove VALId and use the original import style for defined classes. · d4341507

Prashant Sharma authored 10 years ago

This is an alternate solution to #1176.

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #1179 from ScrapCodes/SPARK-1199/repl-fix-second-approach and squashes the following commits:

820b34b [Prashant Sharma] Here we generate two kinds of import wrappers based on whether it is a class or not.

d4341507