Commits · 6ca990fb366cf68cd9d5afb433725d28f07e51a0 · cs525-sp18-g07 / spark

Mar 11, 2016

[SPARK-13294][PROJECT INFRA] Remove MiMa's dependency on spark-class / Spark assembly · 6ca990fb

Josh Rosen authored 9 years ago

This patch removes the need to build a full Spark assembly before running the `dev/mima` script.

- I modified the `tools` project to remove a direct dependency on Spark, so `sbt/sbt tools/fullClasspath` will now return the classpath for the `GenerateMIMAIgnore` class itself plus its own dependencies.
   - This required me to delete two classes full of dead code that we don't use anymore
- `GenerateMIMAIgnore` now uses [ClassUtil](http://software.clapper.org/classutil/) to find all of the Spark classes rather than our homemade JAR traversal code. The problem in our own code was that it didn't handle folders of classes properly, which is necessary in order to generate excludes with an assembly-free Spark build.
- `./dev/mima` no longer runs through `spark-class`, eliminating the need to reason about classpath ordering between `SPARK_CLASSPATH` and the assembly.
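
To illustrate why folders of classes matter for an assembly-free build, here is a minimal Python sketch of class discovery that accepts either a directory of `.class` files or a JAR. It is not the ClassUtil API or the Scala code in `GenerateMIMAIgnore`; the function name `class_names_in` is hypothetical.

```python
import os
import zipfile


def class_names_in(path):
    """Return fully qualified class names found under a directory or inside a JAR."""
    if os.path.isdir(path):
        entries = []
        for root, _dirs, files in os.walk(path):
            for name in files:
                rel = os.path.relpath(os.path.join(root, name), path)
                # Normalize to the forward-slash form that JAR entries use.
                entries.append(rel.replace(os.sep, "/"))
    else:
        with zipfile.ZipFile(path) as jar:
            entries = jar.namelist()
    # Keep only class files and turn "org/example/Foo.class" into "org.example.Foo".
    return sorted(
        entry[: -len(".class")].replace("/", ".")
        for entry in entries
        if entry.endswith(".class")
    )
```

Handling both cases in one function means the caller does not need to know whether a classpath entry came from a packaged JAR or from a build output directory.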

Author: Josh Rosen <joshrosen@databricks.com>

Closes #11178 from JoshRosen/remove-assembly-in-run-tests.

6ca990fb

Mar 10, 2016

[SPARK-13244][SQL] Migrates DataFrame to Dataset · 1d542785

Cheng Lian authored 9 years ago

## What changes were proposed in this pull request?

This PR unifies DataFrame and Dataset by migrating existing DataFrame operations to Dataset and making `DataFrame` a type alias of `Dataset[Row]`.

Most Scala code changes are source compatible, but the Java API is broken because Java knows nothing about Scala type aliases (the fix is mostly replacing `DataFrame` with `Dataset<Row>`).

There are several noticeable API changes related to those returning arrays:

1.  `collect`/`take`

    -   Old APIs in class `DataFrame`:

        ```scala
        def collect(): Array[Row]
        def take(n: Int): Array[Row]
        ```

    -   New APIs in class `Dataset[T]`:

        ```scala
        def collect(): Array[T]
        def take(n: Int): Array[T]

        def collectRows(): Array[Row]
        def takeRows(n: Int): Array[Row]
        ```

    Two specialized methods, `collectRows` and `takeRows`, are added because Java doesn't support returning generic arrays. Thus, for example, `Dataset.collect(): Array[T]` actually returns `Object` instead of `Array<T>` from the Java side.

    Normally, Java users may fall back to `collectAsList` and `takeAsList`. The two new specialized versions are added to avoid a performance regression in ML-related code (but maybe I'm wrong and they are not necessary here).

1.  `randomSplit`

    -   Old APIs in class `DataFrame`:

        ```scala
        def randomSplit(weights: Array[Double], seed: Long): Array[DataFrame]
        def randomSplit(weights: Array[Double]): Array[DataFrame]
        ```

    -   New APIs in class `Dataset[T]`:

        ```scala
        def randomSplit(weights: Array[Double], seed: Long): Array[Dataset[T]]
        def randomSplit(weights: Array[Double]): Array[Dataset[T]]
        ```

    This is a similar problem to the one above, but it hasn't been addressed for the Java API yet. We can probably add `randomSplitAsList` to fix this one.

1.  `groupBy`

    Some original `DataFrame.groupBy` methods have signatures that conflict with the original `Dataset.groupBy` methods. To distinguish the two, the typed `Dataset.groupBy` methods are renamed to `groupByKey`.

Other noticeable changes:

1.  Dataset always does eager analysis now

    We used to support disabling DataFrame eager analysis to help report partially analyzed, malformed logical plans on analysis failure. However, Dataset encoders require eager analysis during Dataset construction. To preserve the error reporting feature, `AnalysisException` now takes an extra `Option[LogicalPlan]` argument to hold the partially analyzed plan, so that we can check the plan tree when reporting test failures. This plan is passed by `QueryExecution.assertAnalyzed`.

## How was this patch tested?

Existing tests do the work.

## TODO

- [ ] Fix all tests
- [ ] Re-enable MiMA check
- [ ] Update ScalaDoc (`since`, `group`, and example code)

Author: Cheng Lian <lian@databricks.com>
Author: Yin Huai <yhuai@databricks.com>
Author: Wenchen Fan <wenchen@databricks.com>
Author: Cheng Lian <liancheng@users.noreply.github.com>

Closes #11443 from liancheng/ds-to-df.

1d542785

[SPARK-13663][CORE] Upgrade Snappy Java to 1.1.2.1 · 927e22ef

Sean Owen authored 9 years ago

## What changes were proposed in this pull request?

Update snappy to 1.1.2.1 to pull in a single fix -- the OOM fix we already worked around.
Supersedes https://github.com/apache/spark/pull/11524

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #11631 from srowen/SPARK-13663.

927e22ef

Mar 09, 2016

[SPARK-13595][BUILD] Move docker, extras modules into external · 256704c7

Sean Owen authored 9 years ago

## What changes were proposed in this pull request?

Move `docker` dirs out of top level into `external/`; move `extras/*` into `external/`

## How was this patch tested?

This is tested with Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #11523 from srowen/SPARK-13595.

256704c7

Mar 08, 2016

[HOT-FIX][BUILD] Use the new location of `checkstyle-suppressions.xml` · 7771c731

Dongjoon Hyun authored 9 years ago

## What changes were proposed in this pull request?

This PR fixes `dev/lint-java` and `mvn checkstyle:check` failures due to the recent file location change.
The following is the error message of current master.
```
Checkstyle checks failed at following occurrences:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (default-cli) on project spark-parent_2.11: Failed during checkstyle configuration: cannot initialize module SuppressionFilter - Cannot set property 'file' to 'checkstyle-suppressions.xml' in module SuppressionFilter: InvocationTargetException: Unable to find: checkstyle-suppressions.xml -> [Help 1]
```

## How was this patch tested?

Manual. The following command should run correctly.
```
./dev/lint-java
mvn checkstyle:check
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11567 from dongjoon-hyun/hotfix_checkstyle_suppression.

7771c731

Mar 07, 2016

[SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs · 0eea12a3

Sean Owen authored 9 years ago

## What changes were proposed in this pull request?

Move many top-level files into `dev/` or another appropriate directory. In particular, put `make-distribution.sh` in `dev` and update docs accordingly. Remove deprecated `sbt/sbt`.

I was (so far) unable to figure out how to move `tox.ini`. `scalastyle-config.xml` should be movable but edits to the project `.sbt` files didn't work; config file location is updatable for compile but not test scope.

## How was this patch tested?

`./dev/run-tests` to verify RAT and checkstyle work. Jenkins tests for the rest.

Author: Sean Owen <sowen@cloudera.com>

Closes #11522 from srowen/SPARK-13596.

0eea12a3

Mar 03, 2016

[MINOR] Fix typos in comments and testcase name of code · 941b270b

Dongjoon Hyun authored 9 years ago

## What changes were proposed in this pull request?

This PR fixes typos in comments and testcase name of code.

## How was this patch tested?

manual.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11481 from dongjoon-hyun/minor_fix_typos_in_code.

941b270b

[SPARK-13599][BUILD] remove transitive groovy dependencies from Hive · 9a48c656

Steve Loughran authored 9 years ago

## What changes were proposed in this pull request?

Modifies the dependency declarations of all the Hive artifacts to explicitly exclude the groovy-all JAR.

This stops the groovy classes *and everything else in that uber-JAR* from getting into spark-assembly JAR.

## How was this patch tested?

1. Pre-patch build was made: `mvn clean install -Pyarn,hive,hive-thriftserver`
1. spark-assembly expanded, observed to have the org.codehaus.groovy packages and JARs
1. A maven dependency tree was created `mvn dependency:tree -Pyarn,hive,hive-thriftserver  -Dverbose > target/dependencies.txt`
1. This text file examined to confirm that groovy was being imported as a dependency of `org.spark-project.hive`
1. Patch applied
1. Repeated step 1: clean build of project with ` -Pyarn,hive,hive-thriftserver` set
1. Examined created spark-assembly, verified no org.codehaus packages
1. Verified that the maven dependency tree no longer references groovy

Note also that the size of the assembly JAR was 181628646 bytes before this patch and 166318515 bytes after, about 15 MB smaller. That's a good indicator that things are being excluded.

Author: Steve Loughran <stevel@hortonworks.com>

Closes #11449 from steveloughran/fixes/SPARK-13599-groovy-dependency.

9a48c656

Mar 02, 2016

Fix run-tests.py typos · 75e618de

Wojciech Jurczyk authored 9 years ago

## What changes were proposed in this pull request?

The PR fixes typos in an error message in dev/run-tests.py.

Author: Wojciech Jurczyk <wojciech.jurczyk@codilime.com>

Closes #11467 from wjur/wjur/typos_run_tests.

75e618de

Mar 01, 2016

[BUILD][MINOR] Fix SBT build error with network-yarn module · b4d096de

jerryshao authored 9 years ago

## What changes were proposed in this pull request?

```
[error] Expected ID character
[error] Not a valid command: common (similar: completions)
[error] Expected project ID
[error] Expected configuration
[error] Expected ':' (if selecting a configuration)
[error] Expected key
[error] Not a valid key: common (similar: commands)
[error] common/network-yarn/test
```

`common/network-yarn` is not a valid sbt project name, so we should change it to `network-yarn`.

## How was this patch tested?

Ran the unit tests locally.

CC rxin , we should either change here, or change the sbt project name.

Author: jerryshao <sshao@hortonworks.com>

Closes #11456 from jerryshao/build-fix.

b4d096de

Feb 28, 2016

[SPARK-13529][BUILD] Move network/* modules into common/network-* · 9e01dcc6

Reynold Xin authored 9 years ago

## What changes were proposed in this pull request?
As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top level, non-user-facing folder.

## How was this patch tested?
Compilation and existing tests. We should run both SBT and Maven.

Author: Reynold Xin <rxin@databricks.com>

Closes #11409 from rxin/SPARK-13529.

9e01dcc6

Feb 27, 2016

[SPARK-7483][MLLIB] Upgrade Chill to 0.7.2 to support Kryo with FPGrowth · ec0cc75e

mark800 authored 9 years ago

It registers more Scala classes, including ListBuffer to support Kryo with FPGrowth.

See https://github.com/twitter/chill/releases for Chill's change log.

Author: mark800 <yky800@126.com>

Closes #11041 from mark800/master.

ec0cc75e

Feb 26, 2016

[SPARK-13474][PROJECT INFRA] Update packaging scripts to push artifacts to home.apache.org · f77dc4e1

Josh Rosen authored 9 years ago

Due to the people.apache.org -> home.apache.org migration, we need to update our packaging scripts to publish artifacts to the new server. Because the new server only supports sftp instead of ssh, we need to update the scripts to use lftp instead of ssh + rsync.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #11350 from JoshRosen/update-release-scripts-for-apache-home.

f77dc4e1

Feb 17, 2016

[SPARK-13324][CORE][BUILD] Update plugin, test, example dependencies for 2.x · b8440486

Sean Owen authored 9 years ago

Phase 1: update plugin versions, test dependencies, some example and third-party versions

Author: Sean Owen <sowen@cloudera.com>

Closes #11206 from srowen/SPARK-13324.

b8440486

Feb 12, 2016

[SPARK-13154][PYTHON] Add linting for pydocs · 64515e5f

Holden Karau authored 9 years ago

We should have lint rules using sphinx to automatically catch the pydoc issues that are sometimes introduced.

Right now ./dev/lint-python will skip building the docs if sphinx isn't present - but it might make sense to fail hard - just a matter of if we want to insist all PySpark developers have sphinx present.

Author: Holden Karau <holden@us.ibm.com>

Closes #11109 from holdenk/SPARK-13154-add-pydoc-lint-for-docs.

64515e5f

Feb 09, 2016

[SPARK-13189] Cleanup build references to Scala 2.10 · 2dbb9164

Luciano Resende authored 9 years ago

Author: Luciano Resende <lresende@apache.org>

Closes #11092 from lresende/SPARK-13189.

2dbb9164

Jan 30, 2016

[SPARK-6363][BUILD] Make Scala 2.11 the default Scala version · 289373b2

Josh Rosen authored 9 years ago

This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).

The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).

After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10608 from JoshRosen/SPARK-6363.

289373b2

Jan 27, 2016

[SPARK-13023][PROJECT INFRA] Fix handling of root module in modules_to_test() · 41f0c85f

Josh Rosen authored 9 years ago

There's a minor bug in how we handle the `root` module in the `modules_to_test()` function in `dev/run-tests.py`: since `root` now depends on `build` (because every test needs to run whenever the build changes), we now need to check for the presence of `root` in `modules_to_test` instead of `changed_modules`.
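
The distinction between the two sets can be sketched as follows. This is a simplified illustration under assumed names (`determine_modules_to_test`, a `dependents` mapping, and returning `{"root"}` to mean "run everything"), not the actual code in `dev/run-tests.py`.

```python
def determine_modules_to_test(changed_modules, dependents):
    """Expand changed modules with everything that depends on them.

    `dependents` maps a module name to the modules that depend on it.
    """
    modules_to_test = set()
    pending = list(changed_modules)
    while pending:
        module = pending.pop()
        if module not in modules_to_test:
            modules_to_test.add(module)
            pending.extend(dependents.get(module, ()))
    # Check the expanded set, not changed_modules: a change to "build"
    # pulls in "root" only through the dependency expansion above.
    if "root" in modules_to_test:
        return {"root"}
    return modules_to_test
```

With `dependents = {"build": ["root"]}`, a change that touches only `build` never lists `root` in `changed_modules`, so a check against `changed_modules` would miss it.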

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10933 from JoshRosen/build-module-fix.

41f0c85f

Jan 26, 2016

[SPARK-8725][PROJECT-INFRA] Test modules in topologically-sorted order in dev/run-tests · ee74498d

Josh Rosen authored 9 years ago

This patch improves our `dev/run-tests` script to test modules in a topologically-sorted order based on modules' dependencies. This will help to ensure that bugs in upstream projects are not misattributed to downstream projects because those projects' tests were the first ones to exhibit the failure.

Topological sorting is also useful for shortening the feedback loop when testing pull requests: if I make a change in SQL then the SQL tests should run before MLlib, not after.

In addition, this patch also updates our test module definitions to split `sql` into `catalyst`, `sql`, and `hive` in order to allow more tests to be skipped when changing only `hive/` files.
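
As a rough sketch of the ordering idea, the following Python function sorts modules so that each one comes after its dependencies. The function name and the example graph are assumptions for illustration; the real module definitions and sorting code in `dev/` may differ.

```python
def toposort_modules(dependencies):
    """Order modules so each one comes after everything it depends on.

    `dependencies` maps a module name to the set of modules it depends on.
    """
    remaining = {m: set(deps) for m, deps in dependencies.items()}
    # Modules that only appear as dependencies still need an entry.
    for deps in dependencies.values():
        for dep in deps:
            remaining.setdefault(dep, set())
    ordered = []
    while remaining:
        # Modules whose dependencies have all been emitted are ready to test.
        ready = sorted(m for m, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cyclic dependency among: %s" % sorted(remaining))
        ordered.extend(ready)
        for m in ready:
            del remaining[m]
        for deps in remaining.values():
            deps.difference_update(ready)
    return ordered
```

For a hypothetical graph where `sql` depends on `catalyst` and both `hive` and `mllib` depend on `sql`, the result lists `catalyst` first and `sql` before either downstream module, so a SQL failure surfaces in the SQL tests rather than in MLlib.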

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10885 from JoshRosen/SPARK-8725.

ee74498d

Jan 24, 2016

[SPARK-10498][TOOLS][BUILD] Add requirements.txt file for dev python tools · a8340013

Holden Karau authored 9 years ago

Minor since so few people use them, but it would probably be good to have a requirements file for our python release tools for easier setup (also version pinning).

cc JoshRosen who looked at the original JIRA.

Author: Holden Karau <holden@us.ibm.com>

Closes #10871 from holdenk/SPARK-10498-add-requirements-file-for-dev-python-tools.

a8340013

Jan 23, 2016

[SPARK-12933][SQL] Initial implementation of Count-Min sketch · 1c690dda

Cheng Lian authored 9 years ago

This PR adds an initial implementation of count min sketch, contained in a new module spark-sketch under `common/sketch`. The implementation is based on the [`CountMinSketch` class in stream-lib][1].

As required by the [design doc][2], spark-sketch should have no external dependency.
Two classes, `Murmur3_x86_32` and `Platform` are copied to spark-sketch from spark-unsafe for hashing facilities. They'll also be used in the upcoming bloom filter implementation.

The following features will be added in future follow-up PRs:

- Serialization support
- DataFrame API integration
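
For readers unfamiliar with the data structure, a count-min sketch can be illustrated in a few lines of Python. This is a toy version, not the spark-sketch implementation: it uses salted SHA-256 from the standard library in place of `Murmur3_x86_32`, and the class and method names are chosen for the example.

```python
import hashlib


class CountMinSketch:
    """A depth x width table of counters; estimates never undercount."""

    def __init__(self, depth, width):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, row, item):
        # Derive one hash function per row by salting the input with the row index.
        digest = hashlib.sha256(("%d:%s" % (row, item)).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._column(row, item)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.table[row][self._column(row, item)] for row in range(self.depth))
```

Each item increments one counter per row, and the estimate is the minimum across rows, so the result is an upper bound on the true count that tightens as `width` and `depth` grow.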

[1]: https://github.com/addthis/stream-lib/blob/aac6b4d23a8686b000f80baa447e0922ecac3bcb/src/main/java/com/clearspring/analytics/stream/frequency/CountMinSketch.java
[2]: https://issues.apache.org/jira/secure/attachment/12782378/BloomFilterandCount-MinSketchinSpark2.0.pdf

Author: Cheng Lian <lian@databricks.com>

Closes #10851 from liancheng/count-min-sketch.

1c690dda

Jan 22, 2016

[SPARK-7997][CORE] Remove Akka from Spark Core and Streaming · bc1babd6

Shixiong Zhu authored 9 years ago

- Remove Akka dependency from core. Note: the streaming-akka project still uses Akka.
- Remove HttpFileServer
- Remove Akka configs from SparkConf and SSLOptions
- Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth keeping this config because the choice between `DirectTaskResult` and `IndirectTaskResult` depends on it.
- Update comments and docs

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10854 from zsxwing/remove-akka.

bc1babd6

Jan 20, 2016

[SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project · b7d74a60

Shixiong Zhu authored 9 years ago

Include the following changes:

1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream
2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream"
3. Update the ActorWordCount example and add the JavaActorWordCount example
4. Make "streaming-zeromq" depend on "streaming-akka" and update the code accordingly

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10744 from zsxwing/streaming-akka-2.

b7d74a60

Jan 18, 2016

Revert "[SPARK-12829] Turn Java style checker on" · 4bcea1b8

Shixiong Zhu authored 9 years ago

This reverts commit 591c88c9. `lint-java` doesn't work on a machine with a clean Maven cache.

4bcea1b8

Jan 15, 2016

[SPARK-12842][TEST-HADOOP2.7] Add Hadoop 2.7 build profile · 8dbbf3e7

Josh Rosen authored 9 years ago

This patch adds a Hadoop 2.7 build profile in order to let us automate tests against that version.

/cc rxin srowen

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10775 from JoshRosen/add-hadoop-2.7-profile.

8dbbf3e7

[SPARK-12667] Remove block manager's internal "external block store" API · ad1503f9

Reynold Xin authored 9 years ago

This pull request removes the external block store API. This is rarely used, and the file system interface is actually a better, more standard way to interact with external storage systems.

There are some other things to remove also, as pointed out by JoshRosen. We will do those as follow-up pull requests.

Author: Reynold Xin <rxin@databricks.com>

Closes #10752 from rxin/remove-offheap.

ad1503f9

[SPARK-12833][SQL] Initial import of spark-csv · 5f83c699

Hossein authored 9 years ago

CSV is the most common data format in the "small data" world. It is often the first format people want to try when they see Spark on a single node. Having to rely on a 3rd party component for this leads to poor user experience for new users. This PR merges the popular spark-csv data source package (https://github.com/databricks/spark-csv) with SparkSQL.

This is a first PR to bring the functionality to Spark 2.0 master. We will complete the items outlined in the design document (see JIRA attachment) in follow-up pull requests.

Author: Hossein <hossein@databricks.com>
Author: Reynold Xin <rxin@databricks.com>

Closes #10766 from rxin/csv.

5f83c699

Jan 14, 2016

[SPARK-12829] Turn Java style checker on · 591c88c9

Reynold Xin authored 9 years ago

It was previously turned off because there was a problem with a pull request. We should turn it on now.

Author: Reynold Xin <rxin@databricks.com>

Closes #10763 from rxin/SPARK-12829.

591c88c9

[SPARK-12821][BUILD] Style checker should run when some configuration files... · bcc7373f

Kousuke Saruta authored 9 years ago

[SPARK-12821][BUILD] Style checker should run when some configuration files for style are modified but any source files are not.

When running the `run-tests` script, style checkers run only when source files are modified, but they should also run when configuration files related to style are modified.
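
The intended trigger condition might look like the sketch below. The extension list, the set of config file names, and the function name are illustrative assumptions, not the values used by the real script.

```python
import os

# Illustrative choices, not the lists used by the real script.
SOURCE_EXTENSIONS = (".scala", ".java", ".py")
STYLE_CONFIG_NAMES = {"scalastyle-config.xml", "checkstyle-suppressions.xml", "tox.ini"}


def should_run_style_checks(changed_files):
    """Return True if any changed file is a source file or a style config file."""
    for path in changed_files:
        if path.endswith(SOURCE_EXTENSIONS):
            return True
        if os.path.basename(path) in STYLE_CONFIG_NAMES:
            return True
    return False
```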

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #10754 from sarutak/SPARK-12821.

bcc7373f

Jan 13, 2016

[SPARK-9383][PROJECT-INFRA] PR merge script should reset back to previous branch when possible · 97e0c7c5

Josh Rosen authored 9 years ago

This patch modifies our PR merge script to reset back to a named branch when restoring the original checkout upon exit. When the committer is originally checked out to a detached head, then they will be restored back to that same ref (the same as today's behavior).

This is a slightly updated version of #7569, with an extra fix to handle the detached head corner-case.
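
The branch-versus-detached-head decision can be sketched as a small pure function. It relies on the fact that `git rev-parse --abbrev-ref HEAD` prints the literal string `HEAD` when the checkout is detached; the function name and parameters are hypothetical and do not mirror the merge script's actual code.

```python
def ref_to_restore(abbrev_ref, commit_hash):
    """Pick what to check out again once the merge script exits.

    `abbrev_ref` is the output of `git rev-parse --abbrev-ref HEAD`, which is
    the literal string "HEAD" when the checkout is detached.
    """
    if abbrev_ref == "HEAD":
        # Detached head: there is no branch name, so go back to the commit.
        return commit_hash
    return abbrev_ref
```

Restoring the branch name rather than the commit hash is what keeps the committer on a named branch instead of leaving them in a detached state after the script finishes.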

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10709 from JoshRosen/SPARK-9383.

97e0c7c5

Jan 12, 2016

[SPARK-12652][PYSPARK] Upgrade Py4J to 0.9.1 · 4f60651c

Shixiong Zhu authored 9 years ago

- [x] Upgrade Py4J to 0.9.1
- [x] SPARK-12657: Revert SPARK-12617
- [x] SPARK-12658: Revert SPARK-12511
  - Still keep the change that reads the checkpoint only once. This is a manual change and is worth a careful look: https://github.com/zsxwing/spark/commit/bfd4b5c040eb29394c3132af3c670b1a7272457c
- [x] Verify there is no leak anymore after reverting our workarounds

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10692 from zsxwing/py4j-0.9.1.

4f60651c

Jan 11, 2016

[SPARK-12734][HOTFIX] Build changes must trigger all tests; clean after install in dep tests · a4499145

Josh Rosen authored 9 years ago

This patch fixes a build/test issue caused by the combination of #10672 and a latent issue in the original `dev/test-dependencies` script.

First, changes which _only_ touched build files were not triggering full Jenkins runs, making it possible for a build change to be merged even though it could cause failures in other tests. The `root` build module now depends on `build`, so all tests will now be run whenever a build-related file is changed.

I also added a `clean` step to the Maven install step in `dev/test-dependencies` in order to address an issue where the dummy JARs stuck around and caused "multiple assembly JARs found" errors in tests.

/cc zsxwing

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10704 from JoshRosen/fix-build-test-problems.

a4499145

[SPARK-12269][STREAMING][KINESIS] Update aws-java-sdk version · 8fe928b4

BrianLondon authored 9 years ago

The current Spark Streaming kinesis connector references a quite old version 1.9.40 of the AWS Java SDK (1.10.40 is current). Numerous AWS features including Kinesis Firehose are unavailable in 1.9. Those two versions of the AWS SDK in turn require conflicting versions of Jackson (2.4.4 and 2.5.3 respectively) such that one cannot include the current AWS SDK in a project that also uses the Spark Streaming Kinesis ASL.

Author: BrianLondon <brian@seatgeek.com>

Closes #10256 from BrianLondon/master.

8fe928b4

[SPARK-12734][HOTFIX][TEST-MAVEN] Fix bug in Netty exclusions · f13c7f8f

Josh Rosen authored 9 years ago

This is a hotfix for a build bug introduced by the Netty exclusion changes in #10672. We can't exclude `io.netty:netty` because Akka depends on it. There's not a direct conflict between `io.netty:netty` and `io.netty:netty-all`, because the former puts classes in the `org.jboss.netty` namespace while the latter uses the `io.netty` namespace. However, there still is a conflict between `org.jboss.netty:netty` and `io.netty:netty`, so we need to continue to exclude the JBoss version of that artifact.

While the diff here looks somewhat large, note that this is only a revert of some of the changes from #10672. You can see the net changes in pom.xml at https://github.com/apache/spark/compare/3119206b7188c23055621dfeaf6874f21c711a82...5211ab8#diff-600376dffeb79835ede4a0b285078036

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10693 from JoshRosen/netty-hotfix.

f13c7f8f

Jan 10, 2016

[SPARK-12734][BUILD] Fix Netty exclusion and use Maven Enforcer to prevent future bugs · 3ab0138b

Josh Rosen authored 9 years ago

Netty classes are published under multiple artifacts with different names, so our build needs to exclude the `io.netty:netty` and `org.jboss.netty:netty` versions of the Netty artifact. However, our existing exclusions were incomplete, leading to situations where duplicate Netty classes would wind up on the classpath and cause compile errors (or worse).

This patch fixes the exclusion issue by adding more exclusions and uses Maven Enforcer's [banned dependencies](https://maven.apache.org/enforcer/enforcer-rules/bannedDependencies.html) rule to prevent these classes from accidentally being reintroduced. I also updated `dev/test-dependencies.sh` to run `mvn validate` so that the enforcer rules can run as part of pull request builds.

/cc rxin srowen pwendell. I'd like to backport at least the exclusion portion of this fix to `branch-1.5` in order to fix the documentation publishing job, which fails nondeterministically due to incompatible versions of Netty classes taking precedence on the compile-time classpath.

Author: Josh Rosen <rosenville@gmail.com>
Author: Josh Rosen <joshrosen@databricks.com>

Closes #10672 from JoshRosen/enforce-netty-exclusions.

3ab0138b

Jan 09, 2016

[SPARK-12735] Consolidate & move spark-ec2 to AMPLab managed repository. · 5b0d5443

Reynold Xin authored 9 years ago

Author: Reynold Xin <rxin@databricks.com>

Closes #10673 from rxin/SPARK-12735.

5b0d5443

Jan 06, 2016

[SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst · ea489f14

Herman van Hovell authored 9 years ago

This PR moves a major part of the new SQL parser to Catalyst. This is a prelude to start using this parser for all of our SQL parsing. The following key changes have been made:

The ANTLR parser and supporting classes have been moved to the Catalyst project. They are now part of the `org.apache.spark.sql.catalyst.parser` package. These classes contained quite a bit of code that was originally from the Hive project; I have added acknowledgements wherever this applied. All Hive dependencies have been factored out. I have also taken this chance to clean up the `ASTNode` class and to improve the error handling.

The HiveQl object, which provides the functionality to convert an AST into a LogicalPlan, has been refactored into three different classes, one for every SQL sub-project:
- `CatalystQl`: This implements Query and Expression parsing functionality.
- `SparkQl`: This is a subclass of `CatalystQl` and provides SQL/Core-only functionality such as Explain and Describe.
- `HiveQl`: This is a subclass of `SparkQl` and adds Hive-only functionality to the parser such as Analyze, Drop, Views, CTAS & Transforms. This class still depends on Hive.

cc rxin

Author: Herman van Hovell <hvanhovell@questtec.nl>

Closes #10583 from hvanhovell/SPARK-12575.

ea489f14

Jan 05, 2016

[SPARK-12625][SPARKR][SQL] replace R usage of Spark SQL deprecated API · cc4d5229

felixcheung authored 9 years ago

rxin davies shivaram
Took save mode from my PR #10480, and moved everything to writer methods. This is related to PR #10559.

- [x] it seems jsonRDD() is broken, need to investigate - this is not a public API though; will look into some more tonight. (fixed)

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10584 from felixcheung/rremovedeprecated.

cc4d5229

Jan 04, 2016

[SPARK-12600][SQL] Remove deprecated methods in Spark SQL · 77ab49b8

Reynold Xin authored 9 years ago

Author: Reynold Xin <rxin@databricks.com>

Closes #10559 from rxin/remove-deprecated-sql.

77ab49b8

[SPARK-10359][PROJECT-INFRA] Use more random number in... · 9fd7a2f0

Josh Rosen authored 9 years ago

[SPARK-10359][PROJECT-INFRA] Use more random number in dev/test-dependencies.sh; fix version switching

This patch aims to fix another potential source of flakiness in the `dev/test-dependencies.sh` script.

pwendell's original patch and my version used `$(date +%s | tail -c6)` to generate a suffix to use when installing temporary Spark versions into the local Maven cache, but this value only changes once per second and thus is highly collision-prone when concurrent builds launch on AMPLab Jenkins. In order to reduce the potential for conflicts, this patch updates the script to call Python's random number generator instead.
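
The collision risk is easy to see in a small Python sketch. The helper names and the six-digit range are assumptions made for this example; the patch itself only states that the script now calls Python's random number generator.

```python
import random
import time


def timestamp_suffix(now=None):
    """Mimic `date +%s | tail -c6`: the last five digits of the epoch seconds."""
    seconds = int(time.time() if now is None else now)
    return str(seconds)[-5:]


def random_suffix(rng=random):
    """Draw a six-digit suffix that can differ between calls in the same second."""
    return str(rng.randint(100000, 999999))
```

Two builds that start within the same second get identical output from `timestamp_suffix`, while `random_suffix` collides only when two independent draws happen to match.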

I also fixed a bug in how we captured the original project version; the bug was causing the exit handler code to fail.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10558 from JoshRosen/build-dep-tests-round-3.

9fd7a2f0