  1. Mar 09, 2016
    • [SPARK-13595][BUILD] Move docker, extras modules into external · 256704c7
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Move `docker` dirs out of top level into `external/`; move `extras/*` into `external/`
      
      ## How was this patch tested?
      
      This is tested with Jenkins tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11523 from srowen/SPARK-13595.
  2. Mar 08, 2016
    • [SPARK-13715][MLLIB] Remove last usages of jblas in tests · 54040f8d
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Remove the last usages of jblas, in tests
      
      ## How was this patch tested?
      
      Jenkins tests -- the same ones that are being modified.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11560 from srowen/SPARK-13715.
    • [HOT-FIX][BUILD] Use the new location of `checkstyle-suppressions.xml` · 7771c731
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes `dev/lint-java` and `mvn checkstyle:check` failures due to the recent file location change.
      The following is the error message on the current master.
      ```
      Checkstyle checks failed at following occurrences:
      [ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (default-cli) on project spark-parent_2.11: Failed during checkstyle configuration: cannot initialize module SuppressionFilter - Cannot set property 'file' to 'checkstyle-suppressions.xml' in module SuppressionFilter: InvocationTargetException: Unable to find: checkstyle-suppressions.xml -> [Help 1]
      ```
      
      ## How was this patch tested?
      
      Manual. The following commands should run correctly.
      ```
      ./dev/lint-java
      mvn checkstyle:check
      ```
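
      As a rough extra check, one can verify that the suppressions file exists at its post-move location and that the build references that path. This is a hedged sketch; the `dev/` location below is an assumption based on the file move this hotfix responds to.
      ```
      # Assumed post-move location; adjust if the file lives elsewhere.
      ls dev/checkstyle-suppressions.xml
      # Find where the build wires in the suppressions path.
      grep -n "checkstyle-suppressions.xml" pom.xml
      ```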
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11567 from dongjoon-hyun/hotfix_checkstyle_suppression.
  3. Mar 07, 2016
    • [SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs · 0eea12a3
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Move many top-level files into dev/ or another appropriate directory. In particular, put `make-distribution.sh` in `dev` and update docs accordingly. Remove deprecated `sbt/sbt`.
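
      For reference, a hedged sketch of invoking the relocated distribution script from its new home (the flags shown are illustrative, not exhaustive):
      ```
      # make-distribution.sh now lives under dev/ rather than the top level.
      ./dev/make-distribution.sh --tgz -Pyarn -Phive -Phive-thriftserver
      ```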
      
      I was (so far) unable to figure out how to move `tox.ini`. `scalastyle-config.xml` should be movable, but edits to the project `.sbt` files didn't work; the config file location is updatable for the compile scope but not the test scope.
      
      ## How was this patch tested?
      
      `./dev/run-tests` to verify RAT and checkstyle work. Jenkins tests for the rest.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11522 from srowen/SPARK-13596.
  4. Mar 03, 2016
    • [SPARK-13599][BUILD] remove transitive groovy dependencies from Hive · 9a48c656
      Steve Loughran authored
      ## What changes were proposed in this pull request?
      
      Modifies the dependency declarations of all the Hive artifacts to explicitly exclude the groovy-all JAR.
      
      This stops the groovy classes *and everything else in that uber-JAR* from getting into spark-assembly JAR.
      
      ## How was this patch tested?
      
      1. Pre-patch build was made: `mvn clean install -Pyarn,hive,hive-thriftserver`
      2. spark-assembly was expanded and observed to contain the org.codehaus.groovy packages and JARs
      3. A Maven dependency tree was created: `mvn dependency:tree -Pyarn,hive,hive-thriftserver -Dverbose > target/dependencies.txt`
      4. This text file was examined to confirm that groovy was being imported as a dependency of `org.spark-project.hive`
      5. Patch applied
      6. Repeated step 1: clean build of the project with `-Pyarn,hive,hive-thriftserver` set
      7. Examined the created spark-assembly, verified no org.codehaus packages
      8. Verified that the Maven dependency tree no longer references groovy
      
      Note also that the size of the assembly JAR was 181628646 bytes before this patch and 166318515 bytes after, about 15 MB smaller. That's a good metric of things being excluded.
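
      A condensed, hedged version of the verification above as shell commands (the assembly JAR path is illustrative and varies by Scala version and profiles):
      ```
      # After the patch, neither command should produce any output.
      mvn dependency:tree -Pyarn,hive,hive-thriftserver -Dverbose | grep -i groovy
      unzip -l assembly/target/scala-2.11/spark-assembly*.jar | grep org/codehaus/groovy
      ```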
      
      Author: Steve Loughran <stevel@hortonworks.com>
      
      Closes #11449 from steveloughran/fixes/SPARK-13599-groovy-dependency.
  5. Mar 01, 2016
    • [SPARK-13548][BUILD] Move tags and unsafe modules into common · b0ee7d43
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch moves the tags and unsafe modules into the common directory to remove two top-level non-user-facing directories.
      
      ## How was this patch tested?
      Jenkins should suffice.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11426 from rxin/SPARK-13548.
  6. Feb 28, 2016
    • [SPARK-13529][BUILD] Move network/* modules into common/network-* · 9e01dcc6
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top-level, non-user-facing folder.
      
      ## How was this patch tested?
      Compilation and existing tests. We should run both SBT and Maven.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11409 from rxin/SPARK-13529.
  7. Jan 30, 2016
    • [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version · 289373b2
      Josh Rosen authored
      This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
      
      The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
      
      After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
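
      For anyone needing the non-default version, a hedged sketch of a Scala 2.10 build after this change, following the "Building Spark" instructions of that era (the script name and profile flag are assumptions):
      ```
      # Rewrite the POMs for Scala 2.10, then build with the matching profile.
      ./dev/change-scala-version.sh 2.10
      build/mvn -Dscala-2.10 -DskipTests clean package
      ```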
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10608 from JoshRosen/SPARK-6363.
  8. Jan 22, 2016
    • [SPARK-7997][CORE] Remove Akka from Spark Core and Streaming · bc1babd6
      Shixiong Zhu authored
      - Remove Akka dependency from core. Note: the streaming-akka project still uses Akka.
      - Remove HttpFileServer
      - Remove Akka configs from SparkConf and SSLOptions
      - Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth keeping this config, because the choice between `DirectTaskResult` and `IndirectTaskResult` depends on it (see the example after this list).
      - Update comments and docs
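
      A minimal, hedged example of the renamed setting in use (the application class and JAR are placeholders):
      ```
      # The RPC message size cap, in MB, replacing spark.akka.frameSize.
      spark-submit --conf spark.rpc.message.maxSize=128 \
        --class org.example.MyApp myapp.jar
      ```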
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10854 from zsxwing/remove-akka.
  9. Jan 20, 2016
    • [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project · b7d74a60
      Shixiong Zhu authored
      Include the following changes:
      
      1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream
      2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream"
      3. Update the ActorWordCount example and add the JavaActorWordCount example
      4. Make "streaming-zeromq" depend on "streaming-akka" and update the codes accordingly
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10744 from zsxwing/streaming-akka-2.
  10. Jan 11, 2016
    • [SPARK-12269][STREAMING][KINESIS] Update aws-java-sdk version · 8fe928b4
      BrianLondon authored
      The current Spark Streaming Kinesis connector references a quite old version, 1.9.40, of the AWS Java SDK (1.10.40 is current). Numerous AWS features, including Kinesis Firehose, are unavailable in 1.9. Those two versions of the AWS SDK in turn require conflicting versions of Jackson (2.4.4 and 2.5.3 respectively), such that one cannot include the current AWS SDK in a project that also uses the Spark Streaming Kinesis ASL.
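
      A hedged way to surface the Jackson conflict described above in one's own build (this is a standard Maven idiom, not something from this patch):
      ```
      # Show which Jackson versions the AWS SDK and other dependencies pull in.
      mvn dependency:tree -Dincludes=com.fasterxml.jackson.core
      ```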
      
      Author: BrianLondon <brian@seatgeek.com>
      
      Closes #10256 from BrianLondon/master.
    • [SPARK-12734][HOTFIX][TEST-MAVEN] Fix bug in Netty exclusions · f13c7f8f
      Josh Rosen authored
      This is a hotfix for a build bug introduced by the Netty exclusion changes in #10672. We can't exclude `io.netty:netty` because Akka depends on it. There's not a direct conflict between `io.netty:netty` and `io.netty:netty-all`, because the former puts classes in the `org.jboss.netty` namespace while the latter uses the `io.netty` namespace. However, there still is a conflict between `org.jboss.netty:netty` and `io.netty:netty`, so we need to continue to exclude the JBoss version of that artifact.
      
      While the diff here looks somewhat large, note that this is only a revert of some of the changes from #10672. You can see the net changes in pom.xml at https://github.com/apache/spark/compare/3119206b7188c23055621dfeaf6874f21c711a82...5211ab8#diff-600376dffeb79835ede4a0b285078036
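
      To see the distinction concretely, a hedged check of which Netty coordinates end up on the classpath (after this hotfix, only `org.jboss.netty:netty` should be absent):
      ```
      # io.netty:netty stays (Akka needs it; its classes live under org.jboss.netty),
      # io.netty:netty-all stays, and org.jboss.netty:netty must remain excluded.
      mvn dependency:tree -Dincludes=io.netty:netty,org.jboss.netty:netty,io.netty:netty-all
      ```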
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10693 from JoshRosen/netty-hotfix.
  11. Jan 10, 2016
    • [SPARK-12734][BUILD] Fix Netty exclusion and use Maven Enforcer to prevent future bugs · 3ab0138b
      Josh Rosen authored
      Netty classes are published under multiple artifacts with different names, so our build needs to exclude the `io.netty:netty` and `org.jboss.netty:netty` versions of the Netty artifact. However, our existing exclusions were incomplete, leading to situations where duplicate Netty classes would wind up on the classpath and cause compile errors (or worse).
      
      This patch fixes the exclusion issue by adding more exclusions and uses Maven Enforcer's [banned dependencies](https://maven.apache.org/enforcer/enforcer-rules/bannedDependencies.html) rule to prevent these classes from accidentally being reintroduced. I also updated `dev/test-dependencies.sh` to run `mvn validate` so that the enforcer rules can run as part of pull request builds.
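
      Since the enforcer rules are exercised by `mvn validate`, a hedged local check looks like this:
      ```
      # Runs the banned-dependencies rule without a full build; it fails fast
      # if an excluded Netty artifact sneaks back onto the classpath.
      build/mvn validate
      ```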
      
      /cc rxin srowen pwendell. I'd like to backport at least the exclusion portion of this fix to `branch-1.5` in order to fix the documentation publishing job, which fails nondeterministically due to incompatible versions of Netty classes taking precedence on the compile-time classpath.
      
      Author: Josh Rosen <rosenville@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10672 from JoshRosen/enforce-netty-exclusions.
  12. Jan 08, 2016
    • [SPARK-4628][BUILD] Remove all non-Maven-Central repositories from build · 090d6913
      Josh Rosen authored
      This patch removes all non-Maven-central repositories from Spark's build, thereby avoiding any risk of future build-breaks due to us accidentally depending on an artifact which is not present in an immutable public Maven repository.
      
      I tested this by running
      
      ```
      build/mvn \
              -Phive \
              -Phive-thriftserver \
              -Pkinesis-asl \
              -Pspark-ganglia-lgpl \
              -Pyarn \
              dependency:go-offline
      ```
      
      inside of a fresh Ubuntu Docker container with no Ivy or Maven caches (I did a similar test for SBT).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10659 from JoshRosen/SPARK-4628.
    • [SPARK-4819] Remove Guava's "Optional" from public API · 659fd9d0
      Sean Owen authored
      Replace Guava `Optional` with (an API clone of) Java 8 `java.util.Optional` (edit: and a clone of Guava `Optional`)
      
      See also https://github.com/apache/spark/pull/10512
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #10513 from srowen/SPARK-4819.
  13. Jan 06, 2016
    • [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst · ea489f14
      Herman van Hovell authored
      This PR moves a major part of the new SQL parser to Catalyst. This is a prelude to using this parser for all of our SQL parsing. The following key changes have been made:
      
      The ANTLR parser and supporting classes have been moved to the Catalyst project. They are now part of the ```org.apache.spark.sql.catalyst.parser``` package. These classes contained quite a bit of code that was originally from the Hive project; I have added acknowledgements wherever this applied. All Hive dependencies have been factored out. I have also taken this chance to clean up the ```ASTNode``` class and to improve the error handling.
      
      The HiveQl object that provides the functionality to convert an AST into a LogicalPlan has been refactored into three different classes, one for every SQL sub-project:
      - ```CatalystQl```: This implements Query and Expression parsing functionality.
      - ```SparkQl```: This is a subclass of ```CatalystQl``` and provides SQL/Core-only functionality such as Explain and Describe.
      - ```HiveQl```: This is a subclass of ```SparkQl``` and this adds Hive-only functionality to the parser such as Analyze, Drop, Views, CTAS & Transforms. This class still depends on Hive.
      
      cc rxin
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #10583 from hvanhovell/SPARK-12575.
  14. Jan 02, 2016
    • [SPARK-12362][SQL][WIP] Inline Hive Parser · 970635a9
      Herman van Hovell authored
      This PR inlines the Hive SQL parser in Spark SQL.
      
      The previous (merged) incarnation of this PR passed all tests, but had and still has problems with the build. These problems are caused by the fact that, for some reason, in some cases the ANTLR-generated code is not included in the compilation phase.
      
      This PR is a WIP and should not be merged until we have sorted out the build issues.
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      Author: Nong Li <nong@databricks.com>
      Author: Nong Li <nongli@gmail.com>
      
      Closes #10525 from hvanhovell/SPARK-12362.
  15. Dec 30, 2015
    • [SPARK-10359] Enumerate dependencies in a file and diff against it for new pull requests · 27a42c71
      Josh Rosen authored
      This patch adds a new build check which enumerates Spark's resolved runtime classpath and saves it to a file, then diffs against that file to detect whether pull requests have introduced dependency changes. The aim of this check is to make it simpler to reason about whether pull requests which modify the build have introduced new dependencies or changed transitive dependencies in a way that affects the final classpath.
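
      A hedged sketch of the check's shape (the script and manifest paths are assumptions based on this description):
      ```
      # Re-resolve the runtime classpath and compare against the committed
      # manifest; any difference flags a dependency change for review.
      ./dev/test-dependencies.sh
      git diff -- dev/deps/
      ```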
      
      This supplants the checks added in SPARK-4123 / #5093, which are currently disabled due to bugs.
      
      This patch is based on pwendell's work in #8531.
      
      Closes #8531.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #10461 from JoshRosen/SPARK-10359.
    • Revert "[SPARK-12362][SQL][WIP] Inline Hive Parser" · 27af6157
      Reynold Xin authored
      This reverts commit b600bccf due to non-deterministic build breaks.
  16. Dec 29, 2015
    • [SPARK-12362][SQL][WIP] Inline Hive Parser · b600bccf
      Nong Li authored
      This is a WIP. The PR has been taken over from nongli (see https://github.com/apache/spark/pull/10420). I have removed some additional dead code, and fixed a few issues which were caused by the fact that the inlined Hive parser is newer than the Hive parser we currently use in Spark.
      
      I am submitting this PR in order to get some feedback and testing done. There is quite a bit of work to do:
      - [ ] Get it to pass jenkins build/test.
      - [ ] Acknowledge the Hive project for using their parser.
      - [ ] Refactorings between HiveQl and the java classes.
        - [ ] Create our own ASTNode and integrate the current implicit extensions.
        - [ ] Move remaining ```SemanticAnalyzer``` and ```ParseUtils``` functionality to ```HiveQl```.
      - [ ] Removing Hive dependencies from the parser. This will require some edits in the grammar files.
        - [ ] Introduce our own context which needs to contain a ```TokenRewriteStream```.
        - [ ] Add ```useSQL11ReservedKeywordsForIdentifier``` and ```allowQuotedId``` to the catalyst or sql configuration.
        - [ ] Remove ```HiveConf``` from the grammar files and HiveQl, and pass in our own configuration.
      - [ ] Moving the parser into sql/core.
      
      cc nongli rxin
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      Author: Nong Li <nong@databricks.com>
      Author: Nong Li <nongli@gmail.com>
      
      Closes #10509 from hvanhovell/SPARK-12362.
  17. Dec 04, 2015
    • [SPARK-12112][BUILD] Upgrade to SBT 0.13.9 · b7204e1d
      Josh Rosen authored
      We should upgrade to SBT 0.13.9, since this is a requirement in order to use SBT's new Maven-style resolution features (which will be done in a separate patch, because it's blocked by some binary compatibility issues in the POM reader plugin).
      
      I also upgraded Scalastyle to version 0.8.0, which was necessary in order to fix a Scala 2.10.5 compatibility issue (see https://github.com/scalastyle/scalastyle/issues/156). The newer Scalastyle is slightly stricter about whitespace surrounding tokens, so I fixed the new style violations.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10112 from JoshRosen/upgrade-to-sbt-0.13.9.
    • [SPARK-6990][BUILD] Add Java linting script; fix minor warnings · d0d82227
      Dmitry Erastov authored
      This replaces https://github.com/apache/spark/pull/9696
      
      Invoke Checkstyle and print any errors to the console, failing the step.
      Use Google's style rules modified according to
      https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
      Some important checks are disabled (see TODOs in `checkstyle.xml`) due to
      multiple violations being present in the codebase.
      
      I suggest fixing those TODOs in separate PRs.
      
      More on Checkstyle can be found on the [official website](http://checkstyle.sourceforge.net/).
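
      The new entry point can be run locally before pushing (the script name appears in the sample output below):
      ```
      # Invokes Checkstyle over the Java sources and fails on any violation.
      ./dev/lint-java
      ```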
      
      Sample output from [build 46345](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46345/consoleFull), duplicated because I ran the build twice with different profiles:
      
      > Checkstyle checks failed at following occurrences:
      > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:[217,7] (coding) MissingSwitchDefault: switch without "default" clause.
      > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[198,10] (modifier) ModifierOrder: 'protected' modifier out of order with the JLS suggestions.
      > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:[217,7] (coding) MissingSwitchDefault: switch without "default" clause.
      > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[198,10] (modifier) ModifierOrder: 'protected' modifier out of order with the JLS suggestions.
      > [error] running /home/jenkins/workspace/SparkPullRequestBuilder2/dev/lint-java ; received return code 1
      
      Also fix some of the minor violations that didn't require sweeping changes.
      
      Apologies for the previous botched PRs - I finally figured out the issue.
      
      cr: JoshRosen, pwendell
      
      > I state that the contribution is my original work, and I license the work to the project under the project's open source license.
      
      Author: Dmitry Erastov <derastov@gmail.com>
      
      Closes #9867 from dskrvk/master.
  18. Nov 23, 2015
    • [SPARK-4424] Remove spark.driver.allowMultipleContexts override in tests · 1b6e938b
      Josh Rosen authored
      This patch removes `spark.driver.allowMultipleContexts=true` from our test configuration. The multiple SparkContexts check was originally disabled because certain test suites in SQL needed to create multiple contexts. As far as I know, this configuration change is no longer necessary, so we should remove it in order to make it easier to find test cleanup bugs.
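
      A quick, hedged way to confirm that no test configuration still sets the override:
      ```
      # Expect no matches in test configuration after this patch.
      git grep -n "spark.driver.allowMultipleContexts"
      ```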
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #9865 from JoshRosen/SPARK-4424.
  19. Nov 10, 2015
    • [SPARK-9818] Re-enable Docker tests for JDBC data source · 1dde39d7
      Josh Rosen authored
      This patch re-enables tests for the Docker JDBC data source. These tests were reverted in #4872 due to transitive dependency conflicts introduced by the `docker-client` library. This patch should avoid those problems by using a version of `docker-client` which shades its transitive dependencies and by performing some build-magic to work around problems with that shaded JAR.
      
      In addition, I significantly refactored the tests to simplify the setup and teardown code and to fix several Docker networking issues which caused problems when running in `boot2docker`.
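
      For running the re-enabled suite locally, a hedged sketch (the SBT project name is an assumption based on the docker-integration-tests module layout of that era):
      ```
      # Requires a working Docker daemon (or boot2docker) on the host.
      build/sbt docker-integration-tests/test
      ```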
      
      Closes #8101.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #9503 from JoshRosen/docker-jdbc-tests.