  1. Mar 02, 2016
    • Fix run-tests.py typos · 75e618de
      Wojciech Jurczyk authored
      ## What changes were proposed in this pull request?
      
      The PR fixes typos in an error message in dev/run-tests.py.
      
      Author: Wojciech Jurczyk <wojciech.jurczyk@codilime.com>
      
      Closes #11467 from wjur/wjur/typos_run_tests.
      75e618de
  2. Mar 01, 2016
    • [BUILD][MINOR] Fix SBT build error with network-yarn module · b4d096de
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      ```
      [error] Expected ID character
      [error] Not a valid command: common (similar: completions)
      [error] Expected project ID
      [error] Expected configuration
      [error] Expected ':' (if selecting a configuration)
      [error] Expected key
      [error] Not a valid key: common (similar: commands)
      [error] common/network-yarn/test
      ```
      
      `common/network-yarn` is not a valid sbt project; we should change it to `network-yarn`.
      
      ## How was this patch tested?
      
      Locally ran the unit test.
      
      CC rxin: we should either change it here or change the sbt project name.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #11456 from jerryshao/build-fix.
      b4d096de
  3. Feb 28, 2016
    • [SPARK-13529][BUILD] Move network/* modules into common/network-* · 9e01dcc6
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top-level, non-user-facing folder.
      
      ## How was this patch tested?
      Compilation and existing tests. We should run both SBT and Maven.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11409 from rxin/SPARK-13529.
      9e01dcc6
  4. Feb 27, 2016
  5. Feb 26, 2016
    • [SPARK-13474][PROJECT INFRA] Update packaging scripts to push artifacts to home.apache.org · f77dc4e1
      Josh Rosen authored
      Due to the people.apache.org -> home.apache.org migration, we need to update our packaging scripts to publish artifacts to the new server. Because the new server only supports sftp instead of ssh, we need to update the scripts to use lftp instead of ssh + rsync.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #11350 from JoshRosen/update-release-scripts-for-apache-home.
      f77dc4e1
  6. Feb 17, 2016
  7. Feb 12, 2016
    • [SPARK-13154][PYTHON] Add linting for pydocs · 64515e5f
      Holden Karau authored
      We should have lint rules using sphinx to automatically catch the pydoc issues that are sometimes introduced.
      
      Right now ./dev/lint-python will skip building the docs if sphinx isn't present, but it might make sense to fail hard; it's just a matter of whether we want to insist that all PySpark developers have sphinx present.
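      
      Something like the following Python sketch illustrates the two options being weighed; it is not the actual ./dev/lint-python logic, and the function name, flag, and doc paths are illustrative assumptions.
      
      ```python
      import shutil
      import subprocess
      import sys
      
      def lint_pydocs(fail_if_sphinx_missing=False):
          """Build the Sphinx docs so pydoc problems surface as lint failures."""
          if shutil.which("sphinx-build") is None:
              message = "sphinx-build not found; pydoc lint skipped"
              if fail_if_sphinx_missing:
                  # "Fail hard": insist that every PySpark developer has sphinx installed.
                  sys.exit(message)
              print(message)
              return
          # -W turns Sphinx warnings into errors, so doc regressions fail the check.
          subprocess.check_call(
              ["sphinx-build", "-W", "-b", "html", "python/docs", "python/docs/_build/html"])
      ```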
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #11109 from holdenk/SPARK-13154-add-pydoc-lint-for-docs.
      64515e5f
  8. Feb 09, 2016
  9. Jan 30, 2016
    • [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version · 289373b2
      Josh Rosen authored
      This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
      
      The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
      
      After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10608 from JoshRosen/SPARK-6363.
      289373b2
  10. Jan 27, 2016
    • [SPARK-13023][PROJECT INFRA] Fix handling of root module in modules_to_test() · 41f0c85f
      Josh Rosen authored
      There's a minor bug in how we handle the `root` module in the `modules_to_test()` function in `dev/run-tests.py`: since `root` now depends on `build` (any build change needs to trigger every test), we now need to check for the presence of `root` in `modules_to_test` instead of `changed_modules`.
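      
      A self-contained toy model of that check (the module names and dependency map below are illustrative, not the real run-tests.py code):
      
      ```python
      ROOT = "root"
      
      # Toy map: module -> modules that depend on it (downstream dependents).
      DOWNSTREAM = {
          "build": [ROOT],              # any build change must trigger every test
          "catalyst": ["sql", "hive"],
          "sql": ["hive"],
      }
      
      def modules_to_test(changed_modules):
          """Expand the changed modules with everything downstream of them."""
          to_test = set(changed_modules)
          frontier = list(changed_modules)
          while frontier:
              for dependent in DOWNSTREAM.get(frontier.pop(), []):
                  if dependent not in to_test:
                      to_test.add(dependent)
                      frontier.append(dependent)
          # The fix: look for root in the *expanded* set, because root is normally
          # reached through the build -> root dependency rather than changed directly.
          if ROOT in to_test:
              return {ROOT}
          return to_test
      
      print(modules_to_test(["build"]))     # {'root'}: run the full suite
      print(modules_to_test(["catalyst"]))  # {'catalyst', 'sql', 'hive'} (set order may vary)
      ```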
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10933 from JoshRosen/build-module-fix.
      41f0c85f
  11. Jan 26, 2016
    • [SPARK-8725][PROJECT-INFRA] Test modules in topologically-sorted order in dev/run-tests · ee74498d
      Josh Rosen authored
      This patch improves our `dev/run-tests` script to test modules in a topologically-sorted order based on modules' dependencies. This will help to ensure that bugs in upstream projects are not misattributed to downstream projects just because those projects' tests were the first ones to exhibit the failure.
      
      Topological sorting is also useful for shortening the feedback loop when testing pull requests: if I make a change in SQL then the SQL tests should run before MLlib, not after.
      
      In addition, this patch also updates our test module definitions to split `sql` into `catalyst`, `sql`, and `hive` in order to allow more tests to be skipped when changing only `hive/` files.
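      
      As a rough illustration of the ordering this enables (the dependency edges below are a simplified, assumed subset, not the full module graph):
      
      ```python
      from collections import deque
      
      # module -> modules it depends on (illustrative subset of the real graph)
      DEPENDENCIES = {
          "catalyst": [],
          "sql": ["catalyst"],
          "hive": ["sql"],
          "mllib": ["sql"],
      }
      
      def topological_order(deps):
          """Kahn's algorithm: every module is tested after all of its dependencies."""
          indegree = {m: len(d) for m, d in deps.items()}
          dependents = {m: [] for m in deps}
          for module, upstream in deps.items():
              for dep in upstream:
                  dependents[dep].append(module)
          ready = deque(sorted(m for m, n in indegree.items() if n == 0))
          order = []
          while ready:
              module = ready.popleft()
              order.append(module)
              for child in dependents[module]:
                  indegree[child] -= 1
                  if indegree[child] == 0:
                      ready.append(child)
          return order
      
      print(topological_order(DEPENDENCIES))  # ['catalyst', 'sql', 'hive', 'mllib']
      ```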
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10885 from JoshRosen/SPARK-8725.
      ee74498d
  12. Jan 24, 2016
    • [SPARK-10498][TOOLS][BUILD] Add requirements.txt file for dev python tools · a8340013
      Holden Karau authored
      Minor, since so few people use them, but it would probably be good to have a requirements file for our Python release tools, for easier setup (and version pinning).
      
      cc JoshRosen who looked at the original JIRA.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #10871 from holdenk/SPARK-10498-add-requirements-file-for-dev-python-tools.
      a8340013
  13. Jan 23, 2016
  14. Jan 22, 2016
    • [SPARK-7997][CORE] Remove Akka from Spark Core and Streaming · bc1babd6
      Shixiong Zhu authored
      - Remove Akka dependency from core. Note: the streaming-akka project still uses Akka.
      - Remove HttpFileServer
      - Remove Akka configs from SparkConf and SSLOptions
      - Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth keeping this config because the choice between `DirectTaskResult` and `IndirectTaskResult` depends on it (see the sketch after this list).
      - Update comments and docs
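      
      A sketch of the decision that `spark.rpc.message.maxSize` controls, written in Python purely for illustration (the real logic lives in the Scala executor, and the threshold value shown is just an example):
      
      ```python
      # Illustrative only: serialized task results that fit in one RPC message go back
      # directly; larger ones are stored in the block manager and fetched indirectly.
      MAX_RPC_MESSAGE_BYTES = 128 * 1024 * 1024  # e.g. spark.rpc.message.maxSize = 128 (MiB)
      
      def choose_result_path(serialized_result_size):
          if serialized_result_size < MAX_RPC_MESSAGE_BYTES:
              return "DirectTaskResult"
          return "IndirectTaskResult"
      
      print(choose_result_path(4 * 1024))           # DirectTaskResult
      print(choose_result_path(512 * 1024 * 1024))  # IndirectTaskResult
      ```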
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10854 from zsxwing/remove-akka.
      bc1babd6
  15. Jan 20, 2016
    • [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project · b7d74a60
      Shixiong Zhu authored
      Include the following changes:
      
      1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream
      2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream"
      3. Update the ActorWordCount example and add the JavaActorWordCount example
      4. Make "streaming-zeromq" depend on "streaming-akka" and update the code accordingly
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10744 from zsxwing/streaming-akka-2.
      b7d74a60
  16. Jan 18, 2016
  17. Jan 15, 2016
    • [SPARK-12842][TEST-HADOOP2.7] Add Hadoop 2.7 build profile · 8dbbf3e7
      Josh Rosen authored
      This patch adds a Hadoop 2.7 build profile in order to let us automate tests against that version.
      
      /cc rxin srowen
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10775 from JoshRosen/add-hadoop-2.7-profile.
      8dbbf3e7
    • [SPARK-12667] Remove block manager's internal "external block store" API · ad1503f9
      Reynold Xin authored
      This pull request removes the external block store API. This is rarely used, and the file system interface is actually a better, more standard way to interact with external storage systems.
      
      There are some other things to remove also, as pointed out by JoshRosen. We will do those as follow-up pull requests.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10752 from rxin/remove-offheap.
      ad1503f9
    • [SPARK-12833][SQL] Initial import of spark-csv · 5f83c699
      Hossein authored
      CSV is the most common data format in the "small data" world. It is often the first format people want to try when they see Spark on a single node. Having to rely on a 3rd party component for this leads to poor user experience for new users. This PR merges the popular spark-csv data source package (https://github.com/databricks/spark-csv) with SparkSQL.
      
      This is a first PR to bring the functionality to Spark 2.0 master. We will complete the items outlined in the design document (see the JIRA attachment) in follow-up pull requests.
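      
      Hypothetical PySpark usage once the source is built in; the file path is made up, and the `header`/`inferSchema` options follow the spark-csv package's conventions.
      
      ```python
      from pyspark import SparkContext
      from pyspark.sql import SQLContext
      
      sc = SparkContext(appName="builtin-csv-example")
      sqlContext = SQLContext(sc)
      
      # No --packages com.databricks:spark-csv needed: "csv" now resolves to the built-in source.
      df = (sqlContext.read
            .format("csv")
            .option("header", "true")       # first line holds the column names
            .option("inferSchema", "true")  # sample the data to guess column types
            .load("/tmp/people.csv"))
      df.printSchema()
      ```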
      
      Author: Hossein <hossein@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10766 from rxin/csv.
      5f83c699
  18. Jan 14, 2016
    • [SPARK-12829] Turn Java style checker on · 591c88c9
      Reynold Xin authored
      It was previously turned off because there was a problem with a pull request. We should turn it on now.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10763 from rxin/SPARK-12829.
      591c88c9
    • [SPARK-12821][BUILD] Style checker should run when some configuration files... · bcc7373f
      Kousuke Saruta authored
      [SPARK-12821][BUILD] Style checker should run when some configuration files for style are modified but no source files are.
      
      When running the `run-tests` script, the style checkers run only when source files are modified, but they should also run when style-related configuration files are modified.
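      
      Roughly the intended trigger logic, sketched in Python; the file names and helper are illustrative assumptions, not the actual run-tests code.
      
      ```python
      STYLE_CONFIG_FILES = {"scalastyle-config.xml", "checkstyle.xml", "dev/tox.ini"}
      
      def should_run_style_checks(changed_files):
          """Run the style checkers when source *or* style configuration files change."""
          def is_source(path):
              return path.endswith((".scala", ".java", ".py"))
      
          return any(is_source(f) or f in STYLE_CONFIG_FILES for f in changed_files)
      
      print(should_run_style_checks(["scalastyle-config.xml"]))  # True, even with no source changes
      print(should_run_style_checks(["README.md"]))              # False
      ```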
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10754 from sarutak/SPARK-12821.
      bcc7373f
  19. Jan 13, 2016
    • [SPARK-9383][PROJECT-INFRA] PR merge script should reset back to previous branch when possible · 97e0c7c5
      Josh Rosen authored
      This patch modifies our PR merge script to reset back to a named branch when restoring the original checkout upon exit. When the committer was originally checked out to a detached head, they will be restored to that same ref (the same as today's behavior).
      
      This is a slightly updated version of #7569, with an extra fix to handle the detached head corner-case.
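      
      A simplified sketch of the restore-on-exit behaviour, assuming plain git plumbing calls; the helper names are illustrative and this is not the actual merge-script code.
      
      ```python
      import subprocess
      
      def run_git(*args):
          return subprocess.check_output(["git"] + list(args)).decode("utf-8").strip()
      
      def original_checkout():
          """Return the current branch name, or the commit SHA when on a detached HEAD."""
          branch = run_git("rev-parse", "--abbrev-ref", "HEAD")
          return run_git("rev-parse", "HEAD") if branch == "HEAD" else branch
      
      original_ref = original_checkout()
      try:
          pass  # ... create temporary branches, squash, merge, push ...
      finally:
          # Restore the committer's starting point: a named branch when possible,
          # otherwise the same detached commit they began on.
          run_git("checkout", original_ref)
      ```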
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10709 from JoshRosen/SPARK-9383.
      97e0c7c5
  20. Jan 12, 2016
  21. Jan 11, 2016
    • [SPARK-12734][HOTFIX] Build changes must trigger all tests; clean after install in dep tests · a4499145
      Josh Rosen authored
      This patch fixes a build/test issue caused by the combination of #10672 and a latent issue in the original `dev/test-dependencies` script.
      
      First, changes which _only_ touched build files were not triggering full Jenkins runs, making it possible for a build change to be merged even though it could cause failures in other tests. The `root` build module now depends on `build`, so all tests will now be run whenever a build-related file is changed.
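      
      A cut-down sketch of that dependency wiring; this is not the real test-module API, and the class shape and file patterns are assumptions made for illustration.
      
      ```python
      class Module:
          def __init__(self, name, dependencies=(), source_file_prefixes=()):
              self.name = name
              self.dependencies = list(dependencies)
              self.source_file_prefixes = list(source_file_prefixes)
      
          def contains_file(self, path):
              return any(path.startswith(prefix) for prefix in self.source_file_prefixes)
      
      # Build-related files form their own module ...
      build = Module(name="build", source_file_prefixes=["dev/", "pom.xml", "project/"])
      
      # ... and root depends on it, so a change that only touches a build file still
      # selects root, which means the full test suite runs.
      root = Module(name="root", dependencies=[build], source_file_prefixes=[""])
      ```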
      
      I also added a `clean` step to the Maven install step in `dev/test-dependencies` in order to address an issue where the dummy JARs stuck around and caused "multiple assembly JARs found" errors in tests.
      
      /cc zsxwing
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10704 from JoshRosen/fix-build-test-problems.
      a4499145
    • [SPARK-12269][STREAMING][KINESIS] Update aws-java-sdk version · 8fe928b4
      BrianLondon authored
      The current Spark Streaming Kinesis connector references a quite old version of the AWS Java SDK (1.9.40; 1.10.40 is current). Numerous AWS features, including Kinesis Firehose, are unavailable in 1.9. Those two versions of the AWS SDK in turn require conflicting versions of Jackson (2.4.4 and 2.5.3 respectively), such that one cannot include the current AWS SDK in a project that also uses the Spark Streaming Kinesis ASL.
      
      Author: BrianLondon <brian@seatgeek.com>
      
      Closes #10256 from BrianLondon/master.
      8fe928b4
    • [SPARK-12734][HOTFIX][TEST-MAVEN] Fix bug in Netty exclusions · f13c7f8f
      Josh Rosen authored
      This is a hotfix for a build bug introduced by the Netty exclusion changes in #10672. We can't exclude `io.netty:netty` because Akka depends on it. There's not a direct conflict between `io.netty:netty` and `io.netty:netty-all`, because the former puts classes in the `org.jboss.netty` namespace while the latter uses the `io.netty` namespace. However, there still is a conflict between `org.jboss.netty:netty` and `io.netty:netty`, so we need to continue to exclude the JBoss version of that artifact.
      
      While the diff here looks somewhat large, note that this is only a revert of some of the changes from #10672. You can see the net changes in pom.xml at https://github.com/apache/spark/compare/3119206b7188c23055621dfeaf6874f21c711a82...5211ab8#diff-600376dffeb79835ede4a0b285078036
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10693 from JoshRosen/netty-hotfix.
      f13c7f8f
  22. Jan 10, 2016
    • [SPARK-12734][BUILD] Fix Netty exclusion and use Maven Enforcer to prevent future bugs · 3ab0138b
      Josh Rosen authored
      Netty classes are published under multiple artifacts with different names, so our build needs to exclude the `io.netty:netty` and `org.jboss.netty:netty` versions of the Netty artifact. However, our existing exclusions were incomplete, leading to situations where duplicate Netty classes would wind up on the classpath and cause compile errors (or worse).
      
      This patch fixes the exclusion issue by adding more exclusions and uses Maven Enforcer's [banned dependencies](https://maven.apache.org/enforcer/enforcer-rules/bannedDependencies.html) rule to prevent these classes from accidentally being reintroduced. I also updated `dev/test-dependencies.sh` to run `mvn validate` so that the enforcer rules can run as part of pull request builds.
      
      /cc rxin srowen pwendell. I'd like to backport at least the exclusion portion of this fix to `branch-1.5` in order to fix the documentation publishing job, which fails nondeterministically due to incompatible versions of Netty classes taking precedence on the compile-time classpath.
      
      Author: Josh Rosen <rosenville@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10672 from JoshRosen/enforce-netty-exclusions.
      3ab0138b
  23. Jan 09, 2016
  24. Jan 06, 2016
    • [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst · ea489f14
      Herman van Hovell authored
      This PR moves a major part of the new SQL parser to Catalyst. This is a prelude to start using this parser for all of our SQL parsing. The following key changes have been made:
      
      The ANTLR parser and supporting classes have been moved to the Catalyst project. They are now part of the ```org.apache.spark.sql.catalyst.parser``` package. These classes contained quite a bit of code that was originally from the Hive project; I have added acknowledgements wherever this applied. All Hive dependencies have been factored out. I have also taken this chance to clean up the ```ASTNode``` class and to improve the error handling.
      
      The HiveQl object that provides the functionality to convert an AST into a LogicalPlan has been refactored into three different classes, one for every SQL sub-project:
      - ```CatalystQl```: This implements Query and Expression parsing functionality.
      - ```SparkQl```: This is a subclass of ```CatalystQl``` and provides SQL/Core-only functionality such as Explain and Describe.
      - ```HiveQl```: This is a subclass of ```SparkQl``` and this adds Hive-only functionality to the parser such as Analyze, Drop, Views, CTAS & Transforms. This class still depends on Hive.
      
      cc rxin
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #10583 from hvanhovell/SPARK-12575.
      ea489f14
  25. Jan 05, 2016
    • [SPARK-12625][SPARKR][SQL] replace R usage of Spark SQL deprecated API · cc4d5229
      felixcheung authored
      rxin davies shivaram
      Took the save mode from my PR #10480 and moved everything to writer methods. This is related to PR #10559.
      
      - [x] it seems jsonRDD() is broken and needs investigation; this is not a public API though. Will look into it some more tonight. (fixed)
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #10584 from felixcheung/rremovedeprecated.
      cc4d5229
  26. Jan 04, 2016
    • [SPARK-12600][SQL] Remove deprecated methods in Spark SQL · 77ab49b8
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10559 from rxin/remove-deprecated-sql.
      77ab49b8
    • [SPARK-10359][PROJECT-INFRA] Use more random number in... · 9fd7a2f0
      Josh Rosen authored
      [SPARK-10359][PROJECT-INFRA] Use more random number in dev/test-dependencies.sh; fix version switching
      
      This patch aims to fix another potential source of flakiness in the `dev/test-dependencies.sh` script.
      
      pwendell's original patch and my version used `$(date +%s | tail -c6)` to generate a suffix to use when installing temporary Spark versions into the local Maven cache, but this value only changes once per second and thus is highly collision-prone when concurrent builds launch on AMPLab Jenkins. In order to reduce the potential for conflicts, this patch updates the script to call Python's random number generator instead.
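      
      The replacement boils down to something like the following; this is an illustrative sketch of generating a collision-resistant suffix, not the exact call the script makes.
      
      ```python
      import random
      
      def temp_version_suffix():
          # A wide random range makes collisions between concurrent Jenkins builds unlikely,
          # unlike $(date +%s | tail -c6), which only changes once per second.
          return str(random.randint(100000, 999999))
      
      print("2.0.0-SNAPSHOT-" + temp_version_suffix())  # e.g. 2.0.0-SNAPSHOT-408142
      ```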
      
      I also fixed a bug in how we captured the original project version; the bug was causing the exit handler code to fail.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10558 from JoshRosen/build-dep-tests-round-3.
      9fd7a2f0
    • [SPARK-12612][PROJECT-INFRA] Add missing Hadoop profiles to dev/run-tests-*.py scripts and dev/deps · 0d165ec2
      Josh Rosen authored
      There are a couple of places in the `dev/run-tests-*.py` scripts which deal with Hadoop profiles, but the set of profiles that they handle does not include all Hadoop profiles defined in our POM. Similarly, the `hadoop-2.2` and `hadoop-2.6` profiles were missing from `dev/deps`.
      
      This patch updates these scripts to include all four Hadoop profiles defined in our POM.
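      
      Conceptually, the scripts need a complete version-to-profile mapping along these lines; the exact labels and flags below are illustrative assumptions, not copied from the scripts or the POM.
      
      ```python
      HADOOP_PROFILES = {
          "hadoop2.2": ["-Phadoop-2.2"],
          "hadoop2.3": ["-Phadoop-2.3"],
          "hadoop2.4": ["-Phadoop-2.4"],
          "hadoop2.6": ["-Phadoop-2.6"],
      }
      
      def get_hadoop_build_flags(hadoop_version):
          """Translate a Hadoop version label into build profile flags; fail loudly on gaps."""
          try:
              return HADOOP_PROFILES[hadoop_version]
          except KeyError:
              raise ValueError("Unknown Hadoop version: %s" % hadoop_version)
      ```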
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10565 from JoshRosen/add-missing-hadoop-profiles-in-test-scripts.
      0d165ec2
  27. Jan 01, 2016
  28. Dec 31, 2015
  29. Dec 30, 2015
    • [SPARK-10359] Enumerate dependencies in a file and diff against it for new pull requests · 27a42c71
      Josh Rosen authored
      This patch adds a new build check which enumerates Spark's resolved runtime classpath and saves it to a file, then diffs against that file to detect whether pull requests have introduced dependency changes. The aim of this check is to make it simpler to reason about whether pull requests that modify the build have introduced new dependencies or changed transitive dependencies in a way that affects the final classpath.
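      
      A toy version of that diff step; the file names and the source of the resolved classpath are assumptions made for illustration, not the real script's behaviour.
      
      ```python
      import difflib
      import sys
      
      def load_manifest(path):
          with open(path) as f:
              return sorted(line.strip() for line in f if line.strip())
      
      def check_dependency_manifest(expected_path, resolved_path):
          """Diff a freshly resolved dependency listing against the checked-in manifest."""
          expected = load_manifest(expected_path)
          actual = load_manifest(resolved_path)
          if expected == actual:
              return True
          sys.stderr.write("Spark's resolved dependencies changed; review and update the manifest:\n")
          sys.stderr.write("\n".join(difflib.unified_diff(expected, actual, lineterm="")) + "\n")
          return False
      
      # e.g. compare a freshly generated listing against a checked-in manifest file
      # sys.exit(0 if check_dependency_manifest("dev/deps/expected-deps.txt",
      #                                         "/tmp/resolved-classpath.txt") else 1)
      ```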
      
      This supplants the checks added in SPARK-4123 / #5093, which are currently disabled due to bugs.
      
      This patch is based on pwendell's work in #8531.
      
      Closes #8531.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #10461 from JoshRosen/SPARK-10359.
      27a42c71
  30. Dec 28, 2015
    • [SPARK-12508][PROJECT-INFRA] Fix minor bugs in dev/tests/pr_public_classes.sh script · ab6bedd8
      Josh Rosen authored
      This patch fixes a handful of minor bugs in the `dev/tests/pr_public_classes.sh` script, which is used by the `run_tests_jenkins` script to detect the addition of new public classes:
      
      - Account for differences between BSD and GNU `sed` in order to allow the script to run on OS X.
      - Diff `$ghprbActualCommit^...$ghprbActualCommit` instead of `master...$ghprbActualCommit`: since `ghprbActualCommit` is a merge commit which results from merging the PR into the target branch, this will give us the desired diff and will avoid certain race conditions which could lead to false positives.
      - Use `echo -e` instead of `echo` so that newline characters are handled correctly in output. This should fix a formatting glitch which caused the output to appear on a single line in the GitHub comment (see [the SC2028 page](https://github.com/koalaman/shellcheck/wiki/SC2028) on the Shellcheck wiki for more details).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10455 from JoshRosen/fix-pr-public-classes-test.
      ab6bedd8
  31. Dec 24, 2015
  32. Dec 22, 2015
  33. Dec 20, 2015