  1. Mar 27, 2017
    • [SPARK-20102] Fix nightly packaging and RC packaging scripts w/ two minor build fixes · 4056191d
      Josh Rosen authored
      
      ## What changes were proposed in this pull request?
      
      The master snapshot publisher builds are currently broken due to two minor build issues:
      
      1. For unknown reasons, the LFTP `mkdir -p` command began throwing errors when the remote directory already exists. This change of behavior might have been caused by configuration changes in the ASF's SFTP server, but I'm not entirely sure. To work around this problem, this patch updates the script to ignore errors from the `lftp mkdir -p` commands (see the sketch after this list).
      2. The PySpark `setup.py` file references a non-existent `pyspark.ml.stat` module, causing Python packaging to fail by complaining about a missing directory. The fix is to simply drop that line from the setup script.
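
      A minimal shell sketch of the LFTP workaround, assuming the script runs under `set -e`; the variable names are illustrative placeholders, not the exact ones in `release-build.sh`:

      ```sh
      # Let the publish step continue when lftp's mkdir fails because the
      # remote directory already exists.
      lftp -u "$ASF_USERNAME" "sftp://$REMOTE_HOST" -e "mkdir -p $DEST_DIR; exit" || true
      ```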
      
      ## How was this patch tested?
      
      The LFTP fix was tested by manually running the failing commands on AMPLab Jenkins against the ASF SFTP server. The PySpark fix was tested locally.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #17437 from JoshRosen/spark-20102.
      
      (cherry picked from commit 314cf51d)
      Signed-off-by: Josh Rosen <joshrosen@databricks.com>
  2. Dec 08, 2016
    • [SPARK-18590][SPARKR] Change the R source build to Hadoop 2.6 · e43209fe
      Shivaram Venkataraman authored
      This PR changes the SparkR source release tarball to be built using the Hadoop 2.6 profile. Previously it was built using the without-hadoop profile, which leads to an error, as discussed in https://github.com/apache/spark/pull/16014#issuecomment-265843991
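
      A hedged sketch of the profile change (`hadoop-2.6` and `sparkr` are the standard Maven profile names; the exact invocation in the release script may differ):

      ```sh
      # Build the distribution, including the SparkR source package, against the
      # Hadoop 2.6 profile instead of the "without hadoop" (hadoop-provided) one.
      ./dev/make-distribution.sh --name hadoop2.6 --tgz -Psparkr -Phadoop-2.6
      ```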
      
      
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #16218 from shivaram/fix-sparkr-release-build.
      
      (cherry picked from commit 202fcd21)
      Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    • [SPARK-18590][SPARKR] build R source package when making distribution · d69df907
      Felix Cheung authored
      This PR has 2 key changes. One, we are building the source package (aka bundle package) for SparkR, which could be released on CRAN. Two, the official Spark binary distributions should include SparkR installed from this source package instead (which has the help/vignettes rds files needed for those to work when the SparkR package is loaded in R, whereas the earlier approach with devtools does not).
      
      But, because of various differences in how R performs different tasks, this PR is a fair bit more complicated. More details below.
      
      This PR also includes a few minor fixes.
      
      These are the additional steps in make-distribution.sh; please see [here](https://github.com/apache/spark/blob/master/R/CRAN_RELEASE.md) for what goes into a CRAN release, which is now run during make-distribution.sh (a shell sketch follows the list):
      1. The package needs to be installed first, because the first code block in the vignettes is `library(SparkR)` without a lib path.
      2. `R CMD build` will build the vignettes (this process runs Spark/SparkR code and captures the output into PDF documentation).
      3. `R CMD check` on the source package will install the package and build the vignettes again (this time from the source package) - this is a key step required to release an R package on CRAN.
       (Tests are skipped here, but they will need to pass for the CRAN release process to succeed - ideally, during release signoff we should install from the R source package and run the tests.)
      4. `R CMD INSTALL` on the source package (this is the only way to generate the doc/vignettes rds files correctly; step 1 does not).
       (The output of this step is what we package into the Spark dist and sparkr.zip.)
      
      Alternatively, `R CMD build` should already be installing the package in a temp directory, though it might just be a matter of finding that location and setting it as the `lib.loc` parameter; another approach is perhaps to call `R CMD INSTALL --build pkg` instead.
       But in any case, despite installing the package multiple times, this is relatively fast.
      Building the vignettes takes a while though.
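
      A hedged shell sketch of the sequence above (directory names and flags are illustrative, not the exact make-distribution.sh code):

      ```sh
      # 1. Install SparkR so the vignettes' library(SparkR) call works without a lib path.
      R CMD INSTALL --library="$R_LIB_DIR" pkg/
      # 2. Build the source package; this runs SparkR code and builds the vignettes.
      R CMD build pkg/
      # 3. Check the source package (tests skipped, as noted above).
      R CMD check --no-tests SparkR_*.tar.gz
      # 4. Install from the source package; this generates the doc/vignettes rds
      #    files that go into the Spark dist and sparkr.zip.
      R CMD INSTALL SparkR_*.tar.gz
      ```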
      
      Manually, CI.
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16014 from felixcheung/rdist.
      
      (cherry picked from commit c3d3a9d0)
      Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
  3. Dec 01, 2016
    • [SPARK-18639] Build only a single pip package · 2d2e8018
      Reynold Xin authored
      
      ## What changes were proposed in this pull request?
      We currently build 5 separate pip binary tarballs, doubling the release script runtime. It'd be better to build one, especially for use cases that are just using Spark locally. In the long run, it would make more sense to have Hadoop support be pluggable.
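
      A hedged sketch of the resulting shape (the actual script change may differ):

      ```sh
      # Build one PySpark source tarball for all Hadoop flavors instead of five.
      cd python && python setup.py sdist
      ```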
      
      ## How was this patch tested?
      N/A - this is a release build script that doesn't have any automated test coverage. We will know if it goes wrong when we prepare releases.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #16072 from rxin/SPARK-18639.
      
      (cherry picked from commit 37e52f87)
      Signed-off-by: Reynold Xin <rxin@databricks.com>
  4. Nov 16, 2016
    • [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed · 6a3cbbc0
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI but that is the natural follow up (SPARK-18129).
      
      Done:
      - pip installable on conda [manually tested]
      - setup.py installed on a non-pip-managed system (RHEL) with YARN [manually tested]
      - Automated testing of this (virtualenv)
      - packaging and signing with release-build*
      
      Possible follow up work:
      - release-build update to publish to PyPI (SPARK-18128)
      - figure out who owns the pyspark package name on prod PyPI (is it someone within the project, or should we ask PyPI, or should we choose a different name to publish with, like ApachePySpark?)
      - Windows support and/or testing (SPARK-18136)
      - investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test
      - consider how we want to number our dev/snapshot versions
      
      Explicitly out of scope:
      - Using pip installed PySpark to start a standalone cluster
      - Using pip installed PySpark for non-Python Spark programs
      
      *I've done some work to test release-build locally but as a non-committer I've just done local testing.
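
      A hedged usage sketch of the result (the artifact path and version are illustrative):

      ```sh
      # Install the locally built PySpark package into a fresh virtualenv.
      virtualenv pyspark-env
      source pyspark-env/bin/activate
      pip install python/dist/pyspark-*.tar.gz
      python -c "import pyspark; print(pyspark.__version__)"
      ```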
      ## How was this patch tested?
      
      Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.
      
      release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites)
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: Juliet Hougland <juliet@cloudera.com>
      Author: Juliet Hougland <not@myemail.com>
      
      Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
  5. Aug 26, 2016
    • [SPARK-16967] move mesos to module · 8e5475be
      Michael Gummelt authored
      ## What changes were proposed in this pull request?
      
      Move Mesos code into a mvn module
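
      A hedged build sketch, assuming the new module is enabled via a Maven profile (the profile name here is an assumption):

      ```sh
      # Build Spark with the Mesos cluster manager compiled as its own module.
      ./build/mvn -Pmesos -DskipTests clean package
      ```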
      
      ## How was this patch tested?
      
      unit tests
      manually submitting a client mode and cluster mode job
      spark/mesos integration test suite
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #14637 from mgummelt/mesos-module.
  6. Mar 07, 2016
    • [SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs · 0eea12a3
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Move many top-level files into dev/ or another appropriate directory. In particular, put `make-distribution.sh` in `dev` and update the docs accordingly. Remove the deprecated `sbt/sbt`.
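
      A hedged usage sketch after the move (flags are illustrative):

      ```sh
      # The distribution script now lives under dev/ rather than the repo root.
      ./dev/make-distribution.sh --name custom --tgz
      ```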
      
      I was (so far) unable to figure out how to move `tox.ini`. `scalastyle-config.xml` should be movable, but edits to the project `.sbt` files didn't work; the config file location is updatable for the compile scope but not the test scope.
      
      ## How was this patch tested?
      
      `./dev/run-tests` to verify RAT and checkstyle work. Jenkins tests for the rest.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11522 from srowen/SPARK-13596.
  7. Feb 26, 2016
    • [SPARK-13474][PROJECT INFRA] Update packaging scripts to push artifacts to home.apache.org · f77dc4e1
      Josh Rosen authored
      Due to the people.apache.org -> home.apache.org migration, we need to update our packaging scripts to publish artifacts to the new server. Because the new server only supports SFTP, not SSH, we need to update the scripts to use lftp instead of ssh + rsync.
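
      A hedged sketch of the transport change (user, host, and paths are illustrative):

      ```sh
      # Before: rsync over ssh to people.apache.org (no longer possible):
      #   rsync -av -e ssh artifacts/ "$ASF_USERNAME@people.apache.org:public_html/dest/"
      # After: upload over sftp; lftp's "mirror -R" pushes a local tree to the remote.
      lftp -u "$ASF_USERNAME" sftp://home.apache.org -e "mirror -R artifacts/ public_html/dest/; exit"
      ```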
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #11350 from JoshRosen/update-release-scripts-for-apache-home.
  8. Jan 30, 2016
    • [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version · 289373b2
      Josh Rosen authored
      This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
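
      A hedged sketch of compiling for Scala 2.10 after this change (the script and flag names follow the "Building Spark" docs of this era; treat them as assumptions):

      ```sh
      # Switch the build to Scala 2.10, then compile with the 2.10 profile.
      ./dev/change-scala-version.sh 2.10
      ./build/mvn -Dscala-2.10 -DskipTests clean package
      ```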
      
      The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
      
      After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10608 from JoshRosen/SPARK-6363.
  9. Aug 20, 2015
    • [SPARK-10126] [PROJECT INFRA] Fix typo in release-build.sh which broke... · 12de3483
      Josh Rosen authored
      [SPARK-10126] [PROJECT INFRA] Fix typo in release-build.sh which broke snapshot publishing for Scala 2.11
      
      The current `release-build.sh` has a typo which breaks snapshot publication for Scala 2.11. We should change the Scala version to 2.11 and clean before building a 2.11 snapshot.
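
      A hedged sketch of the intended sequence (not the literal diff):

      ```sh
      # Point the build at Scala 2.11 and clean before building the 2.11 snapshot.
      ./dev/change-scala-version.sh 2.11
      ./build/mvn -Dscala-2.11 -DskipTests clean package
      ```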
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #8325 from JoshRosen/fix-2.11-snapshots.
  10. Aug 11, 2015
    • [SPARK-1517] Refactor release scripts to facilitate nightly publishing · 3ef0f329
      Patrick Wendell authored
      This update contains some code changes to the release scripts that allow easier nightly publishing. I've been using these new scripts on Jenkins for cutting and publishing nightly snapshots for the last month or so, and it has been going well. I'd like to get them merged back upstream so this can be maintained by the community.
      
      The main changes are:
      1. Separates the release tagging from various build possibilities for an already tagged release (`release-tag.sh` and `release-build.sh`).
      2. Allow for injecting credentials through the environment, including GPG keys. This is then paired with secure key injection in Jenkins.
      3. Support for copying build results to a remote directory, and also "rotating" results, e.g. the ability to keep the last N copies of binary or doc builds (a usage sketch follows this list).
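
      A hedged usage sketch of the split and environment-based credential injection (the sub-command and variable names are illustrative assumptions):

      ```sh
      # 1. Tag the release separately from building it.
      ./dev/create-release/release-tag.sh
      # 2. Build/publish an already-tagged release, with credentials injected
      #    through the environment (e.g. by Jenkins) rather than entered by hand.
      GPG_KEY="$KEY_ID" GPG_PASSPHRASE="$PASSPHRASE" \
      ASF_USERNAME="$USER" ASF_PASSWORD="$PASS" \
        ./dev/create-release/release-build.sh publish-snapshot
      ```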
      
      I'm happy if anyone wants to take a look at this - it's not user-facing, but an internal utility used for generating releases.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #7411 from pwendell/release-script-updates and squashes the following commits:
      
      74f9beb [Patrick Wendell] Moving maven build command to a variable
      233ce85 [Patrick Wendell] [SPARK-1517] Refactor release scripts to facilitate nightly publishing