  1. Feb 16, 2017
    • [SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support · 0e240549
      Sean Owen authored
      - Move external/java8-tests tests into core, streaming, and sql, and remove the module
      - Remove MaxPermGen and related options
      - Fix some reflection / TODOs around Java 8+ methods
      - Update doc references to 1.7/1.8 differences
      - Remove Java 7/8 related build profiles
      - Update some plugins for better Java 8 compatibility
      - Fix a few Java-related warnings
      
      For the future:
      
      - Update Java 8 examples to fully use Java 8
      - Update Java tests to use lambdas for simplicity
      - Update Java internal implementations to use lambdas
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16871 from srowen/SPARK-19493.
  2. Feb 08, 2017
    • [SPARK-19464][CORE][YARN][TEST-HADOOP2.6] Remove support for Hadoop 2.5 and earlier · e8d3fca4
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      - Remove support for Hadoop 2.5 and earlier
      - Remove reflection and code constructs only needed to support multiple versions at once
      - Update docs to reflect newer versions
      - Remove older versions' builds and profiles.
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16810 from srowen/SPARK-19464.
  3. Jan 25, 2017
    • [SPARK-19064][PYSPARK] Fix pip installing of sub components · 965c82d8
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      Fix installation of the mllib and ml sub-components, and more eagerly clean up cache files during the test script & make-distribution.
      
      ## How was this patch tested?
      
      Updated sanity test script to import mllib and ml sub-components.
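      As a rough illustration (not the actual test script; paths are assumptions), the sanity check amounts to:

      ```sh
      # Hypothetical sketch: install the built package into a clean env and
      # verify that the mllib and ml sub-components are importable.
      pip install dist/pyspark-*.tar.gz
      python -c "import pyspark.ml, pyspark.ml.feature, pyspark.mllib, pyspark.mllib.linalg"
      ```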
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #16465 from holdenk/SPARK-19064-fix-pip-install-sub-components.
  4. Jan 16, 2017
    • [SPARK-18828][SPARKR] Refactor scripts for R · c84f7d3e
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Refactored the scripts to remove duplication and give each script a clearer purpose.
      
      ## How was this patch tested?
      
      manually
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16249 from felixcheung/rscripts.
  5. Dec 08, 2016
    • [SPARKR][PYSPARK] Fix R source package name to match Spark version. Remove pip tar.gz from distribution · 4ac8b20b
      Shivaram Venkataraman authored
      
      ## What changes were proposed in this pull request?
      
      Fixes the name of the R source package so that the `cp` in release-build.sh works correctly.
      
      Issue discussed in https://github.com/apache/spark/pull/16014#issuecomment-265867125
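      For illustration only (variable names are hypothetical, not the exact release-build.sh contents): the R source package is named after its version, so the package version must match the Spark version for the copy to find it.

      ```sh
      # Sketch: R CMD build emits SparkR_<version>.tar.gz; if the package version
      # does not match the Spark version, this cp finds nothing and fails.
      SPARK_VERSION=2.1.0   # hypothetical
      cp "R/SparkR_${SPARK_VERSION}.tar.gz" "$RELEASE_STAGING_DIR/"
      ```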
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #16221 from shivaram/fix-sparkr-release-build-name.
    • [SPARK-18590][SPARKR] build R source package when making distribution · c3d3a9d0
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      This PR has 2 key changes. One, we are building a source package (aka bundle package) for SparkR that could be released on CRAN. Two, the official Spark binary distributions should instead include SparkR installed from this source package (which has the help/vignettes rds files needed for those to work when the SparkR package is loaded in R, whereas the earlier approach with devtools did not).
      
      But because of various differences in how R performs different tasks, this PR is a fair bit more complicated. More details below.
      
      This PR also includes a few minor fixes.
      
      ### more details
      
      These are the additional steps in make-distribution; please see [here](https://github.com/apache/spark/blob/master/R/CRAN_RELEASE.md) for what goes into a CRAN release, which is now run during make-distribution.sh.
      1. The package needs to be installed first, because the first code block in the vignettes is `library(SparkR)` without a lib path.
      2. `R CMD build` builds the vignettes (this process runs Spark/SparkR code and captures the output into the pdf documentation).
      3. `R CMD check` on the source package installs the package and builds the vignettes again (this time from the source package) - this is a key step required to release an R package on CRAN. (Tests are skipped here, but they will need to pass for the CRAN release process to succeed - ideally, during release signoff we should install from the R source package and run the tests.)
      4. `R CMD INSTALL` on the source package (this is the only way to generate the doc/vignettes rds files correctly - step 1 does not). The output of this step is what we package into the Spark dist and sparkr.zip.

      Alternatively, `R CMD build` should already be installing the package in a temp directory, though it might just be a matter of finding that location and setting it as the lib.loc parameter; another approach is perhaps to call `R CMD INSTALL --build pkg` instead. But in any case, despite installing the package multiple times, this is relatively fast. Building the vignettes takes a while, though. A sketch of the sequence follows.
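      A minimal sketch of steps 1-4, assuming the package sources live under `pkg/` (flags and paths are illustrative, not the exact make-distribution.sh contents):

      ```sh
      R CMD INSTALL --library="$LIB_DIR" pkg/              # 1. install so vignettes can library(SparkR)
      R CMD build pkg/                                     # 2. build the source package, running the vignettes
      R CMD check --no-tests SparkR_*.tar.gz               # 3. check the source package (tests skipped)
      R CMD INSTALL --library="$LIB_DIR" SparkR_*.tar.gz   # 4. install from the source package; this output
                                                           #    is what goes into the dist and sparkr.zip
      ```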
      
      ## How was this patch tested?
      
      Manually, CI.
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16014 from felixcheung/rdist.
  6. Nov 16, 2016
    • [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed · a36a76ac
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JARs). It does not currently publish to PyPI, but that is the natural follow-up (SPARK-18129).
      
      Done:
      - pip installable on conda [manual tested]
      - setup.py installed on a non-pip managed system (RHEL) with YARN [manual tested]
      - Automated testing of this (virtualenv)
      - packaging and signing with release-build*
      
      Possible follow up work:
      - release-build update to publish to PyPI (SPARK-18128)
      - figure out who owns the pyspark package name on prod PyPI (is it someone within the project, or should we ask PyPI, or should we choose a different name to publish with, like ApachePySpark?)
      - Windows support and/or testing (SPARK-18136)
      - investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test
      - consider how we want to number our dev/snapshot versions
      
      Explicitly out of scope:
      - Using pip installed PySpark to start a standalone cluster
      - Using pip installed PySpark for non-Python Spark programs
      
      *I've done some work to test release-build locally but as a non-committer I've just done local testing.
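      A minimal sketch of the virtualenv-based testing path (paths are assumptions, not the exact test-script contents):

      ```sh
      # Build a source distribution from python/, install it into a clean
      # virtualenv, and run a tiny job against the bundled jars.
      (cd python && python setup.py sdist)
      virtualenv venv && . venv/bin/activate
      pip install python/dist/pyspark-*.tar.gz
      python -c "from pyspark import SparkContext
      sc = SparkContext('local[2]', 'pip-smoke-test')
      print(sc.parallelize(range(100)).sum())
      sc.stop()"
      ```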
      ## How was this patch tested?
      
      Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.
      
      release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites)
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: Juliet Hougland <juliet@cloudera.com>
      Author: Juliet Hougland <not@myemail.com>
      
      Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
  7. Jun 14, 2016
    • [SPARK-15821][DOCS] Include parallel build info · a431e3f1
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      
      We should mention that users can build Spark using multiple threads to decrease build times, either here or in "Building Spark".
      
      ## How was this patch tested?
      
      Built on machines with between one and 192 cores using `mvn -T 1C` and observed faster build times with no loss in stability.
      
      In response to the question at https://issues.apache.org/jira/browse/SPARK-15821, I think we should suggest this option, as we know it works for Spark and can result in faster builds.
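      For example (the profile-free invocation here is illustrative):

      ```sh
      # -T 1C asks Maven for one build thread per available CPU core.
      ./build/mvn -T 1C -DskipTests clean package
      ```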
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #13562 from a-roberts/patch-3.
  8. Apr 04, 2016
    • [SPARK-13579][BUILD] Stop building the main Spark assembly. · 24d7d2e4
      Marcelo Vanzin authored
      This change modifies the "assembly/" module to just copy needed dependencies to its build directory, and modifies the packaging script to pick those up (and removes duplicate jars packaged in the examples module).
      
      I also made some minor adjustments to dependencies to remove some
      test jars from the final packaging, and remove jars that conflict with each
      other when packaged separately (e.g. servlet api).
      
      Also note that this change restores guava in applications' classpaths, even
      though it's still shaded inside Spark. This is now needed for the Hadoop
      libraries that are packaged with Spark, which now are not processed by
      the shade plugin.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11796 from vanzin/SPARK-13579.
  9. Mar 15, 2016
    • [SPARK-13576][BUILD] Don't create assembly for examples. · 48978abf
      Marcelo Vanzin authored
      As part of the goal to stop creating assemblies in Spark, this change
      modifies the mvn and sbt builds to not create an assembly for examples.
      
      Instead, dependencies are copied to the build directory (under
      target/scala-xx/jars), and in the final archive, into the "examples/jars"
      directory.
      
      To avoid having to deal too much with Windows batch files, I made examples
      run through the launcher library; the spark-submit launcher now has a
      special mode to run examples, which adds all the necessary jars to the
      spark-submit command line, and replaces the bash and batch scripts that
      were used to run examples. The scripts are now just a thin wrapper around
      spark-submit; another advantage is that now all spark-submit options are
      supported.
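      For instance (a sketch, not the exact script contents):

      ```sh
      # bin/run-example is now a thin wrapper over spark-submit's example mode,
      # so any spark-submit option can be passed through.
      ./bin/run-example --master local[4] SparkPi 100
      ```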
      
      There are a few glitches; in the mvn build, a lot of duplicated dependencies
      get copied, because they are promoted to "compile" scope due to extra
      dependencies in the examples module (such as HBase). In the sbt build,
      all dependencies are copied, because there doesn't seem to be an easy
      way to filter things.
      
      I plan to clean some of this up when the rest of the tasks are finished.
      When the main assembly is replaced with jars, we can remove duplicate jars
      from the examples directory during packaging.
      
      Tested by running SparkPi in: maven build, sbt build, dist created by
      make-distribution.sh.
      
      Finally: note that running the "assembly" target in sbt doesn't build
      the examples anymore. You need to run "package" for that.
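      Concretely:

      ```sh
      ./build/sbt assembly   # no longer builds the examples
      ./build/sbt package    # builds the examples and copies their jars
      ```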
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11452 from vanzin/SPARK-13576.
  10. Mar 07, 2016
    • [SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs · 0eea12a3
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Move many top-level files into dev/ or another appropriate directory. In particular, put `make-distribution.sh` in `dev` and update the docs accordingly. Remove the deprecated `sbt/sbt`.
      
      I was (so far) unable to figure out how to move `tox.ini`. `scalastyle-config.xml` should be movable, but edits to the project `.sbt` files didn't work; the config file location is updatable for the compile scope but not the test scope.
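      With this change a distribution is built from the new location, e.g. (profiles are illustrative):

      ```sh
      # make-distribution.sh now lives under dev/ rather than the repo root.
      ./dev/make-distribution.sh --tgz -Pyarn -Phadoop-2.4
      ```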
      
      ## How was this patch tested?
      
      `./dev/run-tests` to verify RAT and checkstyle work. Jenkins tests for the rest.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11522 from srowen/SPARK-13596.
  11. Feb 28, 2016
    • [SPARK-13529][BUILD] Move network/* modules into common/network-* · 9e01dcc6
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top-level, non-user-facing folder.
      
      ## How was this patch tested?
      Compilation and existing tests. We should run both SBT and Maven.
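      i.e., something along the lines of:

      ```sh
      # Verify the relocated modules still compile under both build systems.
      ./build/mvn -DskipTests compile
      ./build/sbt compile
      ```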
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11409 from rxin/SPARK-13529.
  12. Feb 27, 2016
    • [SPARK-13521][BUILD] Remove reference to Tachyon in cluster & release scripts · 59e3e10b
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      We provide a very limited set of cluster management scripts in Spark for Tachyon, although Tachyon itself provides a much better version of them. Given that Spark users can now simply use Tachyon as a normal file system without extensive configuration, we can remove these management capabilities to simplify Spark's bash scripts.
      
      Note that this also reduces coupling between a 3rd-party external system and Spark's release scripts, and eliminates the possibility of failures such as Tachyon being renamed or the tarballs being relocated.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11400 from rxin/release-script.
  13. Nov 15, 2015
    • [SPARK-10500][SPARKR] sparkr.zip cannot be created if /R/lib is unwritable · 835a79d7
      Sun Rui authored
      The basic idea is that:
      The archive of the SparkR package itself, that is sparkr.zip, is created during the build process and is contained in the Spark binary distribution. It is not changed after the distribution is installed, as the directory it resides in ($SPARK_HOME/R/lib) may not be writable.
      
      When there is R source code contained in jars or in Spark packages specified with the "--jars" or "--packages" command line option, a temporary directory is created by calling Utils.createTempDir(), where the R packages built from the R source code will be installed. The temporary directory is writable, won't interfere with other concurrent SparkR sessions, and will be deleted when this SparkR session ends. The R binary packages installed in the temporary directory are then packed into an archive named rpkg.zip.
      
      sparkr.zip and rpkg.zip are distributed to the cluster in YARN modes.
      
      The distribution of rpkg.zip in Standalone modes is not supported in this PR, and will be addressed in another PR.
      
      Various R files are updated to accept multiple lib paths (one for the SparkR package, the other for other R packages) so that these packages can be accessed in R.
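      A rough sketch of the mechanism (paths and names are hypothetical):

      ```sh
      # R source packages shipped via --jars/--packages are built into a fresh
      # temp dir (cf. Utils.createTempDir), then packed into rpkg.zip.
      TMP_LIB=$(mktemp -d)
      R CMD INSTALL --library="$TMP_LIB" extracted-r-source-pkg/
      (cd "$TMP_LIB" && zip -r "$OLDPWD/rpkg.zip" .)
      ```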
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #9390 from sun-rui/SPARK-10500.
  14. Nov 02, 2015
    • [SPARK-11236] [TEST-MAVEN] [TEST-HADOOP1.0] [CORE] Update Tachyon dependency 0.7.1 -> 0.8.1 · 476f4348
      Calvin Jia authored
      This is a reopening of #9204 which failed hadoop1 sbt tests.
      
      With the original PR, a classpath issue would occur because the MIMA plugin pulled in hadoop-2.2 dependencies regardless of the hadoop version when building the `oldDeps` project. These dependencies affect the hadoop1 sbt build because they are placed in `lib_managed` and Tachyon 0.8.0's default hadoop version is 2.2.
      
      Author: Calvin Jia <jia.calvin@gmail.com>
      
      Closes #9395 from calvinjia/spark-11236.
  15. Oct 29, 2015
    • [SPARK-11236][CORE] Update Tachyon dependency from 0.7.1 -> 0.8.0. · 4f5e60c6
      Calvin Jia authored
      Upgrades the tachyon-client version to the latest release.
      
      No new dependencies are added and no Spark-facing APIs are changed. The removal of the `tachyon-underfs-s3` exclusion will enable users to use S3 out of the box, and there are no longer any additional external dependencies added by the module.
      
      Author: Calvin Jia <jia.calvin@gmail.com>
      
      Closes #9204 from calvinjia/spark-11236.
  16. Sep 28, 2015
    • [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE · bf4199e2
      Sean Owen authored
      In the course of https://issues.apache.org/jira/browse/LEGAL-226 it came to light that the guidance at http://www.apache.org/dev/licensing-howto.html#permissive-deps for permissively-licensed dependencies has a different interpretation than we (er, I) had been operating under. "pointer ... to the license within the source tree" specifically means a copy of the license within Spark's distribution, whereas at the moment Spark's LICENSE has a pointer to each project's license in that other project's source tree.
      
      The remedy is simply to inline all such license references (i.e. BSD/MIT licenses), or to include their text in a "licenses" subdirectory and point to that.
      
      Along the way, we can also treat other BSD/MIT licenses, whose text has been inlined into LICENSE, in the same way.
      
      The LICENSE file can continue to provide a helpful list of BSD/MIT-licensed projects and a pointer to their sites. This would be over and above including the license text in the distro, which is the essential thing.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #8919 from srowen/SPARK-10833.
  17. Aug 17, 2015
    • [SPARK-9199] [CORE] Upgrade Tachyon version from 0.7.0 -> 0.7.1. · 3ff81ad2
      Calvin Jia authored
      Updates the tachyon-client version to the latest release.
      
      The main difference between 0.7.0 and 0.7.1 on the client side is support for running Tachyon on the local file system by default.
      
      No new non-Tachyon dependencies are added, and no code changes are required since the client API has not changed.
      
      Author: Calvin Jia <jia.calvin@gmail.com>
      
      Closes #8235 from calvinjia/spark-9199-master.
  18. Jul 30, 2015
    • [SPARK-9199] [CORE] Update Tachyon dependency from 0.6.4 -> 0.7.0 · 04c84091
      Calvin Jia authored
      No new dependencies are added. The exclusion changes are due to the change in tachyon-client 0.7.0's project structure.
      
      There is no client side API change in Tachyon 0.7.0 so no code changes are required.
      
      Author: Calvin Jia <jia.calvin@gmail.com>
      
      Closes #7577 from calvinjia/SPARK-9199 and squashes the following commits:
      
      4e81e40 [Calvin Jia] Update Tachyon dependency from 0.6.4 -> 0.7.0
  19. Jul 13, 2015
    • [SPARK-6797] [SPARKR] Add support for YARN cluster mode. · 7f487c8b
      Sun Rui authored
      This PR enables SparkR to dynamically ship the SparkR binary package to the AM node in YARN cluster mode, thus it is no longer required that the SparkR package be installed on each worker node.
      
      This PR uses the JDK's jar tool to package the SparkR package, because jar is expected to be available on both Linux and Windows wherever a JDK has been installed.
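      Roughly (the exact invocation is an assumption):

      ```sh
      # The JDK's jar tool is a zip archiver available wherever a JDK is
      # installed, on Linux and Windows alike; -M skips the manifest.
      "$JAVA_HOME/bin/jar" cMf sparkr.zip SparkR
      ```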
      
      This PR does not address the R worker involved in RDD API. Will address it in a separate JIRA issue.
      
      This PR does not address SBT build. SparkR installation and packaging by SBT will be addressed in a separate JIRA issue.
      
      R/install-dev.bat is not tested. shivaram, could you help test it?
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #6743 from sun-rui/SPARK-6797 and squashes the following commits:
      
      ca63c86 [Sun Rui] Adjust MimaExcludes after rebase.
      7313374 [Sun Rui] Fix unit test errors.
      72695fb [Sun Rui] Fix unit test failures.
      193882f [Sun Rui] Fix Mima test error.
      fe25a33 [Sun Rui] Fix Mima test error.
      35ecfa3 [Sun Rui] Fix comments.
      c38a005 [Sun Rui] Unzipped SparkR binary package is still required for standalone and Mesos modes.
      b05340c [Sun Rui] Fix scala style.
      2ca5048 [Sun Rui] Fix comments.
      1acefd1 [Sun Rui] Fix scala style.
      0aa1e97 [Sun Rui] Fix scala style.
      41d4f17 [Sun Rui] Add support for locating SparkR package for R workers required by RDD APIs.
      49ff948 [Sun Rui] Invoke jar.exe with full path in install-dev.bat.
      7b916c5 [Sun Rui] Use 'rem' consistently.
      3bed438 [Sun Rui] Add a comment.
      681afb0 [Sun Rui] Fix a bug that RRunner does not handle client deployment modes.
      cedfbe2 [Sun Rui] [SPARK-6797][SPARKR] Add support for YARN cluster mode.
  20. Jun 07, 2015
    • [SPARK-7733] [CORE] [BUILD] Update build, code to use Java 7 for 1.5.0+ · e84815dc
      Sean Owen authored
      Update build to use Java 7, and remove some comments and special-case support for Java 6.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6265 from srowen/SPARK-7733 and squashes the following commits:
      
      59bda4e [Sean Owen] Update build to use Java 7, and remove some comments and special-case support for Java 6
  21. May 23, 2015
    • [HOTFIX] Copy SparkR lib if it exists in make-distribution · b231baa2
      Shivaram Venkataraman authored
      This is to fix an issue reported in #6373 where the `cp` would fail if `-Psparkr` was not used in the build. The fix amounts to guarding the copy, roughly as sketched below.
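      A sketch of the guard (variable names follow make-distribution.sh's style but are assumptions here):

      ```sh
      # Only copy the SparkR lib if -Psparkr actually built it.
      if [ -d "$SPARK_HOME/R/lib/SparkR" ]; then
        mkdir -p "$DISTDIR/R/lib"
        cp -r "$SPARK_HOME/R/lib/SparkR" "$DISTDIR/R/lib/"
      fi
      ```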
      
      cc dragos pwendell
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6379 from shivaram/make-distribution-hotfix and squashes the following commits:
      
      08eb7e4 [Shivaram Venkataraman] Copy SparkR lib if it exists in make-distribution
    • [SPARK-6811] Copy SparkR lib in make-distribution.sh · a40bca01
      Shivaram Venkataraman authored
      This change also removes native libraries from SparkR to make sure our distribution works across platforms.
      
      Tested by building on Mac, running on Amazon Linux (CentOS), on a Windows VM, and vice-versa (built on Linux, run on Mac).
      
      I will also test this with YARN soon and update this PR.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6373 from shivaram/sparkr-binary and squashes the following commits:
      
      ae41b5c [Shivaram Venkataraman] Remove native libraries from SparkR Also include the built SparkR package in make-distribution.sh
  22. May 14, 2015
    • [SPARK-7249] Updated Hadoop dependencies due to inconsistency in the versions · 7fb715de
      FavioVazquez authored
      Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons.
      
      Changes proposed by vanzin, resulting from the previous pull request https://github.com/apache/spark/pull/5783, which did not fix the problem correctly.
      
      Please let me know if this is the correct way of doing this; vanzin's comments are in the pull request mentioned above.
      
      Author: FavioVazquez <favio.vazquezp@gmail.com>
      
      Closes #5786 from FavioVazquez/update-hadoop-dependencies and squashes the following commits:
      
      11670e5 [FavioVazquez] - Added missing instance of -Phadoop-2.2 in create-release.sh
      379f50d [FavioVazquez] - Added instances of -Phadoop-2.2 in create-release.sh, run-tests, scalastyle and building-spark.md - Reconstructed docs to not ask users to rely on default behavior
      3f9249d [FavioVazquez] Merge branch 'master' of https://github.com/apache/spark into update-hadoop-dependencies
      31bdafa [FavioVazquez] - Added missing instances in -Phadoop-1 in create-release.sh, run-tests and in the building-spark documentation
      cbb93e8 [FavioVazquez] - Added comment related to SPARK-3710 about hadoop-yarn-server-tests in Hadoop 2.2 that fails to pull some needed dependencies
      83dc332 [FavioVazquez] - Cleaned up the main POM concerning the yarn profile - Erased hadoop-2.2 profile from yarn/pom.xml and its content was integrated into yarn/pom.xml
      93f7624 [FavioVazquez] - Deleted unnecessary comments and <activation> tag on the YARN profile in the main POM
      668d126 [FavioVazquez] - Moved <dependencies> <activation> and <properties> sections of the hadoop-2.2 profile in the YARN POM to the YARN profile in the root POM - Erased unnecessary hadoop-2.2 profile from the YARN POM
      fda6a51 [FavioVazquez] - Updated hadoop1 releases in create-release.sh  due to changes in the default hadoop version set - Erased unnecessary instance of -Dyarn.version=2.2.0 in create-release.sh - Prettify comment in yarn/pom.xml
      0470587 [FavioVazquez] - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in create-release.sh - Updated how the releases are made in the create-release.sh now that the default hadoop version is 2.2.0 - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in scalastyle - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in run-tests - Better example given in the hadoop-third-party-distributions.md now that the default hadoop version is 2.2.0
      a650779 [FavioVazquez] - Default value of avro.mapred.classifier has been set to hadoop2 in pom.xml - Cleaned up hadoop-2.3 and 2.4 profiles due to change in the default set in avro.mapred.classifier in pom.xml
      199f40b [FavioVazquez] - Erased unnecessary CDH5-specific note in docs/building-spark.md - Remove example of instance -Phadoop-2.2 -Dhadoop.version=2.2.0 in docs/building-spark.md - Enabled hadoop-2.2 profile when the Hadoop version is 2.2.0, which is now the default. Added comment in the yarn/pom.xml to specify that.
      88a8b88 [FavioVazquez] - Simplified Hadoop profiles due to new setting of global properties in the pom.xml file - Added comment to specify that the hadoop-2.2 profile is now the default hadoop profile in the pom.xml file - Erased hadoop-2.2 from related hadoop profiles now that is a no-op in the make-distribution.sh file
      70b8344 [FavioVazquez] - Fixed typo in the make-distribution.sh file and added hadoop-1 in the Related profiles
      287fa2f [FavioVazquez] - Updated documentation about specifying the hadoop version in building-spark. Now is clear that Spark will build against Hadoop 2.2.0 by default. - Added Cloudera CDH 5.3.3 without MapReduce example in the building-spark doc.
      1354292 [FavioVazquez] - Fixed hadoop-1 version to match jenkins build profile in hadoop1.0 tests and documentation
      6b4bfaf [FavioVazquez] - Cleanup in hadoop-2.x profiles since they contained mostly redundant stuff.
      7e9955d [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
      660decc [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
      ec91ce3 [FavioVazquez] - Updated protobuf-java version of com.google.protobuf dependency to fix blocking error when connecting to HDFS via the Hadoop Cloudera HDFS CDH5 (fix for 2.5.0-cdh5.3.3 version)
  23. May 03, 2015
    • [SPARK-7302] [DOCS] SPARK building documentation still mentions building for yarn 0.23 · 9e25b09f
      Sean Owen authored
      Remove references to Hadoop 0.23
      
      CC tgravescs. Is this what you had in mind? Basically all refs to 0.23?
      We don't support YARN 0.23, but we also don't support Hadoop 0.23 anymore, AFAICT. There are no builds or releases for it.
      
      In fact, on a related note, refs to CDH3 (Hadoop 0.20.2) should be removed too, as that certainly isn't supported either.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #5863 from srowen/SPARK-7302 and squashes the following commits:
      
      42f5d1e [Sean Owen] Remove CDH3 (Hadoop 0.20.2) refs too
      dad02e3 [Sean Owen] Remove references to Hadoop 0.23
  24. Apr 24, 2015
    • [SPARK-6122] [CORE] Upgrade tachyon-client version to 0.6.3 · 438859eb
      Calvin Jia authored
      This is a reopening of #4867.
      A short summary of the issues resolved from the previous PR:
      
      1. HTTPClient version mismatch: Selenium (used for UI tests) requires version 4.3.x, and Tachyon included 4.2.5 through a transitive dependency of its shaded thrift jar. To address this, Tachyon 0.6.3 will promote the transitive dependencies of the shaded jar so they can be excluded in Spark.
      
      2. Jackson-Mapper-ASL version mismatch: in lower versions of hadoop-client (i.e. 1.0.4), version 1.0.1 is included. The parquet library used in Spark SQL requires version 1.8+. It's unclear to me why upgrading tachyon-client would cause this dependency to break. The solution was to exclude jackson-mapper-asl from hadoop-client.
      
      It seems that the dependency management in spark-parent will not work on transitive dependencies; one way to make sure jackson-mapper-asl is included with the correct version is to add it as a top-level dependency. The best solution would be to exclude the dependency in the modules which require a higher version, but that did not fix the unit tests. Any suggestions on the best way to solve this would be appreciated!
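      For diagnosing conflicts like this, Maven's dependency tree is handy, e.g.:

      ```sh
      # Show where each jackson-mapper-asl version comes from and which wins.
      ./build/mvn dependency:tree -Dincludes=org.codehaus.jackson:jackson-mapper-asl
      ```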
      
      Author: Calvin Jia <jia.calvin@gmail.com>
      
      Closes #5354 from calvinjia/upgrade_tachyon_0.6.3 and squashes the following commits:
      
      0eefe4d [Calvin Jia] Handle httpclient version in maven dependency management. Remove httpclient version setting from profiles.
      7c00dfa [Calvin Jia] Set httpclient version to 4.3.2 for selenium. Specify version of httpclient for sql/hive (previously 4.2.5 transitive dependency of libthrift).
      9263097 [Calvin Jia] Merge master to test latest changes
      dbfc1bd [Calvin Jia] Use Tachyon 0.6.4 for cleaner dependencies.
      e2ff80a [Calvin Jia] Exclude the jetty and curator promoted dependencies from tachyon-client.
      a3a29da [Calvin Jia] Update tachyon-client exclusions.
      0ae6c97 [Calvin Jia] Change tachyon version to 0.6.3
      a204df9 [Calvin Jia] Update make distribution tachyon version.
      a93c94f [Calvin Jia] Exclude jackson-mapper-asl from hadoop client since it has a lower version than spark's expected version.
      a8a923c [Calvin Jia] Exclude httpcomponents from Tachyon
      910fabd [Calvin Jia] Update to master
      eed9230 [Calvin Jia] Update tachyon version to 0.6.1.
      11907b3 [Calvin Jia] Use TachyonURI for tachyon paths instead of strings.
      71bf441 [Calvin Jia] Upgrade Tachyon client version to 0.6.0.