  1. Apr 25, 2017
    • [SPARK-20449][ML] Upgrade breeze version to 0.13.1 · 67eef47a
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Upgrade breeze version to 0.13.1, which fixes some critical bugs in L-BFGS-B.
      
      ## How was this patch tested?
      Existing unit tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #17746 from yanboliang/spark-20449.
      67eef47a
  2. Feb 08, 2017
    • [SPARK-19464][CORE][YARN][TEST-HADOOP2.6] Remove support for Hadoop 2.5 and earlier · e8d3fca4
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      - Remove support for Hadoop 2.5 and earlier
      - Remove reflection and code constructs only needed to support multiple versions at once
      - Update docs to reflect newer versions
      - Remove older versions' builds and profiles.
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16810 from srowen/SPARK-19464.
      e8d3fca4
  3. Jan 31, 2017
  4. Jan 18, 2017
    • [SPARK-18782][BUILD] Bump Hadoop 2.6 version to use Hadoop 2.6.5 · 17ce0b5b
      Adam Roberts authored
      **What changes were proposed in this pull request?**
      
      Use Hadoop 2.6.5 for the Hadoop 2.6 profile. The release notes list a number of fixes, including security fixes, that we should pick up.
      
      **How was this patch tested?**
      
      Running the unit tests now with IBM's SDK for Java; let's see what happens with OpenJDK in the community builder. I expect no trouble, as it is only a minor release.
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #16616 from a-roberts/Hadoop265Bumper.
      17ce0b5b
  5. Jan 15, 2017
  6. Jan 10, 2017
  7. Dec 21, 2016
    • [SPARK-18951] Upgrade com.thoughtworks.paranamer/paranamer to 2.6 · 1a643889
      Yin Huai authored
      ## What changes were proposed in this pull request?
      I recently hit a bug in com.thoughtworks.paranamer/paranamer that causes jackson to fail to handle a byte array defined in a case class. I then found https://github.com/FasterXML/jackson-module-scala/issues/48, which suggests that it is caused by a bug in paranamer. Let's upgrade paranamer. Since we are using jackson 2.6.5, and jackson-module-paranamer 2.6.5 uses com.thoughtworks.paranamer/paranamer 2.6, I suggest that we upgrade paranamer to 2.6.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #16359 from yhuai/SPARK-18951.
      1a643889
  8. Dec 03, 2016
  9. Nov 28, 2016
    • [SPARK-18602] Set the version of org.codehaus.janino:commons-compiler to 3.0.0... · eba72775
      Yin Huai authored
      [SPARK-18602] Set the version of org.codehaus.janino:commons-compiler to 3.0.0 to match the version of org.codehaus.janino:janino
      
      ## What changes were proposed in this pull request?
      org.codehaus.janino:janino depends on org.codehaus.janino:commons-compiler, and we have upgraded to org.codehaus.janino:janino 3.0.0.
      
      However, it seems we are still pulling in org.codehaus.janino:commons-compiler 2.7.6 because of calcite. This looks like an accident, because we exclude janino from calcite (see https://github.com/apache/spark/blob/branch-2.1/pom.xml#L1759). So, this PR upgrades org.codehaus.janino:commons-compiler to 3.0.0.
      
      ## How was this patch tested?
      jenkins
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #16025 from yhuai/janino-commons-compile.
      eba72775
  10. Nov 12, 2016
    • [SPARK-18375][SPARK-18383][BUILD][CORE] Upgrade netty to 4.0.42.Final · bc41d997
      Guoqiang Li authored
      ## What changes were proposed in this pull request?
      
      One of the important changes for 4.0.42.Final is "Support any FileRegion implementation when using epoll transport netty/netty#5825".
      In 4.0.42.Final, `MessageWithHeader` can work properly when `spark.[shuffle|rpc].io.mode` is set to epoll
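
The shorthand `spark.[shuffle|rpc].io.mode` above stands for two separate settings. As a small illustrative sketch (the helper name `expand_io_mode_keys` is hypothetical and not part of Spark), the shorthand can be expanded into the concrete keys like this:

```python
import re
from typing import List


def expand_io_mode_keys(pattern: str) -> List[str]:
    """Expand a 'prefix[a|b]suffix' shorthand into concrete setting names."""
    match = re.fullmatch(r"(.*)\[([^\]]+)\](.*)", pattern)
    if match is None:
        # No bracketed alternatives: the pattern is already a concrete key.
        return [pattern]
    prefix, alternatives, suffix = match.groups()
    return [prefix + alt + suffix for alt in alternatives.split("|")]


# Both settings would be set to "epoll" to exercise the code path described above.
conf = {key: "epoll" for key in expand_io_mode_keys("spark.[shuffle|rpc].io.mode")}
print(conf)
```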
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Guoqiang Li <witgo@qq.com>
      
      Closes #15830 from witgo/SPARK-18375_netty-4.0.42.
      bc41d997
  11. Nov 10, 2016
    • [SPARK-18262][BUILD][SQL] JSON.org license is now CatX · 16eaad9d
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Try excluding org.json:json from hive-exec dep as it's Cat X now. It may be the case that it's not used by the part of Hive Spark uses anyway.
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #15798 from srowen/SPARK-18262.
      16eaad9d
  12. Oct 21, 2016
    • [SPARK-17960][PYSPARK][UPGRADE TO PY4J 0.10.4] · 595893d3
      Jagadeesan authored
      ## What changes were proposed in this pull request?
      
      1) Upgrade the Py4J version on the Java side
      2) Update the py4j src zip file we bundle with Spark
      
      ## How was this patch tested?
      
      Existing doctests & unit tests pass
      
      Author: Jagadeesan <as2@us.ibm.com>
      
      Closes #15514 from jagadeesanas2/SPARK-17960.
      595893d3
  13. Oct 19, 2016
  14. Oct 18, 2016
    • Revert "[SPARK-17985][CORE] Bump commons-lang3 version to 3.5." · cd662bc7
      Reynold Xin authored
      This reverts commit bfe7885a.
      
      The commit caused build failures on Hadoop 2.2 profile:
      
      ```
      [error] /scratch/rxin/spark/core/src/main/scala/org/apache/spark/util/Utils.scala:1489: value read is not a member of object org.apache.commons.io.IOUtils
      [error]       var numBytes = IOUtils.read(gzInputStream, buf)
      [error]                              ^
      [error] /scratch/rxin/spark/core/src/main/scala/org/apache/spark/util/Utils.scala:1492: value read is not a member of object org.apache.commons.io.IOUtils
      [error]         numBytes = IOUtils.read(gzInputStream, buf)
      [error]                            ^
      ```
      cd662bc7
    • [SPARK-17985][CORE] Bump commons-lang3 version to 3.5. · bfe7885a
      Takuya UESHIN authored
      ## What changes were proposed in this pull request?
      
      `SerializationUtils.clone()` of commons-lang3 (<3.5) has a bug that breaks thread safety: it sometimes gets stuck because of a race condition when initializing a hash map.
      See https://issues.apache.org/jira/browse/LANG-1251.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #15525 from ueshin/issues/SPARK-17985.
      bfe7885a
  15. Oct 11, 2016
    • [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13 · 658c7147
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      Upgraded to a newer version of Pyrolite which supports serialization of a BinaryType StructField for PySpark.SQL
      
      ## How was this patch tested?
      Added a unit test which fails with a raised ValueError when using the previous version of Pyrolite 4.9 and Python3
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #15386 from BryanCutler/pyrolite-upgrade-SPARK-17808.
      658c7147
  16. Sep 21, 2016
  17. Sep 16, 2016
    • [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3 · dca771be
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch bumps the Hadoop version in hadoop-2.7 profile from 2.7.2 to 2.7.3, which was recently released and contained a number of bug fixes.
      
      ## How was this patch tested?
      The change should be covered by existing tests.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #15115 from rxin/SPARK-17558.
      dca771be
  18. Sep 15, 2016
    • [SPARK-17379][BUILD] Upgrade netty-all to 4.0.41 final for bug fixes · 0ad8eeb4
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      Upgrade netty-all to the latest release in the 4.0.x line, which is 4.0.41. The release notes mention several bug fixes and performance improvements we may find useful; see netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html. I initially tried to use 4.1.5 but noticed it's not backwards compatible.
      
      ## How was this patch tested?
      Existing unit tests against branch-1.6 and branch-2.0 using IBM Java 8 on Intel, Power and Z architectures
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #14961 from a-roberts/netty.
      0ad8eeb4
  19. Sep 06, 2016
    • [SPARK-17378][BUILD] Upgrade snappy-java to 1.1.2.6 · 6c08dbf6
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      
      Upgrades the Snappy version from 1.1.2.4 to 1.1.2.6. The release notes (https://github.com/xerial/snappy-java/blob/master/Milestone.md) mention: "Fix a bug in SnappyInputStream when reading compressed data that happened to have the same first byte with the stream magic header (#142)"
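
To illustrate the class of bug the release note describes, here is a minimal sketch of why deciding from the first byte alone is ambiguous. This is not snappy-java's code: the header bytes and both function names are made up for illustration, and the actual fix in snappy-java may work differently.

```python
# Hypothetical 8-byte stream header, for illustration only.
MAGIC_HEADER = b"\x82DEMOHDR"


def looks_like_stream_first_byte_only(data: bytes) -> bool:
    """Fragile check: payload data can share the header's first byte by chance."""
    return len(data) > 0 and data[0] == MAGIC_HEADER[0]


def looks_like_stream_full_header(data: bytes) -> bool:
    """Stricter check: compare the whole header before deciding."""
    return data.startswith(MAGIC_HEADER)


# Payload that merely happens to start with the same byte as the header.
payload = b"\x82 some other bytes"
print(looks_like_stream_first_byte_only(payload), looks_like_stream_full_header(payload))
```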
      
      ## How was this patch tested?
      Existing unit tests using the latest IBM Java 8 on Intel, Power and Z architectures (little and big-endian)
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #14958 from a-roberts/master.
      6c08dbf6
  20. Aug 30, 2016
    • [SPARK-5682][CORE] Add encrypted shuffle in spark · 4b4e329e
      Ferdinand Xu authored
      This patch is using Apache Commons Crypto library to enable shuffle encryption support.
      
      Author: Ferdinand Xu <cheng.a.xu@intel.com>
      Author: kellyzly <kellyzly@126.com>
      
      Closes #8880 from winningsix/SPARK-10771.
      4b4e329e
  21. Aug 24, 2016
    • [SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the same... · 0b3a4be9
      Sean Owen authored
      [SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the same java used in the spark environment
      
      ## What changes were proposed in this pull request?
      
      Update to py4j 0.10.3 to enable JAVA_HOME support
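
As a rough sketch of what "JAVA_HOME support" means in practice (the function name `resolve_java_command` is hypothetical, and this is not py4j's actual implementation), a launcher can prefer the JVM under `JAVA_HOME` and fall back to whatever `java` is on the `PATH`:

```python
import os


def resolve_java_command(env=None) -> str:
    """Pick the java executable: $JAVA_HOME/bin/java if set, else plain 'java'."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        return os.path.join(java_home, "bin", "java")
    # No JAVA_HOME: rely on PATH lookup, which may find a different JVM.
    return "java"


print(resolve_java_command())
```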
      
      ## How was this patch tested?
      
      Pyspark tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14748 from srowen/SPARK-16781.
      0b3a4be9
  22. Aug 03, 2016
    • [SPARK-16770][BUILD] Fix JLine dependency management and version (Sca… · 4775eb41
      Stefan Schulze authored
      ## What changes were proposed in this pull request?
      As of Scala 2.11.x there is no longer an org.scala-lang:jline version aligned to the Scala version itself. The Scala console now uses the plain jline:jline module. Spark's dependency management did not reflect this change properly, causing Maven to pull in JLine via a transitive dependency. Unfortunately, JLine 2.12 contained a minor but very annoying bug that rendered the shell almost useless for developers with a German keyboard layout. This request contains the following changes:
      - Exclude transitive dependency 'jline:jline' from hive-exec module
      - Remove global properties 'jline.version' and 'jline.groupId'
      - Add both properties and dependency to 'scala-2.11' profile
      - Add explicit dependency on 'jline:jline' to module 'spark-repl'
      
      ## How was this patch tested?
      - Running mvn dependency:tree and checking for correct Jline version 2.12.1
      - Running full builds with assembly and checking for jline-2.12.1.jar in 'lib' folder of generated tarball
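
The first check above can be automated. The following is a minimal sketch, assuming the usual `groupId:artifactId:type:version:scope` coordinate layout printed by `mvn dependency:tree`; the sample text and the helper name `find_versions` are made up for illustration, and coordinates that carry a classifier are not handled.

```python
import re
from typing import List

# Hypothetical excerpt of `mvn dependency:tree` output; real output will differ.
SAMPLE_TREE = """\
[INFO] org.example:demo:jar:1.0
[INFO] +- jline:jline:jar:2.12.1:compile
[INFO] \\- org.example:other:jar:0.1:compile
"""


def find_versions(tree_output: str, group_id: str, artifact_id: str) -> List[str]:
    """Return every version of group_id:artifact_id that appears in the tree text."""
    # Match "<group>:<artifact>:<type>:<version>" and capture the version field.
    pattern = re.compile(
        re.escape(group_id + ":" + artifact_id + ":") + r"[^:\s]+:([^:\s]+)"
    )
    return pattern.findall(tree_output)


versions = find_versions(SAMPLE_TREE, "jline", "jline")
print(versions)
```

A build check could then fail if the returned list contains anything other than the expected version.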
      
      Author: Stefan Schulze <stefan.schulze@pentasys.de>
      
      Closes #14429 from stsc-pentasys/SPARK-16770.
      4775eb41
  23. Jul 29, 2016
    • [SPARK-16637] Unified containerizer · 266b92fa
      Michael Gummelt authored
      ## What changes were proposed in this pull request?
      
      New config var: spark.mesos.docker.containerizer={"mesos","docker" (default)}
      
      This adds support for running docker containers via the Mesos unified containerizer: http://mesos.apache.org/documentation/latest/container-image/
      
      The benefit is losing the dependency on `dockerd`, and all the costs which it incurs.
      
      I've also updated the supported Mesos version to 0.28.2 for support of the required protobufs.
      
      This is blocked on: https://github.com/apache/spark/pull/14167
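
For readers unfamiliar with the new setting, here is a small sketch of how its two allowed values and its default fit together. Only the key name, the two values, and the default come from the description above; the function name `resolve_containerizer` is hypothetical and this is not Spark's code.

```python
ALLOWED_CONTAINERIZERS = ("mesos", "docker")
DEFAULT_CONTAINERIZER = "docker"  # default stated in the description above


def resolve_containerizer(conf: dict) -> str:
    """Read spark.mesos.docker.containerizer from a plain dict of settings."""
    value = conf.get("spark.mesos.docker.containerizer", DEFAULT_CONTAINERIZER)
    if value not in ALLOWED_CONTAINERIZERS:
        raise ValueError(
            "spark.mesos.docker.containerizer must be one of %r, got %r"
            % (ALLOWED_CONTAINERIZERS, value)
        )
    return value


print(resolve_containerizer({}))
print(resolve_containerizer({"spark.mesos.docker.containerizer": "mesos"}))
```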
      
      ## How was this patch tested?
      
      - manually testing jobs submitted with both "mesos" and "docker" settings for the new config var.
      - spark/mesos integration test suite
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #14275 from mgummelt/unified-containerizer.
      266b92fa
    • [SPARK-16751] Upgrade derby to 10.12.1.1 · 04a2c072
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      
      Version of derby upgraded based on important security info at VersionEye. Test scope added so we don't include it in our final package anyway. NB: I think this should be backported to all previous releases as it is a security problem https://www.versioneye.com/java/org.apache.derby:derby/10.11.1.1
      
      The CVE number is 2015-1832. I also suggest we add a SECURITY tag for JIRAs
      
      ## How was this patch tested?
      Existing tests with the change, making sure that we see no new failures. I checked that derby 10.12.x, and not derby 10.11.x, is downloaded to our ~/.m2 folder.
      
      I then used dev/make-distribution.sh and checked the dist/jars folder for Spark 2.0: no derby jar is present.
      
      I don't know if this would also remove it from the assembly jar in our 1.x branches.
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #14379 from a-roberts/patch-4.
      04a2c072
  24. Jul 26, 2016
    • [SPARK-15271][MESOS] Allow force pulling executor docker images · 0869b3a5
      Philipp Hoffmann authored
      ## What changes were proposed in this pull request?
      
      Mesos agents by default will not pull docker images which are cached
      locally already. In order to run Spark executors from mutable tags like
      `:latest` this commit introduces a Spark setting
      (`spark.mesos.executor.docker.forcePullImage`). Setting this flag to
      true will tell the Mesos agent to force pull the docker image (default is `false` which is consistent with the previous
      implementation and Mesos' default
      behaviour).
      
      Author: Philipp Hoffmann <mail@philipphoffmann.de>
      
      Closes #14348 from philipphoffmann/force-pull-image.
      0869b3a5
  25. Jul 25, 2016
    • Josh Rosen's avatar
      fc17121d
    • [SPARK-15271][MESOS] Allow force pulling executor docker images · 978cd5f1
      Philipp Hoffmann authored
      ## What changes were proposed in this pull request?
      
      Mesos agents by default will not pull docker images which are cached
      locally already. In order to run Spark executors from mutable tags like
      `:latest` this commit introduces a Spark setting
      `spark.mesos.executor.docker.forcePullImage`. Setting this flag to
      true will tell the Mesos agent to force pull the docker image (default is `false` which is consistent with the previous
      implementation and Mesos' default
      behaviour).
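
As a usage sketch, the flag would typically be passed alongside the executor image setting. The helper `to_conf_args` and the image name below are made up for illustration, and `spark.mesos.executor.docker.image` is an assumption that does not appear in this commit message; the output is a list of `--conf key=value` arguments of the kind `spark-submit` accepts.

```python
from typing import Dict, List


def to_conf_args(conf: Dict[str, str]) -> List[str]:
    """Turn a settings dict into a flat list of --conf key=value arguments."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", key + "=" + value]
    return args


conf = {
    # Hypothetical image name using a mutable tag.
    "spark.mesos.executor.docker.image": "example/spark:latest",
    # Setting introduced by this commit; the default is "false".
    "spark.mesos.executor.docker.forcePullImage": "true",
}
print(to_conf_args(conf))
```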
      
      ## How was this patch tested?
      
      I ran a sample application including this change on a Mesos cluster and verified the correct behaviour both with and without force pulling the executor image. As expected, the image is force pulled if the flag is set.
      
      Author: Philipp Hoffmann <mail@philipphoffmann.de>
      
      Closes #13051 from philipphoffmann/force-pull-image.
      978cd5f1
  26. Jul 19, 2016
    • [SPARK-16494][ML] Upgrade breeze version to 0.12 · 67089149
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      breeze 0.12 has been released for more than half a year, and it brings lots of new features, performance improvements and bug fixes.
      One of the biggest features is `LBFGS-B`, which is an implementation of `LBFGS` with box constraints and is much faster for some special cases.
      We would like to implement the Huber loss function for `LinearRegression` ([SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181)), and it requires `LBFGS-B` as the optimization solver. So we should bump the breeze dependency to 0.12.
      For more features, improvements and bug fixes of breeze 0.12, refer to the following link:
      https://groups.google.com/forum/#!topic/scala-breeze/nEeRi_DcY5c
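
For context, the two ideas mentioned above can be sketched in a few lines. This is only an illustration of the textbook Huber loss and of what a box constraint is; it is neither breeze's `LBFGS-B` nor Spark's `LinearRegression` code, and the function names and the default `delta` are assumptions.

```python
from typing import List


def huber_loss(residual: float, delta: float = 1.0) -> float:
    """Quadratic for small residuals, linear for large ones (textbook Huber loss)."""
    abs_r = abs(residual)
    if abs_r <= delta:
        return 0.5 * residual * residual
    return delta * (abs_r - 0.5 * delta)


def project_to_box(x: List[float], lower: List[float], upper: List[float]) -> List[float]:
    """Clamp each coordinate into [lower_i, upper_i], i.e. enforce box constraints."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


print(huber_loss(0.5), huber_loss(3.0))
print(project_to_box([-2.0, 0.5, 9.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))
```

A box-constrained solver keeps every iterate inside such bounds, which is the property the description above relies on.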
      
      ## How was this patch tested?
      No new tests, should pass the existing ones.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14150 from yanboliang/spark-16494.
      67089149
  27. Jul 10, 2016
  28. Jun 09, 2016
    • [SPARK-15818][BUILD] Upgrade to Hadoop 2.7.2 · 147c0208
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      
      Updating the Hadoop version from 2.7.0 to 2.7.2 if we use the Hadoop-2.7 build profile
      
      ## How was this patch tested?
      
      Existing tests
      
      I'd like us to use Hadoop 2.7.2 owing to the Hadoop release notes stating Hadoop 2.7.0 is not ready for production use
      
      https://hadoop.apache.org/docs/r2.7.0/ states
      
      "Apache Hadoop 2.7.0 is a minor release in the 2.x.y release line, building upon the previous stable release 2.6.0.
      This release is not yet ready for production use. Production users should use 2.7.1 release and beyond."
      
      Hadoop 2.7.1 release notes:
      "Apache Hadoop 2.7.1 is a minor release in the 2.x.y release line, building upon the previous release 2.7.0. This is the next stable release after Apache Hadoop 2.6.x."
      
      And then Hadoop 2.7.2 release notes:
      "Apache Hadoop 2.7.2 is a minor release in the 2.x.y release line, building upon the previous stable release 2.7.1."
      
      I've tested that this works on Intel hardware with IBM Java 8, so let's test it with OpenJDK. Ideally this will be pushed to branch-2.0 and master.
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #13556 from a-roberts/patch-2.
      147c0208
  29. May 31, 2016
  30. May 27, 2016
    • [SPARK-9876][SQL] Update Parquet to 1.8.1. · 776d183c
      Ryan Blue authored
      ## What changes were proposed in this pull request?
      
      This includes minimal changes to get Spark using the current release of Parquet, 1.8.1.
      
      ## How was this patch tested?
      
      This uses the existing Parquet tests.
      
      Author: Ryan Blue <blue@apache.org>
      
      Closes #13280 from rdblue/SPARK-9876-update-parquet.
      776d183c
  31. May 26, 2016
    • [SPARK-15523][ML][MLLIB] Update JPMML to 1.2.15 · 6d506c9a
      Villu Ruusmann authored
      ## What changes were proposed in this pull request?
      
      See https://issues.apache.org/jira/browse/SPARK-15523
      
      This PR replaces PR #13293. It's isolated to a new branch, and contains some more squashed changes.
      
      ## How was this patch tested?
      
      1. Executed `mvn clean package` in `mllib` directory
      2. Executed `dev/test-dependencies.sh --replace-manifest` in the root directory.
      
      Author: Villu Ruusmann <villu.ruusmann@gmail.com>
      
      Closes #13297 from vruusmann/update-jpmml.
      6d506c9a
  32. May 25, 2016
  33. May 24, 2016
    • [SPARK-11753][SQL][TEST-HADOOP2.2] Make allowNonNumericNumbers option work · c24b6b67
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      Jackson supports the `allowNonNumericNumbers` option to parse non-standard non-numeric numbers such as "NaN", "Infinity", "INF". The currently used Jackson version (2.5.3) does not fully support it. This patch upgrades the library and makes the two ignored tests in `JsonParsingOptionsSuite` pass.
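
To make "non-standard non-numeric numbers" concrete, here is an analogous sketch using Python's standard `json` module rather than Jackson. It is an analogy only: Python accepts `NaN`, `Infinity` and `-Infinity` by default but, unlike the option described above, does not accept `INF`. The helper names are made up for illustration.

```python
import json
import math


def reject_constant(name):
    """Hook for json.loads: refuse NaN, Infinity and -Infinity."""
    raise ValueError("non-standard JSON constant: " + name)


def parses_strictly(text):
    """Return True if the text parses when non-numeric constants are rejected."""
    try:
        json.loads(text, parse_constant=reject_constant)
        return True
    except ValueError:
        return False


# Lenient mode (Python's default): the three constants are accepted.
lenient = json.loads('{"a": NaN, "b": Infinity, "c": -Infinity}')
print(lenient)
print(parses_strictly('{"a": NaN}'), parses_strictly('{"a": 1.5}'))
```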
      
      ## How was this patch tested?
      
      `JsonParsingOptionsSuite`.
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #9759 from viirya/fix-json-nonnumric.
      c24b6b67
  34. May 16, 2016
    • [SPARK-12972][CORE][TEST-MAVEN][TEST-HADOOP2.2] Update... · fabc8e5b
      Sean Owen authored
      [SPARK-12972][CORE][TEST-MAVEN][TEST-HADOOP2.2] Update org.apache.httpcomponents.httpclient, commons-io
      
      ## What changes were proposed in this pull request?
      
      This is sort of a hot-fix for https://github.com/apache/spark/pull/13117, but, the problem is limited to Hadoop 2.2. The change is to manage `commons-io` to 2.4 for all Hadoop builds, which is only a net change for Hadoop 2.2, which was using 2.1.
      
      ## How was this patch tested?
      
      Jenkins tests -- normal PR builder, then the `[test-hadoop2.2] [test-maven]` if successful.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #13132 from srowen/SPARK-12972.3.
      fabc8e5b
  35. May 15, 2016
    • [SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient · f5576a05
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      (Retry of https://github.com/apache/spark/pull/13049)
      
      - update to httpclient 4.5 / httpcore 4.4
      - remove some defunct exclusions
      - manage httpmime version to match
      - update selenium / httpunit to support 4.5 (possible now that Jetty 9 is used)
      
      ## How was this patch tested?
      
      Jenkins tests. Also, locally running the same test command of one Jenkins profile that failed: `mvn -Phadoop-2.6 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl ...`
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #13117 from srowen/SPARK-12972.2.
      f5576a05
  36. May 13, 2016