  1. Sep 21, 2016
  2. Sep 16, 2016
    • Reynold Xin's avatar
      [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3 · dca771be
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch bumps the Hadoop version in the hadoop-2.7 profile from 2.7.2 to 2.7.3, which was recently released and contains a number of bug fixes.
      
      ## How was this patch tested?
      The change should be covered by existing tests.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #15115 from rxin/SPARK-17558.
      dca771be
  3. Sep 15, 2016
    • Adam Roberts's avatar
      [SPARK-17379][BUILD] Upgrade netty-all to 4.0.41 final for bug fixes · 0ad8eeb4
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      Upgrade netty-all to the latest release in the 4.0.x line, which is 4.0.41. The release notes mention several bug fixes and performance improvements we may find useful; see netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html. I initially tried to use 4.1.5 but noticed it's not backwards compatible.
      
      ## How was this patch tested?
      Existing unit tests against branch-1.6 and branch-2.0 using IBM Java 8 on Intel, Power and Z architectures
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #14961 from a-roberts/netty.
      0ad8eeb4
  4. Sep 06, 2016
    • Adam Roberts's avatar
      [SPARK-17378][BUILD] Upgrade snappy-java to 1.1.2.6 · 6c08dbf6
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      
      Upgrades the Snappy version from 1.1.2.4 to 1.1.2.6. The release notes (https://github.com/xerial/snappy-java/blob/master/Milestone.md) mention: "Fix a bug in SnappyInputStream when reading compressed data that happened to have the same first byte with the stream magic header (#142)"
      
      ## How was this patch tested?
      Existing unit tests using the latest IBM Java 8 on Intel, Power and Z architectures (little and big-endian)
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #14958 from a-roberts/master.
      6c08dbf6
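The class of bug quoted in the release note above can be pictured with a minimal sketch. This is not snappy-java's code: the 8-byte header value and both helper functions are assumptions made for illustration. It only shows why testing a single leading byte can misclassify raw data, while comparing the whole header does not.

```python
# Assumed stream magic header (0x82 followed by "SNAPPY" and a NUL byte).
# Treat this value as illustrative, not as a verified constant.
MAGIC_HEADER = bytes([0x82]) + b"SNAPPY\x00"


def first_byte_matches(data: bytes) -> bool:
    """Fragile check: looks only at the first byte of the input."""
    return len(data) > 0 and data[0] == MAGIC_HEADER[0]


def full_header_matches(data: bytes) -> bool:
    """Stricter check: compares every byte of the assumed header."""
    return data[:len(MAGIC_HEADER)] == MAGIC_HEADER


# Raw data that merely happens to start with the same byte as the header.
raw_block = bytes([0x82, 0x01, 0x02, 0x03])
# Data that really begins with the full header.
framed_block = MAGIC_HEADER + b"payload"
```

With these inputs the first-byte check accepts both blocks, whereas the full comparison accepts only `framed_block`.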
  5. Aug 30, 2016
    • Ferdinand Xu's avatar
      [SPARK-5682][CORE] Add encrypted shuffle in spark · 4b4e329e
      Ferdinand Xu authored
      This patch uses the Apache Commons Crypto library to enable shuffle encryption support.
      
      Author: Ferdinand Xu <cheng.a.xu@intel.com>
      Author: kellyzly <kellyzly@126.com>
      
      Closes #8880 from winningsix/SPARK-10771.
      4b4e329e
  6. Aug 24, 2016
    • Sean Owen's avatar
      [SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the same... · 0b3a4be9
      Sean Owen authored
      [SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the same java used in the spark environment
      
      ## What changes were proposed in this pull request?
      
      Update to py4j 0.10.3 to enable JAVA_HOME support
      
      ## How was this patch tested?
      
      Pyspark tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14748 from srowen/SPARK-16781.
      0b3a4be9
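The idea behind the `JAVA_HOME` support mentioned above can be sketched as follows. This is a hypothetical helper, not py4j's implementation: the function name and the fallback to a bare `java` on the `PATH` are assumptions.

```python
import os


def resolve_java_command(env=None):
    """Return the java executable to launch, preferring JAVA_HOME when set.

    `env` is any mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        # Use the JVM from the configured installation.
        return os.path.join(java_home, "bin", "java")
    # Otherwise fall back to whatever "java" resolves to on the PATH.
    return "java"
```

Passing the environment as a parameter keeps the helper easy to test without modifying the real process environment.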
  7. Aug 03, 2016
    • Stefan Schulze's avatar
      [SPARK-16770][BUILD] Fix JLine dependency management and version (Sca… · 4775eb41
      Stefan Schulze authored
      ## What changes were proposed in this pull request?
      As of Scala 2.11.x there is no longer an org.scala-lang:jline version aligned to the Scala version itself. The Scala console now uses the plain jline:jline module. Spark's dependency management did not reflect this change properly, causing Maven to pull in JLine via a transitive dependency. Unfortunately, JLine 2.12 contained a minor but very annoying bug that rendered the shell almost useless for developers with a German keyboard layout. This request contains the following changes:
      - Exclude transitive dependency 'jline:jline' from hive-exec module
      - Remove global properties 'jline.version' and 'jline.groupId'
      - Add both properties and dependency to 'scala-2.11' profile
      - Add explicit dependency on 'jline:jline' to module 'spark-repl'
      
      ## How was this patch tested?
      - Running mvn dependency:tree and checking for correct Jline version 2.12.1
      - Running full builds with assembly and checking for jline-2.12.1.jar in 'lib' folder of generated tarball
      
      Author: Stefan Schulze <stefan.schulze@pentasys.de>
      
      Closes #14429 from stsc-pentasys/SPARK-16770.
      4775eb41
  8. Jul 29, 2016
    • Michael Gummelt's avatar
      [SPARK-16637] Unified containerizer · 266b92fa
      Michael Gummelt authored
      ## What changes were proposed in this pull request?
      
      New config var: spark.mesos.docker.containerizer={"mesos","docker" (default)}
      
      This adds support for running docker containers via the Mesos unified containerizer: http://mesos.apache.org/documentation/latest/container-image/
      
      The benefit is losing the dependency on `dockerd`, and all the costs which it incurs.
      
      I've also updated the supported Mesos version to 0.28.2 for support of the required protobufs.
      
      This is blocked on: https://github.com/apache/spark/pull/14167
      
      ## How was this patch tested?
      
      - manually testing jobs submitted with both "mesos" and "docker" settings for the new config var.
      - spark/mesos integration test suite
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #14275 from mgummelt/unified-containerizer.
      266b92fa
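A minimal sketch of how the new setting might be passed on a command line, assuming the key and the two values exactly as written in the commit message above (the key name in a released Spark version may differ, so check that version's documentation). The helper name is hypothetical.

```python
# Values named in the commit message; "docker" is stated to be the default.
ALLOWED_CONTAINERIZERS = ("docker", "mesos")


def containerizer_flags(containerizer="docker"):
    """Build spark-submit style `--conf` arguments for the containerizer setting."""
    if containerizer not in ALLOWED_CONTAINERIZERS:
        raise ValueError(
            f"unsupported containerizer {containerizer!r}; "
            f"expected one of {ALLOWED_CONTAINERIZERS}"
        )
    # Key copied from the commit message; treat it as an assumption.
    return ["--conf", f"spark.mesos.docker.containerizer={containerizer}"]
```

Validating the value up front gives a clear error locally instead of a failure once the job reaches the cluster.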
    • Adam Roberts's avatar
      [SPARK-16751] Upgrade derby to 10.12.1.1 · 04a2c072
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      
      Version of derby upgraded based on important security info at VersionEye. Test scope added so we don't include it in our final package anyway. NB: I think this should be backported to all previous releases as it is a security problem https://www.versioneye.com/java/org.apache.derby:derby/10.11.1.1
      
      The CVE number is 2015-1832. I also suggest we add a SECURITY tag for JIRAs
      
      ## How was this patch tested?
      Existing tests with the change, making sure that we see no new failures. I checked that derby 10.12.x, and not derby 10.11.x, is downloaded to our ~/.m2 folder.
      
      I then used dev/make-distribution.sh and checked the dist/jars folder for Spark 2.0: no derby jar is present.
      
      I don't know if this would also remove it from the assembly jar in our 1.x branches.
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #14379 from a-roberts/patch-4.
      04a2c072
  9. Jul 26, 2016
    • Philipp Hoffmann's avatar
      [SPARK-15271][MESOS] Allow force pulling executor docker images · 0869b3a5
      Philipp Hoffmann authored
      ## What changes were proposed in this pull request?
      
      Mesos agents by default will not pull docker images which are cached
      locally already. In order to run Spark executors from mutable tags like
      `:latest` this commit introduces a Spark setting
      (`spark.mesos.executor.docker.forcePullImage`). Setting this flag to
      true will tell the Mesos agent to force pull the docker image (default is `false` which is consistent with the previous
      implementation and Mesos' default
      behaviour).
      
      Author: Philipp Hoffmann <mail@philipphoffmann.de>
      
      Closes #14348 from philipphoffmann/force-pull-image.
      0869b3a5
  10. Jul 25, 2016
    • Josh Rosen's avatar
      fc17121d
    • Philipp Hoffmann's avatar
      [SPARK-15271][MESOS] Allow force pulling executor docker images · 978cd5f1
      Philipp Hoffmann authored
      ## What changes were proposed in this pull request?
      
      Mesos agents by default will not pull docker images which are cached
      locally already. In order to run Spark executors from mutable tags like
      `:latest` this commit introduces a Spark setting
      `spark.mesos.executor.docker.forcePullImage`. Setting this flag to
      true will tell the Mesos agent to force pull the docker image (default is `false` which is consistent with the previous
      implementation and Mesos' default
      behaviour).
      
      ## How was this patch tested?
      
      I ran a sample application including this change on a Mesos cluster and verified the correct behaviour both with and without force pulling the executor image. As expected, the image is force pulled if the flag is set.
      
      Author: Philipp Hoffmann <mail@philipphoffmann.de>
      
      Closes #13051 from philipphoffmann/force-pull-image.
      978cd5f1
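As a rough illustration of when the flag described above is useful, the sketch below enables it only for images that look like they use a mutable tag. The tag heuristic and both helper names are assumptions; only the two `spark.mesos.executor.docker.*` keys are Spark settings.

```python
def uses_mutable_tag(image: str) -> bool:
    """Heuristic: a missing tag or ':latest' is treated as mutable."""
    if "@" in image:
        # Pinned by digest (name@sha256:...), so the content cannot change.
        return False
    # Only inspect the part after the last '/', so a registry port
    # such as "host:5000/name" is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True  # no explicit tag means Docker falls back to "latest"
    return last_segment.rsplit(":", 1)[1] == "latest"


def executor_image_conf(image: str) -> dict:
    """Return Spark properties for the executor image, as strings."""
    return {
        "spark.mesos.executor.docker.image": image,
        "spark.mesos.executor.docker.forcePullImage": str(uses_mutable_tag(image)).lower(),
    }
```

For a pinned tag such as `example/spark:2.0.0` the flag stays `false`, which matches the default described in the commit message.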
  11. Jul 19, 2016
    • Yanbo Liang's avatar
      [SPARK-16494][ML] Upgrade breeze version to 0.12 · 67089149
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      breeze 0.12 was released more than half a year ago, and it brings lots of new features, performance improvements and bug fixes.
      One of the biggest features is ```LBFGS-B```, which is an implementation of ```LBFGS``` with box constraints and is much faster in some special cases.
      We would like to implement the Huber loss function for ```LinearRegression``` ([SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181)) and it requires ```LBFGS-B``` as the optimization solver. So we should bump up the dependent breeze version to 0.12.
      For more features, improvements and bug fixes of breeze 0.12, refer to the following link:
      https://groups.google.com/forum/#!topic/scala-breeze/nEeRi_DcY5c
      
      ## How was this patch tested?
      No new tests, should pass the existing ones.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14150 from yanboliang/spark-16494.
      67089149
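For readers unfamiliar with the Huber loss mentioned above, here is the textbook form as a minimal sketch. The threshold parameter name `delta` and its default are assumptions, and the formulation planned for `LinearRegression` in SPARK-3181 may differ (for example by adding a scale parameter).

```python
def huber_loss(residual: float, delta: float = 1.0) -> float:
    """Textbook Huber loss: quadratic near zero, linear in the tails."""
    magnitude = abs(residual)
    if magnitude <= delta:
        # Small residuals are penalised like squared error.
        return 0.5 * residual * residual
    # Large residuals grow only linearly, which limits the influence of outliers.
    return delta * (magnitude - 0.5 * delta)
```

The two branches agree at `abs(residual) == delta`, so the loss is continuous there.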
  12. Jul 10, 2016
  13. Jun 09, 2016
    • Adam Roberts's avatar
      [SPARK-15818][BUILD] Upgrade to Hadoop 2.7.2 · 147c0208
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      
      Updating the Hadoop version from 2.7.0 to 2.7.2 if we use the Hadoop-2.7 build profile
      
      ## How was this patch tested?
      
      Existing tests
      
      I'd like us to use Hadoop 2.7.2 owing to the Hadoop release notes stating Hadoop 2.7.0 is not ready for production use
      
      https://hadoop.apache.org/docs/r2.7.0/ states
      
      "Apache Hadoop 2.7.0 is a minor release in the 2.x.y release line, building upon the previous stable release 2.6.0.
      This release is not yet ready for production use. Production users should use 2.7.1 release and beyond."
      
      Hadoop 2.7.1 release notes:
      "Apache Hadoop 2.7.1 is a minor release in the 2.x.y release line, building upon the previous release 2.7.0. This is the next stable release after Apache Hadoop 2.6.x."
      
      And then Hadoop 2.7.2 release notes:
      "Apache Hadoop 2.7.2 is a minor release in the 2.x.y release line, building upon the previous stable release 2.7.1."
      
      I've tested that this is OK with Intel hardware and IBM Java 8, so let's test it with OpenJDK. Ideally this will be pushed to branch-2.0 and master.
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #13556 from a-roberts/patch-2.
      147c0208
  14. May 31, 2016
  15. May 27, 2016
    • Ryan Blue's avatar
      [SPARK-9876][SQL] Update Parquet to 1.8.1. · 776d183c
      Ryan Blue authored
      ## What changes were proposed in this pull request?
      
      This includes minimal changes to get Spark using the current release of Parquet, 1.8.1.
      
      ## How was this patch tested?
      
      This uses the existing Parquet tests.
      
      Author: Ryan Blue <blue@apache.org>
      
      Closes #13280 from rdblue/SPARK-9876-update-parquet.
      776d183c
  16. May 26, 2016
    • Villu Ruusmann's avatar
      [SPARK-15523][ML][MLLIB] Update JPMML to 1.2.15 · 6d506c9a
      Villu Ruusmann authored
      ## What changes were proposed in this pull request?
      
      See https://issues.apache.org/jira/browse/SPARK-15523
      
      This PR replaces PR #13293. It's isolated to a new branch, and contains some more squashed changes.
      
      ## How was this patch tested?
      
      1. Executed `mvn clean package` in `mllib` directory
      2. Executed `dev/test-dependencies.sh --replace-manifest` in the root directory.
      
      Author: Villu Ruusmann <villu.ruusmann@gmail.com>
      
      Closes #13297 from vruusmann/update-jpmml.
      6d506c9a
  17. May 25, 2016
  18. May 24, 2016
    • Liang-Chi Hsieh's avatar
      [SPARK-11753][SQL][TEST-HADOOP2.2] Make allowNonNumericNumbers option work · c24b6b67
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      Jackson supports an `allowNonNumericNumbers` option to parse non-standard non-numeric numbers such as "NaN", "Infinity", "INF". The currently used Jackson version (2.5.3) does not fully support it. This patch upgrades the library and makes the two ignored tests in `JsonParsingOptionsSuite` pass.
      
      ## How was this patch tested?
      
      `JsonParsingOptionsSuite`.
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #9759 from viirya/fix-json-nonnumric.
      c24b6b67
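As an analogy only (this uses Python's standard `json` module, not Jackson or Spark), the difference between accepting and rejecting non-standard numeric tokens can be sketched like this. The helper names are hypothetical, and Python's token set ("NaN", "Infinity", "-Infinity") is not identical to Jackson's.

```python
import json
import math


def _reject_constant(name):
    """Called by json for NaN / Infinity / -Infinity tokens; refuse them."""
    raise ValueError(f"non-standard JSON number: {name}")


def parse_lenient(text):
    """Accept non-standard tokens, which is the json module's default."""
    return json.loads(text)


def parse_strict(text):
    """Reject non-standard tokens by hooking parse_constant."""
    return json.loads(text, parse_constant=_reject_constant)


lenient = parse_lenient('{"x": NaN, "y": Infinity}')
```

Here `lenient["x"]` is a float NaN, while `parse_strict` raises `ValueError` on the same input.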
  19. May 16, 2016
    • Sean Owen's avatar
      [SPARK-12972][CORE][TEST-MAVEN][TEST-HADOOP2.2] Update... · fabc8e5b
      Sean Owen authored
      [SPARK-12972][CORE][TEST-MAVEN][TEST-HADOOP2.2] Update org.apache.httpcomponents.httpclient, commons-io
      
      ## What changes were proposed in this pull request?
      
      This is sort of a hot-fix for https://github.com/apache/spark/pull/13117, but, the problem is limited to Hadoop 2.2. The change is to manage `commons-io` to 2.4 for all Hadoop builds, which is only a net change for Hadoop 2.2, which was using 2.1.
      
      ## How was this patch tested?
      
      Jenkins tests -- normal PR builder, then the `[test-hadoop2.2] [test-maven]` if successful.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #13132 from srowen/SPARK-12972.3.
      fabc8e5b
  20. May 15, 2016
    • Sean Owen's avatar
      [SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient · f5576a05
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      (Retry of https://github.com/apache/spark/pull/13049)
      
      - update to httpclient 4.5 / httpcore 4.4
      - remove some defunct exclusions
      - manage httpmime version to match
      - update selenium / httpunit to support 4.5 (possible now that Jetty 9 is used)
      
      ## How was this patch tested?
      
      Jenkins tests. Also, locally running the same test command of one Jenkins profile that failed: `mvn -Phadoop-2.6 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl ...`
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #13117 from srowen/SPARK-12972.2.
      f5576a05
  21. May 13, 2016
  22. May 12, 2016
    • bomeng's avatar
      [SPARK-14897][SQL] upgrade to jetty 9.2.16 · 81bf8708
      bomeng authored
      ## What changes were proposed in this pull request?
      
      Since Jetty 8 is EOL (end of life) and has a critical security issue [http://www.securityweek.com/critical-vulnerability-found-jetty-web-server], I think upgrading to 9 is necessary. I am using the latest 9.2 since 9.3 requires Java 8+.
      
      `javax.servlet` and `derby` were also upgraded since Jetty 9.2 needs the corresponding versions.
      
      ## How was this patch tested?
      
      Manual test and current test cases should cover it.
      
      Author: bomeng <bmeng@us.ibm.com>
      
      Closes #12916 from bomeng/SPARK-14897.
      81bf8708
  23. May 05, 2016
    • hyukjinkwon's avatar
      [SPARK-15148][SQL] Upgrade Univocity library from 2.0.2 to 2.1.0 · ac12b35d
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/browse/SPARK-15148
      
      Mainly, it improves performance by roughly 30%-40% according to the [release note](https://github.com/uniVocity/univocity-parsers/releases/tag/v2.1.0). The details of the purpose are described in the JIRA.
      
      This PR upgrades Univocity library from 2.0.2 to 2.1.0.
      
      ## How was this patch tested?
      
      Existing tests should cover this.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #12923 from HyukjinKwon/SPARK-15148.
      ac12b35d
    • mcheah's avatar
      [SPARK-12154] Upgrade to Jersey 2 · b7fdc23c
      mcheah authored
      ## What changes were proposed in this pull request?
      
      Replace com.sun.jersey with org.glassfish.jersey. Changes to the Spark Web UI code were required to compile. The changes were relatively standard Jersey migration things.
      
      ## How was this patch tested?
      
      I did a manual test for the standalone web APIs. Although I didn't test the functionality of the security filter itself, the code that changed non-trivially is how we actually register the filter. I attached a debugger to the Spark master and verified that the SecurityFilter code is indeed invoked upon hitting /api/v1/applications.
      
      Author: mcheah <mcheah@palantir.com>
      
      Closes #12715 from mccheah/feature/upgrade-jersey.
      b7fdc23c
    • Lining Sun's avatar
      [SPARK-15123] upgrade org.json4s to 3.2.11 version · 592fc455
      Lining Sun authored
      ## What changes were proposed in this pull request?
      
      We had an issue when using snowplow in our Spark applications. Snowplow requires json4s version 3.2.11 while Spark still uses version 3.2.10, which is a few years old. The change is to upgrade the json4s jar to 3.2.11.
      
      ## How was this patch tested?
      
      We built Spark jar and successfully ran our applications in local and cluster modes.
      
      Author: Lining Sun <lining@gmail.com>
      
      Closes #12901 from liningalex/master.
      592fc455
  24. Apr 29, 2016
    • Davies Liu's avatar
      [SPARK-14987][SQL] inline hive-service (cli) into sql/hive-thriftserver · 7feeb82c
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      This PR copies the thrift-server from hive-service-1.2 (including TCLIService.thrift and generated Java source code) into sql/hive-thriftserver, so we can do further cleanup and improvements.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #12764 from davies/thrift_server.
      7feeb82c
  25. Apr 21, 2016
  26. Apr 08, 2016
    • Josh Rosen's avatar
      [SPARK-11416][BUILD] Update to Chill 0.8.0 & Kryo 3.0.3 · 906eef4c
      Josh Rosen authored
      This patch upgrades Chill to 0.8.0 and Kryo to 3.0.3. While we'll likely need to bump these dependencies again before Spark 2.0 (due to SPARK-14221 / https://github.com/twitter/chill/issues/252), I wanted to get the bulk of the Kryo 2 -> Kryo 3 migration done now in order to figure out whether there are any unexpected surprises.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #12076 from JoshRosen/kryo3.
      906eef4c
    • hyukjinkwon's avatar
      [SPARK-14103][SQL] Parse unescaped quotes in CSV data source. · 725b860e
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR resolves a problem when parsing unescaped quotes in input data. For example, currently the data below:
      
      ```
      "a"b,ccc,ddd
      e,f,g
      ```
      
      produces the data below:
      
      - **Before**
      
      ```bash
      ["a"b,ccc,ddd[\n]e,f,g]  <- as a value.
      ```
      
      - **After**
      
      ```bash
      ["a"b], [ccc], [ddd]
      [e], [f], [g]
      ```
      
      This PR bumps up the Univocity parser's version. This was fixed in `2.0.2`, https://github.com/uniVocity/univocity-parsers/issues/60.
      
      ## How was this patch tested?
      
      Unit tests in `CSVSuite` and `sbt/sbt scalastyle`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #12226 from HyukjinKwon/SPARK-14103-quote.
      725b860e
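The row-splitting behaviour described above can be explored with Python's standard `csv` module, used here purely as a stand-in for Univocity. This is an assumption-laden sketch: Python's reader does not reproduce Univocity's exact output for the first field, and the helper name is hypothetical. It only shows a lenient parser keeping the two rows separate and a strict parser rejecting the unescaped quote.

```python
import csv
import io

# The sample input from the commit message above.
SAMPLE = '"a"b,ccc,ddd\ne,f,g\n'


def parse_rows(text, strict=False):
    """Parse CSV text into a list of rows; strict=True rejects malformed quoting."""
    return list(csv.reader(io.StringIO(text), strict=strict))


rows = parse_rows(SAMPLE)          # lenient: two rows of three fields each

try:
    parse_rows(SAMPLE, strict=True)
    strict_error = None
except csv.Error as exc:           # strict: the stray character after the quote is an error
    strict_error = exc
```

In the lenient case the second row comes back as `['e', 'f', 'g']` rather than being swallowed into the first value.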
  27. Apr 04, 2016
    • Marcelo Vanzin's avatar
      [SPARK-13579][BUILD] Stop building the main Spark assembly. · 24d7d2e4
      Marcelo Vanzin authored
      This change modifies the "assembly/" module to just copy needed
      dependencies to its build directory, and modifies the packaging
      script to pick those up (and remove duplicate jars packaged in the
      examples module).
      
      I also made some minor adjustments to dependencies to remove some
      test jars from the final packaging, and remove jars that conflict with each
      other when packaged separately (e.g. servlet api).
      
      Also note that this change restores guava in applications' classpaths, even
      though it's still shaded inside Spark. This is now needed for the Hadoop
      libraries that are packaged with Spark, which now are not processed by
      the shade plugin.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11796 from vanzin/SPARK-13579.
      24d7d2e4
  28. Apr 01, 2016
    • Jacek Laskowski's avatar
      [SPARK-13825][CORE] Upgrade to Scala 2.11.8 · c16a3968
      Jacek Laskowski authored
      ## What changes were proposed in this pull request?
      
      Upgrade to 2.11.8 (from the current 2.11.7)
      
      ## How was this patch tested?
      
      A manual build
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #11681 from jaceklaskowski/SPARK-13825-scala-2_11_8.
      c16a3968
  29. Mar 31, 2016
    • Sital Kedia's avatar
      [SPARK-14277][CORE] Upgrade Snappy Java to 1.1.2.4 · 8de201ba
      Sital Kedia authored
      ## What changes were proposed in this pull request?
      
      Upgrade snappy to 1.1.2.4 to improve snappy read/write performance.
      
      ## How was this patch tested?
      
      Tested by running a job on the cluster and saw 7.5% cpu savings after this change.
      
      Author: Sital Kedia <skedia@fb.com>
      
      Closes #12096 from sitalkedia/snappyRelease.
      8de201ba
    • Herman van Hovell's avatar
      [SPARK-14211][SQL] Remove ANTLR3 based parser · a9b93e07
      Herman van Hovell authored
      ### What changes were proposed in this pull request?
      
      This PR removes the ANTLR3 based parser, and moves the new ANTLR4 based parser into the `org.apache.spark.sql.catalyst.parser` package.
      
      ### How was this patch tested?
      
      Existing unit tests.
      
      cc rxin andrewor14 yhuai
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #12071 from hvanhovell/SPARK-14211.
      a9b93e07
  30. Mar 28, 2016
    • Herman van Hovell's avatar
      [SPARK-13713][SQL] Migrate parser from ANTLR3 to ANTLR4 · 600c0b69
      Herman van Hovell authored
      ### What changes were proposed in this pull request?
      The current ANTLR3 parser is quite complex to maintain and suffers from code blow-ups. This PR introduces a new parser that is based on ANTLR4.
      
      This parser is based on [Presto's SQL parser](https://github.com/facebook/presto/blob/master/presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4). The current implementation can parse and create Catalyst and SQL plans. Large parts of the HiveQl DDL and some of the DML functionality are currently missing; the plan is to add them in follow-up PRs.
      
      This PR is a work in progress, and work needs to be done in the following areas:
      
      - [x] Error handling should be improved.
      - [x] Documentation should be improved.
      - [x] Multi-Insert needs to be tested.
      - [ ] Naming and package locations.
      
      ### How was this patch tested?
      
      Catalyst and SQL unit tests.
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #11557 from hvanhovell/ngParser.
      600c0b69
  31. Mar 14, 2016