  1. May 13, 2016
  2. May 12, 2016
      [SPARK-14897][SQL] upgrade to jetty 9.2.16 · 81bf8708
      bomeng authored
      ## What changes were proposed in this pull request?
      
      Since Jetty 8 is EOL (end of life) and has a critical security issue (see http://www.securityweek.com/critical-vulnerability-found-jetty-web-server), upgrading to 9 is necessary. I am using the latest 9.2, since 9.3 requires Java 8+.
      
      `javax.servlet` and `derby` were also upgraded, since Jetty 9.2 needs the corresponding versions.
      
      ## How was this patch tested?
      
      Manual test and current test cases should cover it.
      
      Author: bomeng <bmeng@us.ibm.com>
      
      Closes #12916 from bomeng/SPARK-14897.
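      In a Maven build like Spark's, a dependency upgrade of this kind usually reduces to bumping version properties in the parent `pom.xml`. A minimal sketch, assuming illustrative property names (check the project's actual pom for the real ones):

      ```xml
      <properties>
        <!-- Hypothetical property names; the Jetty 9.2.16 release string is v20160414. -->
        <jetty.version>9.2.16.v20160414</jetty.version>
        <!-- Jetty 9 requires the Servlet 3.1 API. -->
        <javaxservlet.version>3.1.0</javaxservlet.version>
      </properties>
      ```

      All modules that reference `${jetty.version}` then pick up the new release in one place.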
  3. May 05, 2016
      [SPARK-15148][SQL] Upgrade Univocity library from 2.0.2 to 2.1.0 · ac12b35d
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/browse/SPARK-15148
      
      Mainly, it improves performance by roughly 30%-40% according to the [release note](https://github.com/uniVocity/univocity-parsers/releases/tag/v2.1.0). The details of the purpose are described in the JIRA.
      
      This PR upgrades Univocity library from 2.0.2 to 2.1.0.
      
      ## How was this patch tested?
      
      Existing tests should cover this.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #12923 from HyukjinKwon/SPARK-15148.
      [SPARK-12154] Upgrade to Jersey 2 · b7fdc23c
      mcheah authored
      ## What changes were proposed in this pull request?
      
      Replace com.sun.jersey with org.glassfish.jersey. Changes to the Spark Web UI code were required in order to compile. The changes were relatively standard Jersey migration work.
      
      ## How was this patch tested?
      
      I did a manual test for the standalone web APIs. Although I didn't test the functionality of the security filter itself, the code that changed non-trivially is how we actually register the filter. I attached a debugger to the Spark master and verified that the SecurityFilter code is indeed invoked upon hitting /api/v1/applications.
      
      Author: mcheah <mcheah@palantir.com>
      
      Closes #12715 from mccheah/feature/upgrade-jersey.
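      The core of a Jersey 1 to Jersey 2 migration at the build level is a change of Maven coordinates. A hedged sketch of the swap (artifact list abbreviated; the actual PR touches more modules):

      ```xml
      <!-- Before: Jersey 1 -->
      <dependency>
        <groupId>com.sun.jersey</groupId>
        <artifactId>jersey-server</artifactId>
      </dependency>

      <!-- After: Jersey 2 -->
      <dependency>
        <groupId>org.glassfish.jersey.core</groupId>
        <artifactId>jersey-server</artifactId>
      </dependency>
      ```

      The package namespace changes correspondingly (`com.sun.jersey.*` to `org.glassfish.jersey.*` and the standard `javax.ws.rs.*` API), which is why the Web UI code needed source changes as well.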
      [SPARK-15123] upgrade org.json4s to 3.2.11 version · 592fc455
      Lining Sun authored
      ## What changes were proposed in this pull request?
      
      We hit this issue when using snowplow in our Spark applications. Snowplow requires json4s version 3.2.11, while Spark still uses the years-old version 3.2.10. The change is to upgrade the json4s jar to 3.2.11.
      
      ## How was this patch tested?
      
      We built Spark jar and successfully ran our applications in local and cluster modes.
      
      Author: Lining Sun <lining@gmail.com>
      
      Closes #12901 from liningalex/master.
  4. Apr 29, 2016
      [SPARK-14987][SQL] inline hive-service (cli) into sql/hive-thriftserver · 7feeb82c
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      This PR copies the thrift server from hive-service-1.2 (including `TCLIService.thrift` and the generated Java source code) into sql/hive-thriftserver, so we can do further cleanup and improvements.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #12764 from davies/thrift_server.
  5. Apr 21, 2016
  6. Apr 08, 2016
      [SPARK-11416][BUILD] Update to Chill 0.8.0 & Kryo 3.0.3 · 906eef4c
      Josh Rosen authored
      This patch upgrades Chill to 0.8.0 and Kryo to 3.0.3. While we'll likely need to bump these dependencies again before Spark 2.0 (due to SPARK-14221 / https://github.com/twitter/chill/issues/252), I wanted to get the bulk of the Kryo 2 -> Kryo 3 migration done now in order to figure out whether there are any unexpected surprises.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #12076 from JoshRosen/kryo3.
      [SPARK-14103][SQL] Parse unescaped quotes in CSV data source. · 725b860e
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR resolves a problem with parsing unescaped quotes in input data. For example, currently the data below:
      
      ```
      "a"b,ccc,ddd
      e,f,g
      ```
      
      produces the output below:
      
      - **Before**
      
      ```
      ["a"b,ccc,ddd[\n]e,f,g]  <- parsed as a single value
      ```
      
      - **After**
      
      ```
      ["a"b], [ccc], [ddd]
      [e], [f], [g]
      ```
      
      This PR bumps the Univocity parser's version, as the issue was fixed in `2.0.2` (https://github.com/uniVocity/univocity-parsers/issues/60).
      
      ## How was this patch tested?
      
      Unit tests in `CSVSuite` and `sbt/sbt scalastyle`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #12226 from HyukjinKwon/SPARK-14103-quote.
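      As a rough standard-library illustration of why unescaped quotes are tricky (Python's `csv` module is used here only for comparison; Spark's CSV source uses the Univocity parser, whose exact recovery behavior differs):

      ```python
      import csv

      # A field containing an unescaped quote, as in the example above.
      lines = ['"a"b,ccc,ddd', 'e,f,g']

      # In the default non-strict mode, the parser recovers and still yields
      # three fields per row; the stray quote is folded into the first field.
      rows = list(csv.reader(lines))
      for row in rows:
          print(row)

      # In strict mode, the same input is rejected outright.
      try:
          list(csv.reader(lines, strict=True))
      except csv.Error as e:
          print("strict mode:", e)
      ```

      Both behaviors (silent recovery or hard failure) lose or distort data, which is why fixing the parser itself, as this PR does via the Univocity upgrade, is the better option.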
  7. Apr 04, 2016
      [SPARK-13579][BUILD] Stop building the main Spark assembly. · 24d7d2e4
      Marcelo Vanzin authored
      This change modifies the "assembly/" module to just copy needed
      dependencies to its build directory, and modifies the packaging
script to pick those up (and remove duplicate jars packaged in the
      examples module).
      
      I also made some minor adjustments to dependencies to remove some
      test jars from the final packaging, and remove jars that conflict with each
      other when packaged separately (e.g. servlet api).
      
      Also note that this change restores guava in applications' classpaths, even
      though it's still shaded inside Spark. This is now needed for the Hadoop
      libraries that are packaged with Spark, which now are not processed by
      the shade plugin.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11796 from vanzin/SPARK-13579.
  8. Apr 01, 2016
      [SPARK-13825][CORE] Upgrade to Scala 2.11.8 · c16a3968
      Jacek Laskowski authored
      ## What changes were proposed in this pull request?
      
      Upgrade to 2.11.8 (from the current 2.11.7)
      
      ## How was this patch tested?
      
      A manual build
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #11681 from jaceklaskowski/SPARK-13825-scala-2_11_8.
  9. Mar 31, 2016
      [SPARK-14277][CORE] Upgrade Snappy Java to 1.1.2.4 · 8de201ba
      Sital Kedia authored
      ## What changes were proposed in this pull request?
      
      Upgrade snappy to 1.1.2.4 to improve snappy read/write performance.
      
      ## How was this patch tested?
      
      Tested by running a job on the cluster and saw 7.5% cpu savings after this change.
      
      Author: Sital Kedia <skedia@fb.com>
      
      Closes #12096 from sitalkedia/snappyRelease.
      [SPARK-14211][SQL] Remove ANTLR3 based parser · a9b93e07
      Herman van Hovell authored
      ### What changes were proposed in this pull request?
      
      This PR removes the ANTLR3-based parser and moves the new ANTLR4-based parser into the `org.apache.spark.sql.catalyst.parser` package.
      
      ### How was this patch tested?
      
      Existing unit tests.
      
      cc rxin andrewor14 yhuai
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #12071 from hvanhovell/SPARK-14211.
  10. Mar 28, 2016
      [SPARK-13713][SQL] Migrate parser from ANTLR3 to ANTLR4 · 600c0b69
      Herman van Hovell authored
      ### What changes were proposed in this pull request?
      The current ANTLR3 parser is quite complex to maintain and suffers from code blow-ups. This PR introduces a new parser that is based on ANTLR4.
      
      This parser is based on the [Presto's SQL parser](https://github.com/facebook/presto/blob/master/presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4). The current implementation can parse and create Catalyst and SQL plans. Large parts of the HiveQl DDL and some of the DML functionality are currently missing; the plan is to add these in follow-up PRs.
      
      This PR is a work in progress, and work needs to be done in the following areas:
      
      - [x] Error handling should be improved.
      - [x] Documentation should be improved.
      - [x] Multi-Insert needs to be tested.
      - [ ] Naming and package locations.
      
      ### How was this patch tested?
      
      Catalyst and SQL unit tests.
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #11557 from hvanhovell/ngParser.
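      For readers unfamiliar with ANTLR4: the parser is generated from a declarative `.g4` grammar file rather than hand-maintained code, which is what makes it easier to keep in sync than the ANTLR3 version. A toy fragment in the general style of such a SQL grammar (illustrative only, not Spark's actual grammar):

      ```antlr
      grammar ToySql;

      // SELECT col1, col2 FROM tbl
      statement : SELECT identifier (',' identifier)* FROM identifier ;

      identifier : IDENTIFIER ;

      SELECT : [Ss][Ee][Ll][Ee][Cc][Tt] ;
      FROM   : [Ff][Rr][Oo][Mm] ;
      IDENTIFIER : [a-zA-Z_] [a-zA-Z0-9_]* ;
      WS : [ \t\r\n]+ -> skip ;
      ```

      ANTLR generates a lexer, parser, and visitor classes from this file; the Catalyst code then walks the resulting parse tree to build logical plans.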
  11. Mar 14, 2016
  12. Mar 10, 2016
  13. Mar 03, 2016
      [SPARK-13599][BUILD] remove transitive groovy dependencies from Hive · 9a48c656
      Steve Loughran authored
      ## What changes were proposed in this pull request?
      
      Modifies the dependency declarations of the all the hive artifacts, to explicitly exclude the groovy-all JAR.
      
      This stops the groovy classes *and everything else in that uber-JAR* from getting into spark-assembly JAR.
      
      ## How was this patch tested?
      
      1. A pre-patch build was made: `mvn clean install -Pyarn,hive,hive-thriftserver`
      2. The spark-assembly was expanded and observed to contain the org.codehaus.groovy packages and JARs
      3. A Maven dependency tree was created: `mvn dependency:tree -Pyarn,hive,hive-thriftserver -Dverbose > target/dependencies.txt`
      4. The text file was examined to confirm that groovy was being imported as a dependency of `org.spark-project.hive`
      5. The patch was applied
      6. Step 1 was repeated: a clean build of the project with `-Pyarn,hive,hive-thriftserver` set
      7. The created spark-assembly was examined and verified to contain no org.codehaus packages
      8. The Maven dependency tree was verified to no longer reference groovy
      
      Note also that the size of the assembly JAR was 181628646 bytes before this patch and 166318515 after, about 15 MB smaller. That's a good indication that the right things are being excluded.
      
      Author: Steve Loughran <stevel@hortonworks.com>
      
      Closes #11449 from steveloughran/fixes/SPARK-13599-groovy-dependency.
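      An exclusion of this kind is declared per dependency in the pom. A hedged sketch for one of the hive artifacts (the artifactId shown is just an example; the PR applies this to all of them):

      ```xml
      <dependency>
        <groupId>org.spark-project.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>${hive.version}</version>
        <exclusions>
          <!-- Keep the groovy-all uber-JAR and its contents out of spark-assembly. -->
          <exclusion>
            <groupId>org.codehaus.groovy</groupId>
            <artifactId>groovy-all</artifactId>
          </exclusion>
        </exclusions>
      </dependency>
      ```

      Because exclusions are not inherited transitively from a single declaration, each hive artifact that pulls in groovy-all needs its own exclusion block.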
  14. Feb 27, 2016
  15. Feb 17, 2016
  16. Jan 30, 2016
      [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version · 289373b2
      Josh Rosen authored
      This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
      
      The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
      
      After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10608 from JoshRosen/SPARK-6363.
  17. Jan 22, 2016
      [SPARK-7997][CORE] Remove Akka from Spark Core and Streaming · bc1babd6
      Shixiong Zhu authored
      - Remove Akka dependency from core. Note: the streaming-akka project still uses Akka.
      - Remove HttpFileServer
      - Remove Akka configs from SparkConf and SSLOptions
      - Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth keeping this config, because the choice between `DirectTaskResult` and `IndirectTaskResult` depends on it.
      - Update comments and docs
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10854 from zsxwing/remove-akka.
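      For users, the visible change is the renamed setting. In `spark-defaults.conf` terms (the value shown is illustrative):

      ```
      # Before SPARK-7997 (Akka-based RPC, removed):
      spark.akka.frameSize        128

      # After (generic RPC layer):
      spark.rpc.message.maxSize   128
      ```

      Jobs that set the old key need to switch to the new one when upgrading.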
  18. Jan 15, 2016
      [SPARK-12842][TEST-HADOOP2.7] Add Hadoop 2.7 build profile · 8dbbf3e7
      Josh Rosen authored
      This patch adds a Hadoop 2.7 build profile in order to let us automate tests against that version.
      
      /cc rxin srowen
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10775 from JoshRosen/add-hadoop-2.7-profile.
      [SPARK-12667] Remove block manager's internal "external block store" API · ad1503f9
      Reynold Xin authored
      This pull request removes the external block store API. This is rarely used, and the file system interface is actually a better, more standard way to interact with external storage systems.
      
      There are some other things to remove also, as pointed out by JoshRosen. We will do those as follow-up pull requests.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10752 from rxin/remove-offheap.
      [SPARK-12833][SQL] Initial import of spark-csv · 5f83c699
      Hossein authored
      CSV is the most common data format in the "small data" world. It is often the first format people want to try when they see Spark on a single node. Having to rely on a 3rd party component for this leads to poor user experience for new users. This PR merges the popular spark-csv data source package (https://github.com/databricks/spark-csv) with SparkSQL.
      
      This is a first PR to bring the functionality to the spark 2.0 master. We will complete the items outlined in the design document (see the JIRA attachment) in follow-up pull requests.
      
      Author: Hossein <hossein@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10766 from rxin/csv.
  19. Jan 12, 2016
  20. Jan 11, 2016
      [SPARK-12269][STREAMING][KINESIS] Update aws-java-sdk version · 8fe928b4
      BrianLondon authored
      The current Spark Streaming Kinesis connector references a quite old version of the AWS Java SDK (1.9.40; 1.10.40 is current). Numerous AWS features, including Kinesis Firehose, are unavailable in 1.9. Those two versions of the AWS SDK in turn require conflicting versions of Jackson (2.4.4 and 2.5.3 respectively), such that one cannot include the current AWS SDK in a project that also uses the Spark Streaming Kinesis ASL.
      
      Author: BrianLondon <brian@seatgeek.com>
      
      Closes #10256 from BrianLondon/master.
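      Version conflicts like the Jackson one described above are commonly resolved by pinning a single version in `dependencyManagement`, so every transitive path resolves to the same artifact. A hypothetical fragment (versions taken from the conflict described above):

      ```xml
      <dependencyManagement>
        <dependencies>
          <!-- Pin the Jackson version required by aws-java-sdk 1.10.x. -->
          <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.5.3</version>
          </dependency>
        </dependencies>
      </dependencyManagement>
      ```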
      [SPARK-12734][HOTFIX][TEST-MAVEN] Fix bug in Netty exclusions · f13c7f8f
      Josh Rosen authored
      This is a hotfix for a build bug introduced by the Netty exclusion changes in #10672. We can't exclude `io.netty:netty` because Akka depends on it. There's not a direct conflict between `io.netty:netty` and `io.netty:netty-all`, because the former puts classes in the `org.jboss.netty` namespace while the latter uses the `io.netty` namespace. However, there still is a conflict between `org.jboss.netty:netty` and `io.netty:netty`, so we need to continue to exclude the JBoss version of that artifact.
      
      While the diff here looks somewhat large, note that this is only a revert of a some of the changes from #10672. You can see the net changes in pom.xml at https://github.com/apache/spark/compare/3119206b7188c23055621dfeaf6874f21c711a82...5211ab8#diff-600376dffeb79835ede4a0b285078036
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10693 from JoshRosen/netty-hotfix.
  21. Jan 10, 2016
      [SPARK-12734][BUILD] Fix Netty exclusion and use Maven Enforcer to prevent future bugs · 3ab0138b
      Josh Rosen authored
      Netty classes are published under multiple artifacts with different names, so our build needs to exclude the `io.netty:netty` and `org.jboss.netty:netty` versions of the Netty artifact. However, our existing exclusions were incomplete, leading to situations where duplicate Netty classes would wind up on the classpath and cause compile errors (or worse).
      
      This patch fixes the exclusion issue by adding more exclusions and uses Maven Enforcer's [banned dependencies](https://maven.apache.org/enforcer/enforcer-rules/bannedDependencies.html) rule to prevent these classes from accidentally being reintroduced. I also updated `dev/test-dependencies.sh` to run `mvn validate` so that the enforcer rules can run as part of pull request builds.
      
      /cc rxin srowen pwendell. I'd like to backport at least the exclusion portion of this fix to `branch-1.5` in order to fix the documentation publishing job, which fails nondeterministically due to incompatible versions of Netty classes taking precedence on the compile-time classpath.
      
      Author: Josh Rosen <rosenville@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10672 from JoshRosen/enforce-netty-exclusions.
  22. Jan 06, 2016
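      A banned-dependencies rule is configured on the `maven-enforcer-plugin`; a minimal sketch of what such a rule looks like (simplified relative to Spark's actual configuration):

      ```xml
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-enforcer-plugin</artifactId>
        <executions>
          <execution>
            <id>enforce-banned-dependencies</id>
            <goals>
              <goal>enforce</goal>
            </goals>
            <configuration>
              <rules>
                <bannedDependencies>
                  <excludes>
                    <!-- Only io.netty:netty-all may supply Netty classes. -->
                    <exclude>io.netty:netty</exclude>
                    <exclude>org.jboss.netty:netty</exclude>
                  </excludes>
                </bannedDependencies>
              </rules>
            </configuration>
          </execution>
        </executions>
      </plugin>
      ```

      With this in place, any PR that reintroduces the banned artifacts through a transitive dependency fails the build instead of silently polluting the classpath.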
      [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst · ea489f14
      Herman van Hovell authored
      This PR moves a major part of the new SQL parser to Catalyst. This is a prelude to start using this parser for all of our SQL parsing. The following key changes have been made:
      
      The ANTLR parser and supporting classes have been moved to the Catalyst project. They are now part of the ```org.apache.spark.sql.catalyst.parser``` package. These classes contained quite a bit of code that was originally from the Hive project; I have added acknowledgements wherever this applied. All Hive dependencies have been factored out. I have also taken this chance to clean up the ```ASTNode``` class and to improve the error handling.
      
      The HiveQl object that provides the functionality to convert an AST into a LogicalPlan has been refactored into three different classes, one for every SQL sub-project:
      - ```CatalystQl```: This implements Query and Expression parsing functionality.
      - ```SparkQl```: This is a subclass of ```CatalystQl``` and provides SQL/Core-only functionality such as Explain and Describe.
      - ```HiveQl```: This is a subclass of ```SparkQl``` and this adds Hive-only functionality to the parser such as Analyze, Drop, Views, CTAS & Transforms. This class still depends on Hive.
      
      cc rxin
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #10583 from hvanhovell/SPARK-12575.
  23. Jan 04, 2016
      [SPARK-12612][PROJECT-INFRA] Add missing Hadoop profiles to dev/run-tests-*.py scripts and dev/deps · 0d165ec2
      Josh Rosen authored
      There are a couple of places in the `dev/run-tests-*.py` scripts which deal with Hadoop profiles, but the set of profiles that they handle does not include all Hadoop profiles defined in our POM. Similarly, the `hadoop-2.2` and `hadoop-2.6` profiles were missing from `dev/deps`.
      
      This patch updates these scripts to include all four Hadoop profiles defined in our POM.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10565 from JoshRosen/add-missing-hadoop-profiles-in-test-scripts.
  24. Dec 30, 2015
      [SPARK-10359] Enumerate dependencies in a file and diff against it for new pull requests · 27a42c71
      Josh Rosen authored
      This patch adds a new build check which enumerates Spark's resolved runtime classpath and saves it to a file, then diffs against that file to detect whether pull requests have introduced dependency changes. The aim of this check is to make it simpler to reason about whether pull requests which modify the build have introduced new dependencies or changed transitive dependencies in a way that affects the final classpath.
      
      This supplants the checks added in SPARK-4123 / #5093, which are currently disabled due to bugs.
      
      This patch is based on pwendell's work in #8531.
      
      Closes #8531.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #10461 from JoshRosen/SPARK-10359.
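      The mechanism can be sketched in a few lines of standard-library Python (the file format and names here are hypothetical; the real check lives in the `dev/` scripts referenced above):

      ```python
      def diff_dependencies(expected_lines, resolved_lines):
          """Compare a committed dependency manifest against the freshly
          resolved classpath; report artifacts added or removed."""
          expected = {line.strip() for line in expected_lines if line.strip()}
          resolved = {line.strip() for line in resolved_lines if line.strip()}
          return {
              "added": sorted(resolved - expected),
              "removed": sorted(expected - resolved),
          }

      # A PR that bumps a jar shows up as one removal plus one addition:
      before = ["netty-all-4.0.29.Final.jar", "kryo-2.21.jar"]
      after = ["netty-all-4.0.29.Final.jar", "kryo-3.0.3.jar"]
      print(diff_dependencies(before, after))
      ```

      If the diff is non-empty, the build fails until the committed manifest is regenerated, forcing dependency changes to be explicit and reviewable.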