  1. Jun 16, 2016
  2. Jun 14, 2016
    • [SPARK-15935][PYSPARK] Fix a wrong format tag in the error message · 0ee9fd9e
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      A follow-up PR for #13655 to fix a wrong format tag.
      
      ## How was this patch tested?
      
      Jenkins unit tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #13665 from zsxwing/fix.
    • [SPARK-15821][DOCS] Include parallel build info · a431e3f1
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      
      We should mention that users can build Spark using multiple threads to decrease build times, either here or in "Building Spark".
      
      ## How was this patch tested?
      
      Built on machines with between one and 192 cores using `mvn -T 1C` and observed faster build times with no loss in stability.
      
      In response to the question at https://issues.apache.org/jira/browse/SPARK-15821, I think we should suggest this option, as we know it works for Spark and can result in faster builds.
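
      As a concrete illustration of the suggestion above, here is a minimal sketch of a parallel build invocation (the `-DskipTests` flag and the plain `clean package` goals are illustrative assumptions; substitute whatever profiles you normally build with):
      
      ```bash
      # Run Maven with one build thread per available CPU core (-T 1C).
      ./build/mvn -T 1C -DskipTests clean package
      ```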
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #13562 from a-roberts/patch-3.
    • [SPARK-15935][PYSPARK] Enable test for sql/streaming.py and fix these tests · 96c3500c
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This PR just enables tests for sql/streaming.py and also fixes the failures.
      
      ## How was this patch tested?
      
      Existing unit tests.
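
      For reference, a hedged example of running the PySpark SQL test module locally; the `--modules` flag and the `pyspark-sql` module name are assumptions about the dev scripts of this era:
      
      ```bash
      # Run only the PySpark SQL tests (these include the sql/streaming.py doctests).
      ./python/run-tests --modules=pyspark-sql
      ```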
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #13655 from zsxwing/python-streaming-test.
  3. Jun 09, 2016
    • [SPARK-15818][BUILD] Upgrade to Hadoop 2.7.2 · 147c0208
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      
      Updating the Hadoop version from 2.7.0 to 2.7.2 when the hadoop-2.7 build profile is used.
      
      ## How was this patch tested?
      
      Existing tests
      
      I'd like us to use Hadoop 2.7.2, owing to the Hadoop release notes stating that Hadoop 2.7.0 is not ready for production use.
      
      https://hadoop.apache.org/docs/r2.7.0/ states
      
      "Apache Hadoop 2.7.0 is a minor release in the 2.x.y release line, building upon the previous stable release 2.6.0.
      This release is not yet ready for production use. Production users should use 2.7.1 release and beyond."
      
      Hadoop 2.7.1 release notes:
      "Apache Hadoop 2.7.1 is a minor release in the 2.x.y release line, building upon the previous release 2.7.0. This is the next stable release after Apache Hadoop 2.6.x."
      
      And then Hadoop 2.7.2 release notes:
      "Apache Hadoop 2.7.2 is a minor release in the 2.x.y release line, building upon the previous stable release 2.7.1."
      
      I've tested that this is OK with Intel hardware and IBM Java 8, so let's test it with OpenJDK; ideally this will be pushed to branch-2.0 and master.
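
      A sketch of how to build against the proposed version locally, using the standard hadoop-2.7 profile plus an explicit `-Dhadoop.version` override (the exact goal list is an illustrative assumption):
      
      ```bash
      # Build with the Hadoop 2.7 profile, pinning the Hadoop version explicitly.
      ./build/mvn -Phadoop-2.7 -Dhadoop.version=2.7.2 -DskipTests clean package
      ```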
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #13556 from a-roberts/patch-2.
    • [SPARK-12712] Fix failure in ./dev/test-dependencies when run against empty .m2 cache · 921fa40b
      Josh Rosen authored
      This patch fixes a bug in `./dev/test-dependencies.sh` which caused spurious failures when the script was run on a machine with an empty `.m2` cache. The problem was that extra log output from the dependency download was conflicting with the grep / regex used to identify the classpath in the Maven output. This patch fixes this issue by adjusting the regex pattern.
      
      Tested manually with the following reproduction of the bug:
      
      ```
      rm -rf ~/.m2/repository/org/apache/commons/
      ./dev/test-dependencies.sh
      ```
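
      For context, a simplified sketch of the kind of classpath extraction the script relies on (this is not the actual `dev/test-dependencies.sh` code; the anchoring pattern is an illustrative assumption). With an empty `~/.m2` cache, Maven interleaves many `Downloading:` lines into its output, so a loose pattern can match the wrong line:
      
      ```bash
      # Grab the resolved classpath from Maven's console output by anchoring on the
      # "Dependencies classpath:" marker rather than matching loosely.
      ./build/mvn dependency:build-classpath -DskipTests \
        | grep -A 1 'Dependencies classpath:' \
        | tail -n 1
      ```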
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #13568 from JoshRosen/SPARK-12712.
  4. Jun 08, 2016
    • [MINOR] Fix Java Lint errors introduced by #13286 and #13280 · f958c1c3
      Sandeep Singh authored
      ## What changes were proposed in this pull request?
      
      revived #13464
      
      Fix Java Lint errors introduced by #13286 and #13280
      Before:
      ```
      Using `mvn` from path: /Users/pichu/Project/spark/build/apache-maven-3.3.9/bin/mvn
      Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
      Checkstyle checks failed at following occurrences:
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[340,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[341,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[342,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[343,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[41,28] (naming) MethodName: Method name 'Append' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
      [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[52,28] (naming) MethodName: Method name 'Complete' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
      [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[61,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.PrimitiveType.
      [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[62,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.Type.
      ```
      
      ## How was this patch tested?
      ran `dev/lint-java` locally
      
      Author: Sandeep Singh <sandeep@techaddict.me>
      
      Closes #13559 from techaddict/minor-3.
  5. May 31, 2016
  6. May 27, 2016
    • [SPARK-9876][SQL] Update Parquet to 1.8.1. · 776d183c
      Ryan Blue authored
      ## What changes were proposed in this pull request?
      
      This includes minimal changes to get Spark using the current release of Parquet, 1.8.1.
      
      ## How was this patch tested?
      
      This uses the existing Parquet tests.
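
      A quick, hedged way to confirm which Parquet artifacts actually end up on the classpath after a version bump like this (the `includes` filter is standard maven-dependency-plugin usage; running it across all modules is an illustrative choice):
      
      ```bash
      # List only the org.apache.parquet entries in each module's dependency tree.
      ./build/mvn dependency:tree -Dincludes=org.apache.parquet
      ```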
      
      Author: Ryan Blue <blue@apache.org>
      
      Closes #13280 from rdblue/SPARK-9876-update-parquet.
  7. May 26, 2016
    • [SPARK-15523][ML][MLLIB] Update JPMML to 1.2.15 · 6d506c9a
      Villu Ruusmann authored
      ## What changes were proposed in this pull request?
      
      See https://issues.apache.org/jira/browse/SPARK-15523
      
      This PR replaces PR #13293. It's isolated to a new branch, and contains some more squashed changes.
      
      ## How was this patch tested?
      
      1. Executed `mvn clean package` in `mllib` directory
      2. Executed `dev/test-dependencies.sh --replace-manifest` in the root directory.
      
      Author: Villu Ruusmann <villu.ruusmann@gmail.com>
      
      Closes #13297 from vruusmann/update-jpmml.
  8. May 25, 2016
  9. May 24, 2016
    • [SPARK-11753][SQL][TEST-HADOOP2.2] Make allowNonNumericNumbers option work · c24b6b67
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      Jackson supports the `allowNonNumericNumbers` option to parse non-standard non-numeric numbers such as "NaN", "Infinity", and "INF". The currently used Jackson version (2.5.3) doesn't support it fully. This patch upgrades the library and makes the two ignored tests in `JsonParsingOptionsSuite` pass.
      
      ## How was this patch tested?
      
      `JsonParsingOptionsSuite`.
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #9759 from viirya/fix-json-nonnumric.
  10. May 21, 2016
    • [SPARK-15424][SPARK-15437][SPARK-14807][SQL] Revert Create a hivecontext-compatibility module · 45b7557e
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      I initially asked to create a hivecontext-compatibility module to put the HiveContext there. But we are so close to the Spark 2.0 release, and there is only a single class in it. It seems overkill to have an entire package for a single class, and it just makes things more inconvenient.
      
      ## How was this patch tested?
      Tests were moved.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #13207 from rxin/SPARK-15424.
  11. May 20, 2016
    • [SPARK-15078] [SQL] Add all TPCDS 1.4 benchmark queries for SparkSQL · a78d6ce3
      Sameer Agarwal authored
      ## What changes were proposed in this pull request?
      
      Now that SparkSQL supports all TPC-DS queries, this patch adds all 99 benchmark queries inside SparkSQL.
      
      ## How was this patch tested?
      
      Benchmark only
      
      Author: Sameer Agarwal <sameer@databricks.com>
      
      Closes #13188 from sameeragarwal/tpcds-all.
  12. May 17, 2016
    • [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms · e2efe052
      DB Tsai authored
      ## What changes were proposed in this pull request?
      
      Once SPARK-14487 and SPARK-14549 are merged, we will migrate to using the new vector and matrix types in the new ML pipeline-based APIs.
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: DB Tsai <dbt@netflix.com>
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #12627 from dbtsai/SPARK-14615-NewML.
    • [SPARK-15290][BUILD] Move annotations, like @Since / @DeveloperApi, into spark-tags · 122302cb
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      (See https://github.com/apache/spark/pull/12416 where most of this was already reviewed and committed; this is just the module structure and move part. This change does not move the annotations into test scope, which was apparently the problem last time.)
      
      Rename `spark-test-tags` -> `spark-tags`; move common annotations like `Since` to `spark-tags`
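
      One hedged way to sanity-check a rename like this is to grep the build files for lingering references to the old artifact id (an illustrative command, not part of the patch):
      
      ```bash
      # Any remaining spark-test-tags references in the Maven build files would show up here.
      grep -rn "spark-test-tags" --include=pom.xml .
      ```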
      
      ## How was this patch tested?
      
      Jenkins tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #13074 from srowen/SPARK-15290.
  13. May 16, 2016
    • [SPARK-12972][CORE][TEST-MAVEN][TEST-HADOOP2.2] Update org.apache.httpcomponents.httpclient, commons-io · fabc8e5b
      Sean Owen authored
      
      ## What changes were proposed in this pull request?
      
      This is sort of a hot-fix for https://github.com/apache/spark/pull/13117, but the problem is limited to Hadoop 2.2. The change is to manage `commons-io` to 2.4 for all Hadoop builds, which is a net change only for Hadoop 2.2, which was using 2.1.
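
      A hedged way to verify the managed `commons-io` version under a given profile (the hadoop-2.2 profile name matches the build profiles of this era; treat the exact invocation as an assumption):
      
      ```bash
      # Show which commons-io version is resolved when building with the hadoop-2.2 profile.
      ./build/mvn -Phadoop-2.2 dependency:tree -Dincludes=commons-io:commons-io
      ```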
      
      ## How was this patch tested?
      
      Jenkins tests -- normal PR builder, then the `[test-hadoop2.2] [test-maven]` if successful.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #13132 from srowen/SPARK-12972.3.
  14. May 15, 2016
    • [SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient · f5576a05
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      (Retry of https://github.com/apache/spark/pull/13049)
      
      - update to httpclient 4.5 / httpcore 4.4
      - remove some defunct exclusions
      - manage httpmime version to match
      - update selenium / httpunit to support 4.5 (possible now that Jetty 9 is used)
      
      ## How was this patch tested?
      
      Jenkins tests. Also, locally running the same test command of one Jenkins profile that failed: `mvn -Phadoop-2.6 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl ...`
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #13117 from srowen/SPARK-12972.2.
  15. May 13, 2016
  16. May 12, 2016
  17. May 11, 2016
  18. May 05, 2016
    • [SPARK-15148][SQL] Upgrade Univocity library from 2.0.2 to 2.1.0 · ac12b35d
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/browse/SPARK-15148
      
      Mainly, it improves performance by roughly 30%-40% according to the [release note](https://github.com/uniVocity/univocity-parsers/releases/tag/v2.1.0). The details of the purpose are described in the JIRA.
      
      This PR upgrades Univocity library from 2.0.2 to 2.1.0.
      
      ## How was this patch tested?
      
      Existing tests should cover this.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #12923 from HyukjinKwon/SPARK-15148.
    • [SPARK-12154] Upgrade to Jersey 2 · b7fdc23c
      mcheah authored
      ## What changes were proposed in this pull request?
      
      Replace com.sun.jersey with org.glassfish.jersey. Changes to the Spark Web UI code were required for it to compile. The changes were relatively standard Jersey migration things.
      
      ## How was this patch tested?
      
      I did a manual test for the standalone web APIs. Although I didn't test the functionality of the security filter itself, the code that changed non-trivially is how we actually register the filter. I attached a debugger to the Spark master and verified that the SecurityFilter code is indeed invoked upon hitting /api/v1/applications.
      
      Author: mcheah <mcheah@palantir.com>
      
      Closes #12715 from mccheah/feature/upgrade-jersey.
    • [SPARK-15123] upgrade org.json4s to 3.2.11 version · 592fc455
      Lining Sun authored
      ## What changes were proposed in this pull request?
      
      We hit this issue when using Snowplow in our Spark applications. Snowplow requires json4s version 3.2.11, while Spark still uses the several-years-old version 3.2.10. The change is to upgrade the json4s jar to 3.2.11.
      
      ## How was this patch tested?
      
      We built the Spark jar and successfully ran our applications in local and cluster modes.
      
      Author: Lining Sun <lining@gmail.com>
      
      Closes #12901 from liningalex/master.
  19. May 03, 2016
    • [SPARK-15053][BUILD] Fix Java Lint errors on Hive-Thriftserver module · a7444570
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This issue fixes or hides 181 Java linter errors introduced by SPARK-14987, which copied the Hive service code from Hive. We had better clean up these errors before releasing Spark 2.0.
      
      - Fix UnusedImports (15 lines), RedundantModifier (14 lines), SeparatorWrap (9 lines), MethodParamPad (6 lines), FileTabCharacter (5 lines), ArrayTypeStyle (3 lines), ModifierOrder (3 lines), RedundantImport (1 line), CommentsIndentation (1 line), UpperEll (1 line), FallThrough (1 line), OneStatementPerLine (1 line), NewlineAtEndOfFile (1 line) errors.
      - Ignore `LineLength` errors under `hive/service/*` (118 lines).
      - Ignore `MethodName` error in `PasswdAuthenticationProvider.java` (1 line).
      - Ignore `NoFinalizer` error in `ThreadWithGarbageCleanup.java` (1 line).
      
      ## How was this patch tested?
      
      After the Jenkins build passes, run `dev/lint-java` manually.
      ```bash
      $ dev/lint-java
      Checkstyle checks passed.
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12831 from dongjoon-hyun/SPARK-15053.
  20. Apr 29, 2016
    • [SPARK-14988][PYTHON] SparkSession catalog and conf API · a7d0fedc
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      The `catalog` and `conf` APIs were exposed in `SparkSession` in #12713 and #12669. This patch adds those to the python API.
      
      ## How was this patch tested?
      
      Python tests.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12765 from andrewor14/python-spark-session-more.
    • [SPARK-14987][SQL] inline hive-service (cli) into sql/hive-thriftserver · 7feeb82c
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      This PR copies the thrift-server from hive-service-1.2 (including TCLIService.thrift and generated Java source code) into sql/hive-thriftserver, so we can do further cleanup and improvements.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #12764 from davies/thrift_server.
  21. Apr 28, 2016
  22. Apr 27, 2016
    • [SPARK-14867][BUILD] Remove `--force` option in `build/mvn` · f405de87
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Currently, `build/mvn` provides a convenient option, `--force`, to use the recommended version of Maven without changing the PATH environment variable. However, there are two problems.
      
      - `dev/lint-java` does not use the newly installed maven.
      
        ```bash
      $ ./build/mvn --force clean
      $ ./dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      ```
      - It's inconvenient to type the `--force` option every time.
      
      If the `--force` option has been used once, we should prefer the Maven installation recommended by Spark.
      This PR makes `build/mvn` check first for the Maven installation created by the `--force` option.
      
      According to the comments, this PR aims to do the following (a simplified sketch of the resulting behavior follows this list):
      - Detect the maven version from `pom.xml`.
      - Install maven if there is no or old maven.
      - Remove `--force` option.
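
      This is not the actual `build/mvn` code; the grep over `pom.xml` and the install path below are illustrative assumptions about how the described behavior could look:
      
      ```bash
      # Read the Maven version Spark expects from pom.xml, then prefer a matching
      # installation under build/ if present, falling back to the mvn on PATH.
      MVN_VERSION=$(grep -m1 -o '<maven.version>[^<]*' pom.xml | cut -d '>' -f2)
      LOCAL_MVN="build/apache-maven-${MVN_VERSION}/bin/mvn"
      if [ -x "${LOCAL_MVN}" ]; then
        MVN_BIN="${LOCAL_MVN}"
      else
        MVN_BIN="$(command -v mvn)"
      fi
      echo "Using mvn from path: ${MVN_BIN}"
      ```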
      
      ## How was this patch tested?
      
      Manual.
      
      ```bash
      $ ./build/mvn --force clean
      $ ./dev/lint-java
      Using `mvn` from path: /Users/dongjoon/spark/build/apache-maven-3.3.9/bin/mvn
      ...
      $ rm -rf ./build/apache-maven-3.3.9/
      $ ./dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12631 from dongjoon-hyun/SPARK-14867.
    • [MINOR][BUILD] Enable RAT checking on `LZ4BlockInputStream.java`. · c5443560
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Since `LZ4BlockInputStream.java` is not licensed to the Apache Software Foundation (ASF), the Apache License header of that file has not been monitored until now.
      This PR aims to enable RAT checking on `LZ4BlockInputStream.java` by removing it from `dev/.rat-excludes`.
      This will prevent accidental removal of the Apache License header from that file.
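
      For reference, the RAT check can also be run locally; a hedged example, assuming the `dev/check-license` helper behaves as in this era's dev scripts:
      
      ```bash
      # Downloads Apache RAT if needed and verifies license headers, honoring dev/.rat-excludes.
      ./dev/check-license
      ```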
      
      ## How was this patch tested?
      
      Pass the Jenkins tests (specifically, the RAT check stage).
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12677 from dongjoon-hyun/minor_rat_exclusion_file.
  23. Apr 25, 2016
    • [SPARK-14721][SQL] Remove HiveContext (part 2) · 3c5e65c3
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      This removes the class `HiveContext` itself along with all code usages associated with it. The bulk of the work was already done in #12485. This is mainly just code cleanup and actually removing the class.
      
      Note: A couple of things will break after this patch. These will be fixed separately.
      - the python HiveContext
      - all the documentation / comments referencing HiveContext
      - there will be no more HiveContext in the REPL (fixed by #12589)
      
      ## How was this patch tested?
      
      No change in functionality.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12585 from andrewor14/delete-hive-context.
  24. Apr 24, 2016
    • [SPARK-14868][BUILD] Enable NewLineAtEofChecker in checkstyle and fix lint-java errors · d34d6503
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Spark enforces the `NewLineAtEofChecker` rule for Scala via Scalastyle, and most Java code also complies with it. This PR explicitly enforces the equivalent Checkstyle rule, `NewlineAtEndOfFile`. It also fixes the lint-java errors accumulated since SPARK-14465. The items are the following:
      
      - Adds a new line at the end of the files (19 files)
      - Fixes 25 lint-java errors (12 RedundantModifier, 6 **ArrayTypeStyle**, 2 LineLength, 2 UnusedImports, 2 RegexpSingleline, 1 ModifierOrder)
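
      As an illustration of what the end-of-file rule checks, a hedged one-liner that lists Java sources whose last byte is not a newline (an illustrative command, not part of the patch):
      
      ```bash
      # Print Java files that do not end with a newline character.
      find . -name '*.java' -type f | while read -r f; do
        [ -n "$(tail -c 1 "$f")" ] && echo "$f"
      done
      ```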
      
      ## How was this patch tested?
      
      After the Jenkins test succeeds, `dev/lint-java` should pass. (Currently, Jenkins does not run lint-java.)
      ```bash
      $ dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      Checkstyle checks passed.
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12632 from dongjoon-hyun/SPARK-14868.
  25. Apr 22, 2016
    • [SPARK-14807] Create a compatibility module · 7dde1da9
      Yin Huai authored
      ## What changes were proposed in this pull request?
      
      This PR creates a compatibility module in sql (called `hive-1-x-compatibility`), which will host HiveContext in Spark 2.0 (moving HiveContext there will be done separately). This module is not included in the assembly because only users who still want to access HiveContext need it.
      
      ## How was this patch tested?
      I manually tested `sbt/sbt -Phive package` and `mvn -Phive package -DskipTests`.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #12580 from yhuai/compatibility.
  26. Apr 21, 2016