  1. May 05, 2016
    • [SPARK-15123] upgrade org.json4s to 3.2.11 version · 592fc455
      Lining Sun authored
      ## What changes were proposed in this pull request?
      
      We hit this issue when using Snowplow in our Spark applications: Snowplow requires json4s version 3.2.11, while Spark still uses the years-old version 3.2.10. This change upgrades the json4s jar to 3.2.11.
      
      ## How was this patch tested?
      
      We built the Spark jar and successfully ran our applications in local and cluster modes.
      
      Author: Lining Sun <lining@gmail.com>
      
      Closes #12901 from liningalex/master.
  2. May 03, 2016
    • [SPARK-15053][BUILD] Fix Java Lint errors on Hive-Thriftserver module · a7444570
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This issue fixes or hides 181 Java linter errors introduced by SPARK-14987, which copied the hive-service code from Hive. We had better clean up these errors before releasing Spark 2.0.
      
      - Fix UnusedImports (15 lines), RedundantModifier (14 lines), SeparatorWrap (9 lines), MethodParamPad (6 lines), FileTabCharacter (5 lines), ArrayTypeStyle (3 lines), ModifierOrder (3 lines), RedundantImport (1 line), CommentsIndentation (1 line), UpperEll (1 line), FallThrough (1 line), OneStatementPerLine (1 line), NewlineAtEndOfFile (1 line) errors.
      - Ignore `LineLength` errors under `hive/service/*` (118 lines).
      - Ignore `MethodName` error in `PasswdAuthenticationProvider.java` (1 line).
      - Ignore `NoFinalizer` error in `ThreadWithGarbageCleanup.java` (1 line).
      
      ## How was this patch tested?
      
      After the Jenkins build passes, run `dev/lint-java` manually.
      ```bash
      $ dev/lint-java
      Checkstyle checks passed.
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12831 from dongjoon-hyun/SPARK-15053.
  3. Apr 29, 2016
    • [SPARK-14988][PYTHON] SparkSession catalog and conf API · a7d0fedc
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      The `catalog` and `conf` APIs were exposed in `SparkSession` in #12713 and #12669. This patch adds them to the Python API.
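      For reference, a minimal sketch of the Scala-side API that the new Python wrappers mirror (assuming a Spark 2.0 `SparkSession`; the Python surface follows the same shape):

      ```scala
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().getOrCreate()

      // Runtime configuration, exposed as `spark.conf` in Python too:
      spark.conf.set("spark.sql.shuffle.partitions", "10")
      val n = spark.conf.get("spark.sql.shuffle.partitions")

      // Catalog metadata, exposed as `spark.catalog` in Python too:
      spark.catalog.listDatabases().show()
      spark.catalog.listTables().show()
      ```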
      
      ## How was this patch tested?
      
      Python tests.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12765 from andrewor14/python-spark-session-more.
    • [SPARK-14987][SQL] inline hive-service (cli) into sql/hive-thriftserver · 7feeb82c
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      This PR copies the thrift-server code from hive-service-1.2 (including TCLIService.thrift and the generated Java source code) into sql/hive-thriftserver, so we can do further cleanup and improvements.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #12764 from davies/thrift_server.
  4. Apr 28, 2016
  5. Apr 27, 2016
    • [SPARK-14867][BUILD] Remove `--force` option in `build/mvn` · f405de87
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Currently, `build/mvn` provides a convenient option, `--force`, in order to use the recommended version of maven without changing the PATH environment variable. However, there were two problems:
      
      - `dev/lint-java` does not use the newly installed maven.

        ```bash
        $ ./build/mvn --force clean
        $ ./dev/lint-java
        Using `mvn` from path: /usr/local/bin/mvn
        ```
      - It's inconvenient to have to type the `--force` option every time.
      
      Once the `--force` option has been used, we should prefer the maven installation recommended by Spark.
      This PR makes `build/mvn` first check for a maven installed by the `--force` option.
      
      According to the review comments, this PR aims to do the following:
      - Detect the maven version from `pom.xml`.
      - Install maven if it is missing or outdated.
      - Remove the `--force` option.
      
      ## How was this patch tested?
      
      Manual.
      
      ```bash
      $ ./build/mvn --force clean
      $ ./dev/lint-java
      Using `mvn` from path: /Users/dongjoon/spark/build/apache-maven-3.3.9/bin/mvn
      ...
      $ rm -rf ./build/apache-maven-3.3.9/
      $ ./dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12631 from dongjoon-hyun/SPARK-14867.
    • [MINOR][BUILD] Enable RAT checking on `LZ4BlockInputStream.java`. · c5443560
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Since `LZ4BlockInputStream.java` is not licensed to the Apache Software Foundation (ASF), the Apache License header of that file has not been monitored until now.
      This PR aims to enable RAT checking on `LZ4BlockInputStream.java` by removing it from `dev/.rat-excludes`.
      This will prevent accidental removal of the Apache License header from that file.
      
      ## How was this patch tested?
      
      Pass the Jenkins tests (specifically, the RAT check stage).
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12677 from dongjoon-hyun/minor_rat_exclusion_file.
  6. Apr 25, 2016
    • [SPARK-14721][SQL] Remove HiveContext (part 2) · 3c5e65c3
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      This removes the class `HiveContext` itself along with all code usages associated with it. The bulk of the work was already done in #12485. This is mainly just code cleanup and actually removing the class.
      
      Note: A couple of things will break after this patch. These will be fixed separately.
      - the python HiveContext
      - all the documentation / comments referencing HiveContext
      - there will be no more HiveContext in the REPL (fixed by #12589)
      
      ## How was this patch tested?
      
      No change in functionality.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12585 from andrewor14/delete-hive-context.
  7. Apr 24, 2016
    • [SPARK-14868][BUILD] Enable NewLineAtEofChecker in checkstyle and fix lint-java errors · d34d6503
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Spark enforces the `NewLineAtEofChecker` rule for Scala via Scalastyle, and most Java code already complies with it. This PR aims to enforce the same rule, `NewlineAtEndOfFile`, explicitly via Checkstyle. Also, this fixes the lint-java errors accumulated since SPARK-14465. The items are the following:
      
      - Adds a new line at the end of the files (19 files)
      - Fixes 25 lint-java errors (12 RedundantModifier, 6 **ArrayTypeStyle**, 2 LineLength, 2 UnusedImports, 2 RegexpSingleline, 1 ModifierOrder)
      
      ## How was this patch tested?
      
      After the Jenkins test succeeds, `dev/lint-java` should pass. (Currently, Jenkins does not run lint-java.)
      ```bash
      $ dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      Checkstyle checks passed.
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12632 from dongjoon-hyun/SPARK-14868.
  8. Apr 22, 2016
    • [SPARK-14807] Create a compatibility module · 7dde1da9
      Yin Huai authored
      ## What changes were proposed in this pull request?
      
      This PR creates a compatibility module in sql (called `hive-1-x-compatibility`), which will host HiveContext in Spark 2.0 (moving HiveContext there will be done separately). This module is not included in the assembly because only users who still want to access HiveContext need it.
      
      ## How was this patch tested?
      I manually tested `sbt/sbt -Phive package` and `mvn -Phive package -DskipTests`.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #12580 from yhuai/compatibility.
  9. Apr 21, 2016
  10. Apr 17, 2016
    • [SPARK-13904][SCHEDULER] Add support for pluggable cluster manager · af1f4da7
      Hemant Bhanawat authored
      ## What changes were proposed in this pull request?
      
      This commit adds support for a pluggable cluster manager. It also allows a cluster manager to clean up tasks without taking the parent process down.
      
      To plug in a new external cluster manager, the `ExternalClusterManager` trait should be implemented. It returns the task scheduler and scheduler backend that will be used by SparkContext to schedule tasks. An external cluster manager is registered using the `java.util.ServiceLoader` mechanism (the same mechanism used to register data sources like parquet, json, and jdbc), which allows implementations of the `ExternalClusterManager` interface to be auto-loaded.
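      A minimal sketch of a plug-in implementation (the class name, master-URL scheme, and reuse of `TaskSchedulerImpl` are illustrative; the method signatures are assumed from the trait described above, so check the `ExternalClusterManager` trait for the exact contract):

      ```scala
      package org.apache.spark.scheduler

      import org.apache.spark.SparkContext

      // Hypothetical cluster manager claiming "myCluster://..." master URLs.
      // Placed in this package because the scheduler internals it touches
      // are private[spark].
      class MyClusterManager extends ExternalClusterManager {

        override def canCreate(masterURL: String): Boolean =
          masterURL.startsWith("myCluster")

        override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler =
          new TaskSchedulerImpl(sc)

        override def createSchedulerBackend(
            sc: SparkContext,
            masterURL: String,
            scheduler: TaskScheduler): SchedulerBackend = {
          ??? // construct the backend that talks to the external cluster
        }

        override def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit =
          scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
      }
      ```

      The implementation is then registered for `ServiceLoader` discovery by listing its fully qualified class name in a `META-INF/services/org.apache.spark.scheduler.ExternalClusterManager` resource file.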
      
      Currently, when a driver fails, executors exit using `System.exit`. This does not bode well for cluster managers that would like to reuse the parent process of an executor. Hence, this patch:

        1. Moves `System.exit` into a function that can be overridden in subclasses of CoarseGrainedExecutorBackend.
        2. Adds functionality to kill all the running tasks in an executor.
      
      ## How was this patch tested?
      ExternalClusterManagerSuite.scala was added to test this patch.
      
      Author: Hemant Bhanawat <hemant@snappydata.io>
      
      Closes #11723 from hbhanawat/pluggableScheduler.
  11. Apr 11, 2016
    • [SPARK-14462][ML][MLLIB] Add the mllib-local build to maven pom · efaf7d18
      DB Tsai authored
      ## What changes were proposed in this pull request?
      
      In order to separate the linear algebra and vector/matrix classes into a standalone jar, we need to set up the build first. This PR creates a new jar called mllib-local with minimal dependencies.

      The previous PR failed the build because of the `spark-core:test` dependency and was reverted. In this PR, the mllib-local tests use `FunSuite` with `// scalastyle:ignore funsuite`, similar to the sketch module.
      
      Thanks.
      
      ## How was this patch tested?
      
      Unit tests
      
      mengxr tedyu holdenk
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #12298 from dbtsai/dbtsai-mllib-local-build-fix.
  12. Apr 09, 2016
    • 415446cc
      Xiangrui Meng authored
    • [SPARK-14462][ML][MLLIB] add the mllib-local build to maven pom · 1598d11b
      DB Tsai authored
      ## What changes were proposed in this pull request?
      
      In order to separate the linear algebra and vector/matrix classes into a standalone jar, we need to set up the build first. This PR creates a new jar called mllib-local with minimal dependencies. The test scope will still depend on spark-core and spark-core-test in order to use the common utilities, but the runtime will avoid any platform dependency. A couple of platform-independent classes will be moved to this package to demonstrate how this works.
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #12241 from dbtsai/dbtsai-mllib-local-build.
  13. Apr 08, 2016
    • [SPARK-11416][BUILD] Update to Chill 0.8.0 & Kryo 3.0.3 · 906eef4c
      Josh Rosen authored
      This patch upgrades Chill to 0.8.0 and Kryo to 3.0.3. While we'll likely need to bump these dependencies again before Spark 2.0 (due to SPARK-14221 / https://github.com/twitter/chill/issues/252), I wanted to get the bulk of the Kryo 2 -> Kryo 3 migration done now in order to figure out whether there are any unexpected surprises.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #12076 from JoshRosen/kryo3.
    • [SPARK-14103][SQL] Parse unescaped quotes in CSV data source. · 725b860e
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR resolves a problem with parsing unescaped quotes in input data. For example, currently the data below:
      
      ```
      "a"b,ccc,ddd
      e,f,g
      ```
      
      produces the data below:
      
      - **Before**
      
      ```bash
      ["a"b,ccc,ddd[\n]e,f,g]  <- as a value.
      ```
      
      - **After**
      
      ```bash
      ["a"b], [ccc], [ddd]
      [e], [f], [g]
      ```
      
      This PR bumps up the Univocity parser's version; the issue was fixed in `2.0.2` (https://github.com/uniVocity/univocity-parsers/issues/60).
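      As a sketch of the user-facing effect (assuming the Spark 2.0-style `DataFrameReader.csv` API; the session setup and file path are illustrative):

      ```scala
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().getOrCreate()

      // "/tmp/unescaped-quotes.csv" holds the two input lines shown above.
      // After the parser bump each line parses into three separate fields
      // instead of the whole file collapsing into a single value.
      val df = spark.read.csv("/tmp/unescaped-quotes.csv")
      df.collect().foreach(println)
      ```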
      
      ## How was this patch tested?
      
      Unit tests in `CSVSuite` and `sbt/sbt scalastyle`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #12226 from HyukjinKwon/SPARK-14103-quote.
  14. Apr 04, 2016
    • [SPARK-13579][BUILD] Stop building the main Spark assembly. · 24d7d2e4
      Marcelo Vanzin authored
      This change modifies the "assembly/" module to just copy needed dependencies to its build directory, and modifies the packaging script to pick those up (and to remove duplicate jars packaged in the examples module).
      
      I also made some minor adjustments to dependencies to remove some
      test jars from the final packaging, and remove jars that conflict with each
      other when packaged separately (e.g. servlet api).
      
      Also note that this change restores guava in applications' classpaths, even
      though it's still shaded inside Spark. This is now needed for the Hadoop
      libraries that are packaged with Spark, which now are not processed by
      the shade plugin.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11796 from vanzin/SPARK-13579.
  15. Apr 01, 2016
    • [SPARK-13825][CORE] Upgrade to Scala 2.11.8 · c16a3968
      Jacek Laskowski authored
      ## What changes were proposed in this pull request?
      
      Upgrade to 2.11.8 (from the current 2.11.7)
      
      ## How was this patch tested?
      
      A manual build
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #11681 from jaceklaskowski/SPARK-13825-scala-2_11_8.
  16. Mar 31, 2016
    • [SPARK-14277][CORE] Upgrade Snappy Java to 1.1.2.4 · 8de201ba
      Sital Kedia authored
      ## What changes were proposed in this pull request?
      
      Upgrade snappy to 1.1.2.4 to improve snappy read/write performance.
      
      ## How was this patch tested?
      
      Tested by running a job on the cluster; we saw 7.5% CPU savings after this change.
      
      Author: Sital Kedia <skedia@fb.com>
      
      Closes #12096 from sitalkedia/snappyRelease.
    • [SPARK-14211][SQL] Remove ANTLR3 based parser · a9b93e07
      Herman van Hovell authored
      ### What changes were proposed in this pull request?
      
      This PR removes the ANTLR3-based parser and moves the new ANTLR4-based parser into the `org.apache.spark.sql.catalyst.parser` package.
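      A hypothetical usage sketch after the move (the `CatalystSqlParser` entry point is assumed; the PR text only names the target package):

      ```scala
      import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

      // Parse a SQL string into an unresolved Catalyst logical plan.
      val plan = CatalystSqlParser.parsePlan("SELECT a, COUNT(*) FROM t GROUP BY a")
      println(plan.treeString)
      ```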
      
      ### How was this patch tested?
      
      Existing unit tests.
      
      cc rxin andrewor14 yhuai
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #12071 from hvanhovell/SPARK-14211.
  17. Mar 28, 2016
    • [SPARK-13713][SQL] Migrate parser from ANTLR3 to ANTLR4 · 600c0b69
      Herman van Hovell authored
      ### What changes were proposed in this pull request?
      The current ANTLR3 parser is quite complex to maintain and suffers from code blow-ups. This PR introduces a new parser that is based on ANTLR4.
      
      This parser is based on [Presto's SQL parser](https://github.com/facebook/presto/blob/master/presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4). The current implementation can parse and create Catalyst and SQL plans. Large parts of the HiveQl DDL and some of the DML functionality are currently missing; the plan is to add these in follow-up PRs.

      This PR is a work in progress, and work needs to be done in the following areas:
      
      - [x] Error handling should be improved.
      - [x] Documentation should be improved.
      - [x] Multi-Insert needs to be tested.
      - [ ] Naming and package locations.
      
      ### How was this patch tested?
      
      Catalyst and SQL unit tests.
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #11557 from hvanhovell/ngParser.
  18. Mar 25, 2016
    • [SPARK-14073][STREAMING][TEST-MAVEN] Move flume back to Spark · 24587ce4
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This PR moves flume back to Spark as per the discussion on the dev mailing list.
      
      ## How was this patch tested?
      
      Existing Jenkins tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #11895 from zsxwing/move-flume-back.
    • [SPARK-13887][PYTHON][TRIVIAL][BUILD] Make lint-python script fail fast · 55a60576
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      Change the lint-python script to stop on the first error rather than accumulating errors, so it's clearer why we failed (requested by rxin). Also, while in the file, remove the commented-out code.
      
      ## How was this patch tested?
      
      Manually ran the lint-python script with and without pep8 errors locally and verified the expected results.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #11898 from holdenk/SPARK-13887-pylint-fast-fail.
  19. Mar 23, 2016
    • [SPARK-14074][SPARKR] Specify commit sha1 ID when using install_github to install lintr package. · 7d117501
      Sun Rui authored
      ## What changes were proposed in this pull request?
      
      In dev/lint-r.R, `install_github` makes our builds depend on an unstable source. This may cause unexpected test failures and then break the build. This PR adds a specific commit sha1 ID to `install_github` to get a stable source.
      
      ## How was this patch tested?
      dev/lint-r
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #11913 from sun-rui/SPARK-14074.
  20. Mar 21, 2016
    • [SPARK-14011][CORE][SQL] Enable `LineLength` Java checkstyle rule · 20fd2541
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      The [Spark Coding Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) has a 100-character limit on lines, but it has been disabled for Java since 11/09/15. This PR enables the **LineLength** checkstyle rule again. To help with that, it also introduces **RedundantImport** and **RedundantModifier**. The following is the diff on `checkstyle.xml`.
      
      ```xml
      -        <!-- TODO: 11/09/15 disabled - the lengths are currently > 100 in many places -->
      -        <!--
               <module name="LineLength">
                   <property name="max" value="100"/>
                   <property name="ignorePattern" value="^package.*|^import.*|a href|href|http://|https://|ftp://"/>
               </module>
      -        -->
               <module name="NoLineWrap"/>
               <module name="EmptyBlock">
                   <property name="option" value="TEXT"/>
      @@ -167,5 +164,7 @@
               </module>
               <module name="CommentsIndentation"/>
               <module name="UnusedImports"/>
      +        <module name="RedundantImport"/>
      +        <module name="RedundantModifier"/>
      ```
      
      ## How was this patch tested?
      
      Currently, `lint-java` is disabled in Jenkins, so it needs a manual test.
      After the Jenkins tests pass, `dev/lint-java` should pass locally.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11831 from dongjoon-hyun/SPARK-14011.
  21. Mar 17, 2016
    • [SPARK-13948] MiMa check should catch if the visibility changes to private · 82066a16
      Josh Rosen authored
      MiMa excludes are currently generated using both the current Spark version's classes and Spark 1.2.0's classes, but this doesn't make sense: we should only be ignoring classes which were `private` in the previous Spark version, not classes which became private in the current version.
      
      This patch updates `dev/mima` to only generate excludes with respect to the previous artifacts that MiMa checks against. It also updates `MimaBuild` so that `excludeClass` only applies directly to the class being excluded and not to its companion object (since a class and its companion object can have different accessibility).
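      The class-versus-companion distinction matters because the two can have different accessibility; a small illustration with hypothetical names:

      ```scala
      package org.apache.spark

      // Hypothetical: the class is private to Spark, but its companion still
      // exposes public API, so excluding the class from MiMa checks must not
      // silently exclude the companion object as well.
      private[spark] class Helper(val n: Int)

      object Helper {
        val DefaultSize: Int = 16                        // public, MiMa-checked
        private[spark] def make(n: Int): Helper = new Helper(n)
      }
      ```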
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #11774 from JoshRosen/SPARK-13948.
  22. Mar 15, 2016
    • [SPARK-13576][BUILD] Don't create assembly for examples. · 48978abf
      Marcelo Vanzin authored
      As part of the goal to stop creating assemblies in Spark, this change
      modifies the mvn and sbt builds to not create an assembly for examples.
      
      Instead, dependencies are copied to the build directory (under
      target/scala-xx/jars), and in the final archive, into the "examples/jars"
      directory.
      
      To avoid having to deal too much with Windows batch files, I made examples
      run through the launcher library; the spark-submit launcher now has a
      special mode to run examples, which adds all the necessary jars to the
      spark-submit command line, and replaces the bash and batch scripts that
      were used to run examples. The scripts are now just a thin wrapper around
      spark-submit; another advantage is that now all spark-submit options are
      supported.
      
      There are a few glitches; in the mvn build, a lot of duplicated dependencies
      get copied, because they are promoted to "compile" scope due to extra
      dependencies in the examples module (such as HBase). In the sbt build,
      all dependencies are copied, because there doesn't seem to be an easy
      way to filter things.
      
      I plan to clean some of this up when the rest of the tasks are finished.
      When the main assembly is replaced with jars, we can remove duplicate jars
      from the examples directory during packaging.
      
      Tested by running SparkPi in: maven build, sbt build, dist created by
      make-distribution.sh.
      
      Finally: note that running the "assembly" target in sbt doesn't build
      the examples anymore. You need to run "package" for that.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11452 from vanzin/SPARK-13576.
  23. Mar 14, 2016
    • [SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages · 06dec374
      Shixiong Zhu authored
      
      ## What changes were proposed in this pull request?
      
      Currently there are a few sub-projects, each integrating with a different external source for Streaming. Now that we have a better way to include external libraries (spark packages), and with Spark 2.0 coming up, we can move the following projects out of Spark to https://github.com/spark-packages
      
      - streaming-flume
      - streaming-akka
      - streaming-mqtt
      - streaming-zeromq
      - streaming-twitter
      
      They are just ancillary packages and, considering the overhead of maintenance, running tests, and PR failures, it's better to maintain them outside of Spark. In addition, these projects can have their own release cycles, and we can release them faster.
      
      I have already copied these projects to https://github.com/spark-packages
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #11672 from zsxwing/remove-external-pkg.
    • [SPARK-13848][SPARK-5185] Update to Py4J 0.9.2 in order to fix classloading issue · 07cb323e
      Josh Rosen authored
      This patch upgrades Py4J from 0.9.1 to 0.9.2 in order to include a patch which modifies Py4J to use the current thread's ContextClassLoader when performing reflection / class loading. This is necessary in order to fix [SPARK-5185](https://issues.apache.org/jira/browse/SPARK-5185), a longstanding issue affecting the use of `--jars` and `--packages` in PySpark.
      
      In order to demonstrate that the fix works, I removed the workarounds which were added as part of [SPARK-6027](https://issues.apache.org/jira/browse/SPARK-6027) / #4779 and other patches.
      
      Py4J diff: https://github.com/bartdag/py4j/compare/0.9.1...0.9.2
      
      /cc zsxwing tdas davies brkyvz
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #11687 from JoshRosen/py4j-0.9.2.
  24. Mar 13, 2016
    • [SPARK-13834][BUILD] Update sbt and sbt plugins for 2.x. · 473263f9
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      For 2.0.0, we had better bring **sbt** and the **sbt plugins** up to date. This PR checks the status of each plugin and bumps the following:
      
      * sbt: 0.13.9 --> 0.13.11
      * sbteclipse-plugin: 2.2.0 --> 4.0.0
      * sbt-dependency-graph: 0.7.4 --> 0.8.2
      * sbt-mima-plugin: 0.1.6 --> 0.1.9
      * sbt-revolver: 0.7.2 --> 0.8.0
      
      All other plugins are up-to-date. (Note that `sbt-avro` seems to have changed from 0.3.2 to 1.0.1, but it's not published in the repository.)
      
      During the upgrade, this PR also updated the following MiMa exclude. Note that the related exclusion filter is already registered correctly; the difference seems to come from a change in the problem type that MiMa reports.
      ```
       // SPARK-12896 Send only accumulator updates to driver, not TaskMetrics
       ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.Accumulable.this"),
      -ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.Accumulator.this"),
      +ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.Accumulator.this"),
      ```
      
      ## How was this patch tested?
      
      Pass the Jenkins build.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11669 from dongjoon-hyun/update_mima.
  25. Mar 11, 2016
    • [SPARK-13817][BUILD][SQL] Re-enable MiMA and removes object DataFrame · 6d37e1eb
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      PR #11443 temporarily disabled the MiMa check; this PR re-enables it.

      One extra change is that `object DataFrame` is also removed. The only purpose of introducing `object DataFrame` was to use it as an internal factory for creating `Dataset[Row]`. By replacing this internal factory with `Dataset.newDataFrame`, both `DataFrame` and `DataFrame$` are entirely removed from the API, so that we can simply put a `MissingClassProblem` filter in `MimaExcludes.scala` for most DataFrame API changes.
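      A sketch of the kind of entries this enables in `MimaExcludes.scala` (the filter type follows MiMa's `ProblemFilters` API; the exact excludes in the PR may differ):

      ```scala
      import com.typesafe.tools.mima.core._
      import com.typesafe.tools.mima.core.ProblemFilters._

      // DataFrame is now a type alias, so its old class files disappear;
      // MissingClassProblem filters cover both the class and its companion.
      Seq(
        exclude[MissingClassProblem]("org.apache.spark.sql.DataFrame"),
        exclude[MissingClassProblem]("org.apache.spark.sql.DataFrame$")
      )
      ```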
      
      ## How was this patch tested?
      
      Tested by MiMA check triggered by Jenkins.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #11656 from liancheng/re-enable-mima.
    • [SPARK-13294][PROJECT INFRA] Remove MiMa's dependency on spark-class / Spark assembly · 6ca990fb
      Josh Rosen authored
      This patch removes the need to build a full Spark assembly before running the `dev/mima` script.
      
      - I modified the `tools` project to remove a direct dependency on Spark, so `sbt/sbt tools/fullClasspath` will now return the classpath for the `GenerateMIMAIgnore` class itself plus its own dependencies.
         - This required me to delete two classes full of dead code that we don't use anymore.
      - `GenerateMIMAIgnore` now uses [ClassUtil](http://software.clapper.org/classutil/) to find all of the Spark classes rather than our homemade JAR traversal code. The problem in our own code was that it didn't handle folders of classes properly, which is necessary in order to generate excludes with an assembly-free Spark build.
      - `./dev/mima` no longer runs through `spark-class`, eliminating the need to reason about classpath ordering between `SPARK_CLASSPATH` and the assembly.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #11178 from JoshRosen/remove-assembly-in-run-tests.
  26. Mar 10, 2016
    • [SPARK-13244][SQL] Migrates DataFrame to Dataset · 1d542785
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      This PR unifies DataFrame and Dataset by migrating existing DataFrame operations to Dataset and making `DataFrame` a type alias of `Dataset[Row]`.
      
      Most Scala code changes are source compatible, but the Java API is broken, as Java knows nothing about Scala type aliases (mostly replacing `DataFrame` with `Dataset<Row>`).
      
      There are several noticeable API changes related to those returning arrays:
      
      1.  `collect`/`take`
      
          -   Old APIs in class `DataFrame`:
      
              ```scala
              def collect(): Array[Row]
              def take(n: Int): Array[Row]
              ```
      
          -   New APIs in class `Dataset[T]`:
      
              ```scala
              def collect(): Array[T]
              def take(n: Int): Array[T]
      
              def collectRows(): Array[Row]
              def takeRows(n: Int): Array[Row]
              ```
      
          Two specialized methods `collectRows` and `takeRows` are added because Java doesn't support returning generic arrays. Thus, for example, `Dataset.collect(): Array[T]` actually returns `Object` instead of `Array<T>` on the Java side.
      
          Normally, Java users may fall back to `collectAsList` and `takeAsList`.  The two new specialized versions are added to avoid performance regression in ML related code (but maybe I'm wrong and they are not necessary here).
      
      1.  `randomSplit`
      
          -   Old APIs in class `DataFrame`:
      
              ```scala
              def randomSplit(weights: Array[Double], seed: Long): Array[DataFrame]
              def randomSplit(weights: Array[Double]): Array[DataFrame]
              ```
      
          -   New APIs in class `Dataset[T]`:
      
              ```scala
              def randomSplit(weights: Array[Double], seed: Long): Array[Dataset[T]]
              def randomSplit(weights: Array[Double]): Array[Dataset[T]]
              ```
      
          A similar problem to the above, but it hasn't been addressed for the Java API yet. We can probably add `randomSplitAsList` to fix this one.
      
      1.  `groupBy`
      
          Some original `DataFrame.groupBy` methods have conflicting signatures with the original `Dataset.groupBy` methods. To distinguish the two, the typed `Dataset.groupBy` methods are renamed to `groupByKey`.
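      A short sketch of the resulting split (assuming a Spark 2.0-style session with `spark.implicits._` in scope; the data is illustrative):

      ```scala
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().getOrCreate()
      import spark.implicits._

      case class Sale(dept: String, amount: Long)
      val ds = Seq(Sale("toys", 10L), Sale("toys", 5L), Sale("books", 7L)).toDS()

      ds.groupBy("dept").count()      // untyped, column-based grouping (unchanged)
      ds.groupByKey(_.dept).count()   // typed grouping, renamed from groupBy
      ```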
      
      Other noticeable changes:
      
      1.  Datasets always do eager analysis now

          We used to support disabling DataFrame eager analysis to help report partially analyzed malformed logical plans on analysis failure. However, Dataset encoders require eager analysis during Dataset construction. To preserve the error-reporting feature, `AnalysisException` now takes an extra `Option[LogicalPlan]` argument to hold the partially analyzed plan, so that we can check the plan tree when reporting test failures. This plan is passed by `QueryExecution.assertAnalyzed`.
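      For illustration, eager analysis means resolution errors now surface while the Dataset is being built (hypothetical snippet):

      ```scala
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().getOrCreate()
      import spark.implicits._

      val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

      // Fails here with AnalysisException, while the plan is being built --
      // not later when an action such as collect() finally runs.
      df.select("no_such_column")
      ```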
      
      ## How was this patch tested?
      
      Existing tests do the work.
      
      ## TODO
      
      - [ ] Fix all tests
      - [ ] Re-enable MiMA check
      - [ ] Update ScalaDoc (`since`, `group`, and example code)
      
      Author: Cheng Lian <lian@databricks.com>
      Author: Yin Huai <yhuai@databricks.com>
      Author: Wenchen Fan <wenchen@databricks.com>
      Author: Cheng Lian <liancheng@users.noreply.github.com>
      
      Closes #11443 from liancheng/ds-to-df.
    • [SPARK-13663][CORE] Upgrade Snappy Java to 1.1.2.1 · 927e22ef
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Update snappy to 1.1.2.1 to pull in a single fix -- the OOM fix we already worked around.
      Supersedes https://github.com/apache/spark/pull/11524
      
      ## How was this patch tested?
      
      Jenkins tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11631 from srowen/SPARK-13663.
  27. Mar 09, 2016
    • [SPARK-13595][BUILD] Move docker, extras modules into external · 256704c7
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Move `docker` dirs out of top level into `external/`; move `extras/*` into `external/`
      
      ## How was this patch tested?
      
      This is tested with Jenkins tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11523 from srowen/SPARK-13595.
  28. Mar 08, 2016
    • [HOT-FIX][BUILD] Use the new location of `checkstyle-suppressions.xml` · 7771c731
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes `dev/lint-java` and `mvn checkstyle:check` failures due to the recent file location change.
      The following is the error message on current master:
      ```
      Checkstyle checks failed at following occurrences:
      [ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (default-cli) on project spark-parent_2.11: Failed during checkstyle configuration: cannot initialize module SuppressionFilter - Cannot set property 'file' to 'checkstyle-suppressions.xml' in module SuppressionFilter: InvocationTargetException: Unable to find: checkstyle-suppressions.xml -> [Help 1]
      ```
      
      ## How was this patch tested?
      
      Manual. The following commands should run correctly:
      ```
      ./dev/lint-java
      mvn checkstyle:check
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11567 from dongjoon-hyun/hotfix_checkstyle_suppression.
  29. Mar 07, 2016
    • [SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs · 0eea12a3
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Move many top-level files into dev/ or another appropriate directory. In particular, put `make-distribution.sh` in `dev` and update the docs accordingly. Remove the deprecated `sbt/sbt`.

      I was (so far) unable to figure out how to move `tox.ini`. `scalastyle-config.xml` should be movable, but edits to the project `.sbt` files didn't work; the config file location is updatable for compile scope but not for test scope.
      
      ## How was this patch tested?
      
      `./dev/run-tests` to verify RAT and checkstyle work. Jenkins tests for the rest.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11522 from srowen/SPARK-13596.
  30. Mar 03, 2016
    • [MINOR] Fix typos in comments and testcase name of code · 941b270b
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes typos in comments and in a testcase name in the code.
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11481 from dongjoon-hyun/minor_fix_typos_in_code.