  1. Mar 11, 2016
    • Josh Rosen's avatar
      [SPARK-13294][PROJECT INFRA] Remove MiMa's dependency on spark-class / Spark assembly · 6ca990fb
      Josh Rosen authored
      This patch removes the need to build a full Spark assembly before running the `dev/mima` script.
      
      - I modified the `tools` project to remove a direct dependency on Spark, so `sbt/sbt tools/fullClasspath` will now return the classpath for the `GenerateMIMAIgnore` class itself plus its own dependencies.
         - This required me to delete two classes full of dead code that we don't use anymore
      - `GenerateMIMAIgnore` now uses [ClassUtil](http://software.clapper.org/classutil/) to find all of the Spark classes rather than our homemade JAR traversal code. The problem in our own code was that it didn't handle folders of classes properly, which is necessary in order to generate excludes with an assembly-free Spark build.
      - `./dev/mima` no longer runs through `spark-class`, eliminating the need to reason about classpath ordering between `SPARK_CLASSPATH` and the assembly.
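
To illustrate why folders of classes matter for an assembly-free build, here is a minimal sketch of class discovery that accepts both directories and JAR files. It is not the ClassUtil API or the code in `GenerateMIMAIgnore`; the function name `class_names_from_entry` is hypothetical.

```python
import os
import zipfile


def class_names_from_entry(entry):
    """Return fully qualified class names found under one classpath entry.

    `entry` may be a directory of compiled classes or a JAR file.
    """
    names = []
    if os.path.isdir(entry):
        # Directory of classes: walk the tree and turn paths into names.
        for root, _dirs, files in os.walk(entry):
            for file_name in files:
                if file_name.endswith(".class"):
                    rel = os.path.relpath(os.path.join(root, file_name), entry)
                    names.append(rel[:-len(".class")].replace(os.sep, "."))
    elif zipfile.is_zipfile(entry):
        # JAR file: member names always use "/" as the separator.
        with zipfile.ZipFile(entry) as jar:
            for member in jar.namelist():
                if member.endswith(".class"):
                    names.append(member[:-len(".class")].replace("/", "."))
    return sorted(names)
```

A traversal that only handles the JAR branch finds nothing when the classpath points at a build output directory, which is the situation an assembly-free build creates.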
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #11178 from JoshRosen/remove-assembly-in-run-tests.
      6ca990fb
  2. Mar 10, 2016
    • Cheng Lian's avatar
      [SPARK-13244][SQL] Migrates DataFrame to Dataset · 1d542785
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      This PR unifies DataFrame and Dataset by migrating existing DataFrame operations to Dataset and making `DataFrame` a type alias of `Dataset[Row]`.
      
      Most Scala code changes are source compatible, but the Java API is broken because Java knows nothing about Scala type aliases (the fix is mostly replacing `DataFrame` with `Dataset<Row>`).
      
      There are several noticeable API changes related to those returning arrays:
      
      1.  `collect`/`take`
      
          -   Old APIs in class `DataFrame`:
      
              ```scala
              def collect(): Array[Row]
              def take(n: Int): Array[Row]
              ```
      
          -   New APIs in class `Dataset[T]`:
      
              ```scala
              def collect(): Array[T]
              def take(n: Int): Array[T]
      
              def collectRows(): Array[Row]
              def takeRows(n: Int): Array[Row]
              ```
      
          Two specialized methods, `collectRows` and `takeRows`, are added because Java doesn't support returning generic arrays. Thus, for example, `Dataset.collect(): Array[T]` actually returns `Object` instead of `Array<T>` from the Java side.
      
          Normally, Java users may fall back to `collectAsList` and `takeAsList`.  The two new specialized versions are added to avoid a performance regression in ML-related code (but maybe I'm wrong and they are not necessary here).
      
      1.  `randomSplit`
      
          -   Old APIs in class `DataFrame`:
      
              ```scala
              def randomSplit(weights: Array[Double], seed: Long): Array[DataFrame]
              def randomSplit(weights: Array[Double]): Array[DataFrame]
              ```
      
          -   New APIs in class `Dataset[T]`:
      
              ```scala
              def randomSplit(weights: Array[Double], seed: Long): Array[Dataset[T]]
              def randomSplit(weights: Array[Double]): Array[Dataset[T]]
              ```
      
          This has a similar problem to the one above, but it hasn't been addressed for the Java API yet.  We can probably add `randomSplitAsList` to fix this one.
      
      1.  `groupBy`
      
          Some original `DataFrame.groupBy` methods have signatures that conflict with the original `Dataset.groupBy` methods.  To distinguish the two, the typed `Dataset.groupBy` methods are renamed to `groupByKey`.
      
      Other noticeable changes:
      
      1.  Dataset always does eager analysis now
      
          We used to support disabling DataFrame eager analysis to help report the partially analyzed, malformed logical plan on analysis failure.  However, Dataset encoders require eager analysis during Dataset construction.  To preserve the error reporting feature, `AnalysisException` now takes an extra `Option[LogicalPlan]` argument to hold the partially analyzed plan, so that we can check the plan tree when reporting test failures.  This plan is passed by `QueryExecution.assertAnalyzed`.
      
      ## How was this patch tested?
      
      Existing tests do the work.
      
      ## TODO
      
      - [ ] Fix all tests
      - [ ] Re-enable MiMA check
      - [ ] Update ScalaDoc (`since`, `group`, and example code)
      
      Author: Cheng Lian <lian@databricks.com>
      Author: Yin Huai <yhuai@databricks.com>
      Author: Wenchen Fan <wenchen@databricks.com>
      Author: Cheng Lian <liancheng@users.noreply.github.com>
      
      Closes #11443 from liancheng/ds-to-df.
      1d542785
    • Sean Owen's avatar
      [SPARK-13663][CORE] Upgrade Snappy Java to 1.1.2.1 · 927e22ef
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Update snappy to 1.1.2.1 to pull in a single fix -- the OOM fix we already worked around.
      Supersedes https://github.com/apache/spark/pull/11524
      
      ## How was this patch tested?
      
      Jenkins tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11631 from srowen/SPARK-13663.
      927e22ef
  3. Mar 09, 2016
    • Sean Owen's avatar
      [SPARK-13595][BUILD] Move docker, extras modules into external · 256704c7
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Move `docker` dirs out of top level into `external/`; move `extras/*` into `external/`
      
      ## How was this patch tested?
      
      This is tested with Jenkins tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11523 from srowen/SPARK-13595.
      256704c7
  4. Mar 08, 2016
    • Dongjoon Hyun's avatar
      [HOT-FIX][BUILD] Use the new location of `checkstyle-suppressions.xml` · 7771c731
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes `dev/lint-java` and `mvn checkstyle:check` failures due to the recent file location change.
      The following is the error message of current master.
      ```
      Checkstyle checks failed at following occurrences:
      [ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (default-cli) on project spark-parent_2.11: Failed during checkstyle configuration: cannot initialize module SuppressionFilter - Cannot set property 'file' to 'checkstyle-suppressions.xml' in module SuppressionFilter: InvocationTargetException: Unable to find: checkstyle-suppressions.xml -> [Help 1]
      ```
      
      ## How was this patch tested?
      
      Manual. The following command should run correctly.
      ```
      ./dev/lint-java
      mvn checkstyle:check
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11567 from dongjoon-hyun/hotfix_checkstyle_suppression.
      7771c731
  5. Mar 07, 2016
    • Sean Owen's avatar
      [SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs · 0eea12a3
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Move many top-level files into `dev/` or another appropriate directory. In particular, put `make-distribution.sh` in `dev` and update docs accordingly. Remove deprecated `sbt/sbt`.
      
      I was (so far) unable to figure out how to move `tox.ini`. `scalastyle-config.xml` should be movable but edits to the project `.sbt` files didn't work; config file location is updatable for compile but not test scope.
      
      ## How was this patch tested?
      
      `./dev/run-tests` to verify RAT and checkstyle work. Jenkins tests for the rest.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11522 from srowen/SPARK-13596.
      0eea12a3
  6. Mar 03, 2016
    • Dongjoon Hyun's avatar
      [MINOR] Fix typos in comments and testcase name of code · 941b270b
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes typos in comments and testcase name of code.
      
      ## How was this patch tested?
      
      manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11481 from dongjoon-hyun/minor_fix_typos_in_code.
      941b270b
    • Steve Loughran's avatar
      [SPARK-13599][BUILD] remove transitive groovy dependencies from Hive · 9a48c656
      Steve Loughran authored
      ## What changes were proposed in this pull request?
      
      Modifies the dependency declarations of all the Hive artifacts to explicitly exclude the groovy-all JAR.
      
      This stops the groovy classes *and everything else in that uber-JAR* from getting into spark-assembly JAR.
      
      ## How was this patch tested?
      
      1. Pre-patch build was made: `mvn clean install -Pyarn,hive,hive-thriftserver`
      1. spark-assembly expanded, observed to have the org.codehaus.groovy packages and JARs
      1. A maven dependency tree was created `mvn dependency:tree -Pyarn,hive,hive-thriftserver  -Dverbose > target/dependencies.txt`
      1. This text file examined to confirm that groovy was being imported as a dependency of `org.spark-project.hive`
      1. Patch applied
      1. Repeated step1: clean build of project with ` -Pyarn,hive,hive-thriftserver` set
      1. Examined created spark-assembly, verified no org.codehaus packages
      1. Verified that the maven dependency tree no longer references groovy
      
      Note also that the size of the assembly JAR was 181628646 bytes before this patch and 166318515 bytes after, about 15 MB smaller. That's a good indication that things are being excluded.
      
      Author: Steve Loughran <stevel@hortonworks.com>
      
      Closes #11449 from steveloughran/fixes/SPARK-13599-groovy-dependency.
      9a48c656
  7. Mar 02, 2016
    • Wojciech Jurczyk's avatar
      Fix run-tests.py typos · 75e618de
      Wojciech Jurczyk authored
      ## What changes were proposed in this pull request?
      
      The PR fixes typos in an error message in dev/run-tests.py.
      
      Author: Wojciech Jurczyk <wojciech.jurczyk@codilime.com>
      
      Closes #11467 from wjur/wjur/typos_run_tests.
      75e618de
  8. Mar 01, 2016
    • jerryshao's avatar
      [BUILD][MINOR] Fix SBT build error with network-yarn module · b4d096de
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      ```
      [error] Expected ID character
      [error] Not a valid command: common (similar: completions)
      [error] Expected project ID
      [error] Expected configuration
      [error] Expected ':' (if selecting a configuration)
      [error] Expected key
      [error] Not a valid key: common (similar: commands)
      [error] common/network-yarn/test
      ```
      
      `common/network-yarn` is not a valid sbt project name; we should change it to `network-yarn`.
      
      ## How was this patch tested?
      
      Ran the unit tests locally.
      
      CC rxin: we should either change it here or change the sbt project name.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #11456 from jerryshao/build-fix.
      b4d096de
  9. Feb 28, 2016
    • Reynold Xin's avatar
      [SPARK-13529][BUILD] Move network/* modules into common/network-* · 9e01dcc6
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top level, non-user-facing folder.
      
      ## How was this patch tested?
      Compilation and existing tests. We should run both SBT and Maven.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11409 from rxin/SPARK-13529.
      9e01dcc6
  10. Feb 27, 2016
  11. Feb 26, 2016
    • Josh Rosen's avatar
      [SPARK-13474][PROJECT INFRA] Update packaging scripts to push artifacts to home.apache.org · f77dc4e1
      Josh Rosen authored
      Due to the people.apache.org -> home.apache.org migration, we need to update our packaging scripts to publish artifacts to the new server. Because the new server only supports sftp instead of ssh, we need to update the scripts to use lftp instead of ssh + rsync.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #11350 from JoshRosen/update-release-scripts-for-apache-home.
      f77dc4e1
  12. Feb 17, 2016
  13. Feb 12, 2016
    • Holden Karau's avatar
      [SPARK-13154][PYTHON] Add linting for pydocs · 64515e5f
      Holden Karau authored
      We should have lint rules using sphinx to automatically catch the pydoc issues that are sometimes introduced.
      
      Right now ./dev/lint-python will skip building the docs if sphinx isn't present - but it might make sense to fail hard - just a matter of if we want to insist all PySpark developers have sphinx present.
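
The skip-versus-fail choice described above can be sketched as a small decision function. This is an illustration only, not the logic in `dev/lint-python`; the function name `docs_lint_action` and its `fail_if_missing` flag are hypothetical, and the check assumes Sphinx is detected through its `sphinx-build` command.

```python
import shutil


def docs_lint_action(fail_if_missing=False, which=shutil.which):
    """Decide what the lint step should do about the Sphinx docs build.

    Returns "build", "skip", or "fail". `which` is injectable for testing.
    """
    if which("sphinx-build") is not None:
        return "build"
    # Sphinx is absent: either skip quietly or insist that it is installed.
    return "fail" if fail_if_missing else "skip"
```

Failing hard corresponds to calling `docs_lint_action(fail_if_missing=True)`, which trades developer convenience for earlier detection of pydoc problems.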
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #11109 from holdenk/SPARK-13154-add-pydoc-lint-for-docs.
      64515e5f
  14. Feb 09, 2016
  15. Jan 30, 2016
    • Josh Rosen's avatar
      [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version · 289373b2
      Josh Rosen authored
      This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
      
      The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
      
      After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10608 from JoshRosen/SPARK-6363.
      289373b2
  16. Jan 27, 2016
    • Josh Rosen's avatar
      [SPARK-13023][PROJECT INFRA] Fix handling of root module in modules_to_test() · 41f0c85f
      Josh Rosen authored
      There's a minor bug in how we handle the `root` module in the `modules_to_test()` function in `dev/run-tests.py`: because `root` now depends on `build` (every test needs to run on any build change), we need to check for the presence of `root` in `modules_to_test` instead of `changed_modules`.
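
A minimal sketch of the idea, assuming a plain dictionary of module dependents rather than the module objects used in `dev/run-tests.py`. The function name `determine_modules_to_test` and the sample graph are hypothetical.

```python
def determine_modules_to_test(changed_modules, dependents):
    """Expand changed modules to everything that depends on them.

    `dependents` maps a module name to the modules that directly depend on it.
    """
    modules_to_test = set()
    pending = list(changed_modules)
    while pending:
        module = pending.pop()
        if module not in modules_to_test:
            modules_to_test.add(module)
            pending.extend(dependents.get(module, []))
    # Check the expanded set, not just the changed modules: "root" can be
    # pulled in indirectly, for example through a change to "build".
    if "root" in modules_to_test:
        return ["root"]
    return sorted(modules_to_test)


# Hypothetical dependents graph: "root" depends on "build".
dependents = {"build": ["root"], "catalyst": ["sql"], "sql": ["hive"]}
```

With this graph, a change that touches only `build` never lists `root` among the changed modules, so a check against `changed_modules` would miss it.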
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10933 from JoshRosen/build-module-fix.
      41f0c85f
  17. Jan 26, 2016
    • Josh Rosen's avatar
      [SPARK-8725][PROJECT-INFRA] Test modules in topologically-sorted order in dev/run-tests · ee74498d
      Josh Rosen authored
      This patch improves our `dev/run-tests` script to test modules in a topologically-sorted order based on modules' dependencies.  This will help to ensure that bugs in upstream projects are not misattributed to downstream projects because those projects' tests were the first ones to exhibit the failure.
      
      Topological sorting is also useful for shortening the feedback loop when testing pull requests: if I make a change in SQL then the SQL tests should run before MLlib, not after.
      
      In addition, this patch also updates our test module definitions to split `sql` into `catalyst`, `sql`, and `hive` in order to allow more tests to be skipped when changing only `hive/` files.
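
The ordering can be sketched with Kahn's algorithm. This is an illustration under assumptions, not the code in `dev/run-tests`: the function name `topological_order` and the small module graph are hypothetical, and every dependency is assumed to appear as a key in the mapping.

```python
from collections import deque


def topological_order(dependencies):
    """Order modules so that every module comes after its dependencies.

    `dependencies` maps a module name to the modules it depends on.
    """
    remaining = {module: set(deps) for module, deps in dependencies.items()}
    dependents = {module: [] for module in dependencies}
    for module, deps in remaining.items():
        for dep in deps:
            dependents[dep].append(module)

    # Start with modules that have no dependencies at all.
    ready = deque(sorted(m for m, deps in remaining.items() if not deps))
    ordered = []
    while ready:
        module = ready.popleft()
        ordered.append(module)
        for dependent in sorted(dependents[module]):
            remaining[dependent].discard(module)
            if not remaining[dependent]:
                ready.append(dependent)
    if len(ordered) != len(dependencies):
        raise ValueError("dependency cycle detected")
    return ordered


# Hypothetical module graph mirroring the catalyst/sql/hive split.
module_dependencies = {
    "catalyst": [],
    "sql": ["catalyst"],
    "hive": ["sql"],
    "mllib": ["sql"],
}
test_order = topological_order(module_dependencies)
```

In this graph `sql` is always tested before `hive` and `mllib`, so a failure introduced in SQL surfaces in the SQL tests first.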
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10885 from JoshRosen/SPARK-8725.
      ee74498d
  18. Jan 24, 2016
    • Holden Karau's avatar
      [SPARK-10498][TOOLS][BUILD] Add requirements.txt file for dev python tools · a8340013
      Holden Karau authored
      Minor since so few people use them, but it would probably be good to have a requirements file for our python release tools for easier setup (also version pinning).
      
      cc JoshRosen who looked at the original JIRA.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #10871 from holdenk/SPARK-10498-add-requirements-file-for-dev-python-tools.
      a8340013
  19. Jan 23, 2016
  20. Jan 22, 2016
    • Shixiong Zhu's avatar
      [SPARK-7997][CORE] Remove Akka from Spark Core and Streaming · bc1babd6
      Shixiong Zhu authored
      - Remove Akka dependency from core. Note: the streaming-akka project still uses Akka.
      - Remove HttpFileServer
      - Remove Akka configs from SparkConf and SSLOptions
      - Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth keeping this config because using `DirectTaskResult` or `IndirectTaskResult` depends on it.
      - Update comments and docs
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10854 from zsxwing/remove-akka.
      bc1babd6
  21. Jan 20, 2016
    • Shixiong Zhu's avatar
      [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project · b7d74a60
      Shixiong Zhu authored
      Include the following changes:
      
      1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream
      2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream"
      3. Update the ActorWordCount example and add the JavaActorWordCount example
      4. Make "streaming-zeromq" depend on "streaming-akka" and update the code accordingly
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10744 from zsxwing/streaming-akka-2.
      b7d74a60
  22. Jan 18, 2016
  23. Jan 15, 2016
    • Josh Rosen's avatar
      [SPARK-12842][TEST-HADOOP2.7] Add Hadoop 2.7 build profile · 8dbbf3e7
      Josh Rosen authored
      This patch adds a Hadoop 2.7 build profile in order to let us automate tests against that version.
      
      /cc rxin srowen
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10775 from JoshRosen/add-hadoop-2.7-profile.
      8dbbf3e7
    • Reynold Xin's avatar
      [SPARK-12667] Remove block manager's internal "external block store" API · ad1503f9
      Reynold Xin authored
      This pull request removes the external block store API. This is rarely used, and the file system interface is actually a better, more standard way to interact with external storage systems.
      
      There are some other things to remove also, as pointed out by JoshRosen. We will do those as follow-up pull requests.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10752 from rxin/remove-offheap.
      ad1503f9
    • Hossein's avatar
      [SPARK-12833][SQL] Initial import of spark-csv · 5f83c699
      Hossein authored
      CSV is the most common data format in the "small data" world. It is often the first format people want to try when they see Spark on a single node. Having to rely on a 3rd party component for this leads to poor user experience for new users. This PR merges the popular spark-csv data source package (https://github.com/databricks/spark-csv) with SparkSQL.
      
      This is a first PR to bring the functionality to Spark 2.0 master. We will complete the items outlined in the design document (see JIRA attachment) in follow-up pull requests.
      
      Author: Hossein <hossein@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10766 from rxin/csv.
      5f83c699
  24. Jan 14, 2016
    • Reynold Xin's avatar
      [SPARK-12829] Turn Java style checker on · 591c88c9
      Reynold Xin authored
      It was previously turned off because there was a problem with a pull request. We should turn it on now.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10763 from rxin/SPARK-12829.
      591c88c9
    • Kousuke Saruta's avatar
      [SPARK-12821][BUILD] Style checker should run when some configuration files... · bcc7373f
      Kousuke Saruta authored
      [SPARK-12821][BUILD] Style checker should run when some configuration files for style are modified but any source files are not.
      
      When running the `run-tests` script, style checkers run only when source files are modified, but they should also run when style-related configuration files are modified.
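
A minimal sketch of the intended trigger rule. The suffix and file lists are illustrative assumptions, not the ones used by `run-tests`, and the function name `should_run_style_checks` is hypothetical.

```python
SOURCE_SUFFIXES = (".scala", ".java", ".py")
# Illustrative list of style configuration files; the real script may differ.
STYLE_CONFIG_FILES = ("scalastyle-config.xml", "checkstyle.xml", "tox.ini")


def should_run_style_checks(changed_files):
    """Return True if any changed file is source code or style configuration."""
    for path in changed_files:
        file_name = path.rsplit("/", 1)[-1]
        if path.endswith(SOURCE_SUFFIXES) or file_name in STYLE_CONFIG_FILES:
            return True
    return False
```

Under the old behavior, only the first condition applied, so a pull request that changed nothing but a style configuration file skipped the style checks entirely.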
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10754 from sarutak/SPARK-12821.
      bcc7373f
  25. Jan 13, 2016
    • Josh Rosen's avatar
      [SPARK-9383][PROJECT-INFRA] PR merge script should reset back to previous branch when possible · 97e0c7c5
      Josh Rosen authored
      This patch modifies our PR merge script to reset back to a named branch when restoring the original checkout upon exit. When the committer is originally checked out to a detached head, then they will be restored back to that same ref (the same as today's behavior).
      
      This is a slightly updated version of #7569, with an extra fix to handle the detached head corner-case.
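
The restore decision can be sketched as a pure function. This is not the merge script's code; the name `ref_to_restore` is hypothetical, and the sketch assumes the script recorded the output of `git rev-parse --abbrev-ref HEAD` and `git rev-parse HEAD` at startup.

```python
def ref_to_restore(abbrev_ref, commit_sha):
    """Pick the ref to check out again once the merge script exits.

    `abbrev_ref` is what `git rev-parse --abbrev-ref HEAD` printed at startup;
    Git prints the literal string "HEAD" when the checkout is detached.
    """
    if abbrev_ref == "HEAD":
        # Detached head: there is no branch name, so restore the commit itself.
        return commit_sha
    return abbrev_ref
```

Restoring the branch name rather than the commit hash leaves the committer on their branch instead of on a detached head at the same commit.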
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10709 from JoshRosen/SPARK-9383.
      97e0c7c5
  26. Jan 12, 2016
  27. Jan 11, 2016
    • Josh Rosen's avatar
      [SPARK-12734][HOTFIX] Build changes must trigger all tests; clean after install in dep tests · a4499145
      Josh Rosen authored
      This patch fixes a build/test issue caused by the combination of #10672 and a latent issue in the original `dev/test-dependencies` script.
      
      First, changes which _only_ touched build files were not triggering full Jenkins runs, making it possible for a build change to be merged even though it could cause failures in other tests. The `root` build module now depends on `build`, so all tests will now be run whenever a build-related file is changed.
      
      I also added a `clean` step to the Maven install step in `dev/test-dependencies` in order to address an issue where the dummy JARs stuck around and caused "multiple assembly JARs found" errors in tests.
      
      /cc zsxwing
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10704 from JoshRosen/fix-build-test-problems.
      a4499145
    • BrianLondon's avatar
      [SPARK-12269][STREAMING][KINESIS] Update aws-java-sdk version · 8fe928b4
      BrianLondon authored
      The current Spark Streaming kinesis connector references a quite old version 1.9.40 of the AWS Java SDK (1.10.40 is current). Numerous AWS features including Kinesis Firehose are unavailable in 1.9. Those two versions of the AWS SDK in turn require conflicting versions of Jackson (2.4.4 and 2.5.3 respectively) such that one cannot include the current AWS SDK in a project that also uses the Spark Streaming Kinesis ASL.
      
      Author: BrianLondon <brian@seatgeek.com>
      
      Closes #10256 from BrianLondon/master.
      8fe928b4
    • Josh Rosen's avatar
      [SPARK-12734][HOTFIX][TEST-MAVEN] Fix bug in Netty exclusions · f13c7f8f
      Josh Rosen authored
      This is a hotfix for a build bug introduced by the Netty exclusion changes in #10672. We can't exclude `io.netty:netty` because Akka depends on it. There's not a direct conflict between `io.netty:netty` and `io.netty:netty-all`, because the former puts classes in the `org.jboss.netty` namespace while the latter uses the `io.netty` namespace. However, there still is a conflict between `org.jboss.netty:netty` and `io.netty:netty`, so we need to continue to exclude the JBoss version of that artifact.
      
      While the diff here looks somewhat large, note that this is only a revert of some of the changes from #10672. You can see the net changes in pom.xml at https://github.com/apache/spark/compare/3119206b7188c23055621dfeaf6874f21c711a82...5211ab8#diff-600376dffeb79835ede4a0b285078036
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10693 from JoshRosen/netty-hotfix.
      f13c7f8f
  28. Jan 10, 2016
    • Josh Rosen's avatar
      [SPARK-12734][BUILD] Fix Netty exclusion and use Maven Enforcer to prevent future bugs · 3ab0138b
      Josh Rosen authored
      Netty classes are published under multiple artifacts with different names, so our build needs to exclude the `io.netty:netty` and `org.jboss.netty:netty` versions of the Netty artifact. However, our existing exclusions were incomplete, leading to situations where duplicate Netty classes would wind up on the classpath and cause compile errors (or worse).
      
      This patch fixes the exclusion issue by adding more exclusions and uses Maven Enforcer's [banned dependencies](https://maven.apache.org/enforcer/enforcer-rules/bannedDependencies.html) rule to prevent these classes from accidentally being reintroduced. I also updated `dev/test-dependencies.sh` to run `mvn validate` so that the enforcer rules can run as part of pull request builds.
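
To show what a banned-dependency check looks for, here is a minimal sketch that matches resolved coordinates against a banned set. It is not how Maven Enforcer is implemented; the function name `find_banned`, the sample coordinates, and their version numbers are made up for illustration.

```python
def find_banned(dependencies, banned):
    """Return the dependency coordinates whose group:artifact is banned.

    Coordinates are "group:artifact:version" strings.
    """
    hits = []
    for coordinate in dependencies:
        group_id, artifact_id = coordinate.split(":")[:2]
        if f"{group_id}:{artifact_id}" in banned:
            hits.append(coordinate)
    return hits


# Banned artifacts named in the text above; versions below are hypothetical.
banned = {"io.netty:netty", "org.jboss.netty:netty"}
resolved = [
    "io.netty:netty-all:1.0",
    "org.jboss.netty:netty:1.0",
    "com.example:lib:1.0",
]
```

The match is on the exact `group:artifact` pair, so `io.netty:netty-all` passes while `org.jboss.netty:netty` is reported.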
      
      /cc rxin srowen pwendell. I'd like to backport at least the exclusion portion of this fix to `branch-1.5` in order to fix the documentation publishing job, which fails nondeterministically due to incompatible versions of Netty classes taking precedence on the compile-time classpath.
      
      Author: Josh Rosen <rosenville@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10672 from JoshRosen/enforce-netty-exclusions.
      3ab0138b
  29. Jan 09, 2016
  30. Jan 06, 2016
    • Herman van Hovell's avatar
      [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst · ea489f14
      Herman van Hovell authored
      This PR moves a major part of the new SQL parser to Catalyst. This is a prelude to start using this parser for all of our SQL parsing. The following key changes have been made:
      
      The ANTLR Parser & Supporting classes have been moved to the Catalyst project. They are now part of the `org.apache.spark.sql.catalyst.parser` package. These classes contained quite a bit of code that was originally from the Hive project; I have added acknowledgements wherever this applied. All Hive dependencies have been factored out. I have also taken this chance to clean up the `ASTNode` class and to improve the error handling.
      
      The HiveQl object that provides the functionality to convert an AST into a LogicalPlan has been refactored into three different classes, one for every SQL sub-project:
      - `CatalystQl`: This implements Query and Expression parsing functionality.
      - `SparkQl`: This is a subclass of `CatalystQl` and provides SQL/Core-only functionality such as Explain and Describe.
      - `HiveQl`: This is a subclass of `SparkQl` and adds Hive-only functionality to the parser such as Analyze, Drop, Views, CTAS & Transforms. This class still depends on Hive.
      
      cc rxin
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #10583 from hvanhovell/SPARK-12575.
      ea489f14
  31. Jan 05, 2016
    • felixcheung's avatar
      [SPARK-12625][SPARKR][SQL] replace R usage of Spark SQL deprecated API · cc4d5229
      felixcheung authored
      rxin davies shivaram
      Took save mode from my PR #10480 and moved everything to writer methods. This is related to PR #10559.
      
      - [x] it seems jsonRDD() is broken, need to investigate - this is not a public API though; will look into some more tonight. (fixed)
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #10584 from felixcheung/rremovedeprecated.
      cc4d5229
  32. Jan 04, 2016
    • Reynold Xin's avatar
      [SPARK-12600][SQL] Remove deprecated methods in Spark SQL · 77ab49b8
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10559 from rxin/remove-deprecated-sql.
      77ab49b8
    • Josh Rosen's avatar
      [SPARK-10359][PROJECT-INFRA] Use more random number in... · 9fd7a2f0
      Josh Rosen authored
      [SPARK-10359][PROJECT-INFRA] Use more random number in dev/test-dependencies.sh; fix version switching
      
      This patch aims to fix another potential source of flakiness in the `dev/test-dependencies.sh` script.
      
      pwendell's original patch and my version used `$(date +%s | tail -c6)` to generate a suffix to use when installing temporary Spark versions into the local Maven cache, but this value only changes once per second and thus is highly collision-prone when concurrent builds launch on AMPLab Jenkins. In order to reduce the potential for conflicts, this patch updates the script to call Python's random number generator instead.
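
The collision risk can be sketched as follows. This is an approximation under assumptions, not the script's code: `time_based_suffix` mimics taking the trailing digits of the Unix timestamp, `random_suffix` stands in for a call to Python's random number generator, and both names are hypothetical.

```python
import random


def time_based_suffix(epoch_seconds):
    """Approximate the old approach: trailing digits of the Unix timestamp."""
    return str(epoch_seconds)[-5:]


def random_suffix(rng=random):
    """Draw a six-digit suffix from a random number generator."""
    return str(rng.randrange(100000, 1000000))


# Two builds that start within the same second get the same time-based suffix.
same_second = 1451865600
colliding = time_based_suffix(same_second) == time_based_suffix(same_second)
```

A random suffix can still collide, but two concurrent builds no longer collide just because they started in the same second.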
      
      I also fixed a bug in how we captured the original project version; the bug was causing the exit handler code to fail.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10558 from JoshRosen/build-dep-tests-round-3.
      9fd7a2f0