  1. Oct 05, 2016
    • Shixiong Zhu's avatar
      [SPARK-17346][SQL] Add Kafka source for Structured Streaming · 9293734d
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This PR adds a new project `external/kafka-0-10-sql` for the Structured Streaming Kafka source.
      
      It's based on the design doc: https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit?usp=sharing
      
      tdas did most of the work, and parts of it were inspired by koeninger's work.
      
      ### Introduction
      
      The Kafka source is a Structured Streaming data source that polls data from Kafka. The schema of the read data is as follows:
      
      Column | Type
      ---- | ----
      key | binary
      value | binary
      topic | string
      partition | int
      offset | long
      timestamp | long
      timestampType | int
      
      The source can handle topic deletion. However, the user should make sure no Spark job is processing the data when a topic is deleted.
      
      ### Configuration
      
      The user can use `DataStreamReader.option` to set the following configurations.
      
      Kafka Source's options | value | default | meaning
      ------ | ------- | ------ | -----
      startingOffset | ["earliest", "latest"] | "latest" | The start point when a query is started, either "earliest", which is from the earliest offset, or "latest", which is just from the latest offset. Note: this only applies when a new streaming query is started; resuming will always pick up from where the query left off.
      failOnDataLoss | [true, false] | true | Whether to fail the query when it's possible that data is lost (e.g., topics are deleted, or offsets are out of range). This may be a false alarm. You can disable it when it doesn't work as you expect.
      subscribe | A comma-separated list of topics | (none) | The list of topics to subscribe to. Only one of "subscribe" and "subscribePattern" options can be specified for the Kafka source.
      subscribePattern | Java regex string | (none) | The pattern used to subscribe to topics. Only one of "subscribe" and "subscribePattern" options can be specified for the Kafka source.
      kafka.consumer.poll.timeoutMs | long | 512 | The timeout in milliseconds to poll data from Kafka in executors
      fetchOffset.numRetries | int | 3 | Number of times to retry before giving up fetching the latest Kafka offsets.
      fetchOffset.retryIntervalMs | long | 10 | Milliseconds to wait before retrying to fetch Kafka offsets
      
      Kafka's own configurations can be set via `DataStreamReader.option` with the `kafka.` prefix, e.g., `stream.option("kafka.bootstrap.servers", "host:port")`
      
      ### Usage
      
      * Subscribe to 1 topic
      ```Scala
      spark
        .readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "host:port")
        .option("subscribe", "topic1")
        .load()
      ```
      
      * Subscribe to multiple topics
      ```Scala
      spark
        .readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "host:port")
        .option("subscribe", "topic1,topic2")
        .load()
      ```
      
      * Subscribe to a pattern
      ```Scala
      spark
        .readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "host:port")
        .option("subscribePattern", "topic.*")
        .load()
      ```
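      
      Since `key` and `value` arrive as binary, a typical next step (a sketch, not part of this PR) is to cast them to strings for downstream processing:
      
      ```Scala
      val df = spark
        .readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "host:port")
        .option("subscribe", "topic1")
        .load()
      
      // key and value are binary; cast them to strings before use.
      val lines = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      ```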
      
      ## How was this patch tested?
      
      The new unit tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Author: Shixiong Zhu <zsxwing@gmail.com>
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #15102 from zsxwing/kafka-source.
      9293734d
  2. Sep 23, 2016
    • Shivaram Venkataraman's avatar
      [SPARK-17651][SPARKR] Set R package version number along with mvn · 7c382524
      Shivaram Venkataraman authored
      ## What changes were proposed in this pull request?
      
      This PR sets the R package version while tagging releases. Note that since R doesn't accept `-SNAPSHOT` in the version number field, we remove it while setting the next version.
      
      ## How was this patch tested?
      
      Tested manually by running locally
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #15223 from shivaram/sparkr-version-change.
      7c382524
  3. Sep 21, 2016
  4. Sep 16, 2016
    • Reynold Xin's avatar
      [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3 · dca771be
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch bumps the Hadoop version in hadoop-2.7 profile from 2.7.2 to 2.7.3, which was recently released and contained a number of bug fixes.
      
      ## How was this patch tested?
      The change should be covered by existing tests.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #15115 from rxin/SPARK-17558.
      dca771be
  5. Sep 15, 2016
    • Adam Roberts's avatar
      [SPARK-17379][BUILD] Upgrade netty-all to 4.0.41 final for bug fixes · 0ad8eeb4
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      Upgrade netty-all to the latest in the 4.0.x line, which is 4.0.41; the release notes mention several bug fixes and performance improvements we may find useful, see netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html. Initially tried to use 4.1.5 but noticed it's not backwards compatible.
      
      ## How was this patch tested?
      Existing unit tests against branch-1.6 and branch-2.0 using IBM Java 8 on Intel, Power and Z architectures
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #14961 from a-roberts/netty.
      0ad8eeb4
  6. Sep 08, 2016
  7. Sep 06, 2016
    • Adam Roberts's avatar
      [SPARK-17378][BUILD] Upgrade snappy-java to 1.1.2.6 · 6c08dbf6
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      
      Upgrades the Snappy version to 1.1.2.6 from 1.1.2.4. The release notes (https://github.com/xerial/snappy-java/blob/master/Milestone.md) mention: "Fix a bug in SnappyInputStream when reading compressed data that happened to have the same first byte with the stream magic header (#142)"
      
      ## How was this patch tested?
      Existing unit tests using the latest IBM Java 8 on Intel, Power and Z architectures (little and big-endian)
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #14958 from a-roberts/master.
      6c08dbf6
  8. Sep 01, 2016
    • Sean Owen's avatar
      [SPARK-17329][BUILD] Don't build PRs with -Pyarn unless YARN code changed · 536fa911
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Only build PRs with -Pyarn if YARN code was modified.
      
      ## How was this patch tested?
      
      Jenkins tests (will look to verify whether -Pyarn was included in the PR builder for this one.)
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14892 from srowen/SPARK-17329.
      536fa911
  9. Aug 31, 2016
  10. Aug 30, 2016
    • Ferdinand Xu's avatar
      [SPARK-5682][CORE] Add encrypted shuffle in spark · 4b4e329e
      Ferdinand Xu authored
      This patch is using Apache Commons Crypto library to enable shuffle encryption support.
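      
      A minimal sketch of the Commons Crypto streams this support builds on (library usage only, not Spark's actual shuffle wiring; the key and IV are hard-coded placeholders):
      
      ```Scala
      import java.io.ByteArrayOutputStream
      import java.util.Properties
      import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}
      import org.apache.commons.crypto.stream.CryptoOutputStream
      
      // Wrap a plain stream so bytes are AES/CTR-encrypted as they are written.
      val key  = new SecretKeySpec("0123456789abcdef".getBytes("UTF-8"), "AES") // placeholder 128-bit key
      val iv   = new IvParameterSpec("0123456789abcdef".getBytes("UTF-8"))      // placeholder IV
      val sink = new ByteArrayOutputStream()
      val encrypting = new CryptoOutputStream("AES/CTR/NoPadding", new Properties(), sink, key, iv)
      encrypting.write("shuffle bytes".getBytes("UTF-8"))
      encrypting.close()
      // sink now holds the ciphertext
      ```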
      
      Author: Ferdinand Xu <cheng.a.xu@intel.com>
      Author: kellyzly <kellyzly@126.com>
      
      Closes #8880 from winningsix/SPARK-10771.
      4b4e329e
    • frreiss's avatar
      [SPARK-17303] Added spark-warehouse to dev/.rat-excludes · 8fb445d9
      frreiss authored
      ## What changes were proposed in this pull request?
      
      Excludes the `spark-warehouse` directory from the Apache RAT checks that src/run-tests performs. `spark-warehouse` is created by some of the Spark SQL tests, as well as by `bin/spark-sql`.
      
      ## How was this patch tested?
      
      Ran src/run-tests twice. The second time, the script failed because the first iteration had created the `spark-warehouse` directory.
      Made the change in this PR.
      Ran src/run-tests a third time; the RAT checks succeeded.
      
      Author: frreiss <frreiss@us.ibm.com>
      
      Closes #14870 from frreiss/fred-17303.
      8fb445d9
  11. Aug 26, 2016
    • Michael Gummelt's avatar
      [SPARK-16967] move mesos to module · 8e5475be
      Michael Gummelt authored
      ## What changes were proposed in this pull request?
      
      Move Mesos code into a mvn module
      
      ## How was this patch tested?
      
      unit tests
      manually submitting a client mode and cluster mode job
      spark/mesos integration test suite
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #14637 from mgummelt/mesos-module.
      8e5475be
  12. Aug 24, 2016
    • Sean Owen's avatar
      [SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the same... · 0b3a4be9
      Sean Owen authored
      [SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the same java used in the spark environment
      
      ## What changes were proposed in this pull request?
      
      Update to py4j 0.10.3 to enable JAVA_HOME support
      
      ## How was this patch tested?
      
      Pyspark tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14748 from srowen/SPARK-16781.
      0b3a4be9
  13. Aug 10, 2016
    • jerryshao's avatar
      [SPARK-14743][YARN] Add a configurable credential manager for Spark running on YARN · ab648c00
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      Add a configurable token manager for Spark running on YARN.
      
      ### Current Problems ###
      
      1. The supported token providers are hard-coded; currently only HDFS, HBase and Hive are supported, and it is impossible for a user to add a new token provider without code changes.
      2. The same problem exists in the periodic token renewer and updater.
      
      ### Changes In This Proposal ###
      
      In this proposal, to address the problems mentioned above and make the current code cleaner and easier to understand, there are mainly 3 changes:
      
      1. Abstract a `ServiceTokenProvider` as well as a `ServiceTokenRenewable` interface for token providers. Each service that wants to communicate with Spark via tokens needs to implement this interface.
      2. Provide a `ConfigurableTokenManager` to manage all the registered token providers, as well as the token renewer and updater. This class also offers the API for other modules to obtain tokens, get renewal intervals and so on.
      3. Implement 3 built-in token providers, `HDFSTokenProvider`, `HiveTokenProvider` and `HBaseTokenProvider`, to keep the same semantics as supported today. Whether to load these built-in token providers is controlled by the configuration "spark.yarn.security.tokens.${service}.enabled"; by default all the built-in token providers are loaded.
      
      ### Behavior Changes ###
      
      For the end user there's no behavior change, we still use the same configuration `spark.yarn.security.tokens.${service}.enabled` to decide which token provider is enabled (hbase or hive).
      
      A user-implemented token provider (assume the provider's name is "test") needs two configurations to be added:
      
      1. Set `spark.yarn.security.tokens.test.enabled` to true.
      2. Set `spark.yarn.security.tokens.test.class` to the fully qualified class name.
      
      So we keep the same semantics as the current code while adding one new configuration.
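      
      For illustration, a minimal sketch of wiring in such a provider via `SparkConf` ("test" and `com.example.TestTokenProvider` are made-up names):
      
      ```Scala
      import org.apache.spark.SparkConf
      
      // Register a hypothetical user-implemented token provider named "test".
      val conf = new SparkConf()
        .set("spark.yarn.security.tokens.test.enabled", "true")
        .set("spark.yarn.security.tokens.test.class", "com.example.TestTokenProvider")
      ```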
      
      ### Current Status ###
      
      - [x] token provider interface and management framework.
      - [x] implement built-in token providers (hdfs, hbase, hive).
      - [x] Coverage of unit test.
      - [x] Integrated test with security cluster.
      
      ## How was this patch tested?
      
      Unit test and integrated test.
      
      Please suggest and review, any comment is greatly appreciated.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #14065 from jerryshao/SPARK-16342.
      ab648c00
  14. Aug 03, 2016
    • Stefan Schulze's avatar
      [SPARK-16770][BUILD] Fix JLine dependency management and version (Sca… · 4775eb41
      Stefan Schulze authored
      ## What changes were proposed in this pull request?
      As of Scala 2.11.x there is no longer an org.scala-lang:jline version aligned to the Scala version itself. The Scala console now uses the plain jline:jline module. Spark's dependency management did not reflect this change properly, causing Maven to pull in JLine via a transitive dependency. Unfortunately JLine 2.12 contained a minor but very annoying bug rendering the shell almost useless for developers with a German keyboard layout. This request contains the following changes:
      - Exclude the transitive dependency 'jline:jline' from the hive-exec module
      - Remove the global properties 'jline.version' and 'jline.groupId'
      - Add both properties and the dependency to the 'scala-2.11' profile
      - Add an explicit dependency on 'jline:jline' to module 'spark-repl'
      
      ## How was this patch tested?
      - Running mvn dependency:tree and checking for correct Jline version 2.12.1
      - Running full builds with assembly and checking for jline-2.12.1.jar in 'lib' folder of generated tarball
      
      Author: Stefan Schulze <stefan.schulze@pentasys.de>
      
      Closes #14429 from stsc-pentasys/SPARK-16770.
      4775eb41
  15. Jul 29, 2016
    • Michael Gummelt's avatar
      [SPARK-16637] Unified containerizer · 266b92fa
      Michael Gummelt authored
      ## What changes were proposed in this pull request?
      
      New config var: spark.mesos.docker.containerizer={"mesos","docker" (default)}
      
      This adds support for running docker containers via the Mesos unified containerizer: http://mesos.apache.org/documentation/latest/container-image/
      
      The benefit is dropping the dependency on `dockerd`, and all the costs it incurs.
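      
      For illustration, a minimal `SparkConf` sketch of the new config var (the image name is a placeholder):
      
      ```Scala
      import org.apache.spark.SparkConf
      
      // Run the executor's docker image through the Mesos unified containerizer
      // instead of the default docker containerizer.
      val conf = new SparkConf()
        .set("spark.mesos.executor.docker.image", "example/spark-executor:1.0") // placeholder image
        .set("spark.mesos.docker.containerizer", "mesos") // "docker" is the default
      ```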
      
      I've also updated the supported Mesos version to 0.28.2 for support of the required protobufs.
      
      This is blocked on: https://github.com/apache/spark/pull/14167
      
      ## How was this patch tested?
      
      - manually testing jobs submitted with both "mesos" and "docker" settings for the new config var.
      - spark/mesos integration test suite
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #14275 from mgummelt/unified-containerizer.
      266b92fa
    • Adam Roberts's avatar
      [SPARK-16751] Upgrade derby to 10.12.1.1 · 04a2c072
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      
      The version of derby is upgraded based on important security info at VersionEye. Test scope is added so we don't include it in our final package anyway. NB: I think this should be backported to all previous releases as it is a security problem: https://www.versioneye.com/java/org.apache.derby:derby/10.11.1.1
      
      The CVE number is CVE-2015-1832. I also suggest we add a SECURITY tag for JIRAs.
      
      ## How was this patch tested?
      Existing tests with the change, making sure that we see no new failures. I checked that derby 10.12.x, and not derby 10.11.x, is downloaded to our ~/.m2 folder.
      
      I then used dev/make-distribution.sh and checked the dist/jars folder for Spark 2.0: no derby jar is present.
      
      I don't know if this would also remove it from the assembly jar in our 1.x branches.
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #14379 from a-roberts/patch-4.
      04a2c072
  16. Jul 26, 2016
    • Philipp Hoffmann's avatar
      [SPARK-15271][MESOS] Allow force pulling executor docker images · 0869b3a5
      Philipp Hoffmann authored
      ## What changes were proposed in this pull request?
      
      Mesos agents by default will not pull docker images which are already cached locally. In order to run Spark executors from mutable tags like `:latest`, this commit introduces a Spark setting (`spark.mesos.executor.docker.forcePullImage`). Setting this flag to true will tell the Mesos agent to force pull the docker image; the default is `false`, which is consistent with the previous implementation and Mesos' default behaviour.
      
      Author: Philipp Hoffmann <mail@philipphoffmann.de>
      
      Closes #14348 from philipphoffmann/force-pull-image.
      0869b3a5
  17. Jul 25, 2016
    • Josh Rosen's avatar
      fc17121d
    • Philipp Hoffmann's avatar
      [SPARK-15271][MESOS] Allow force pulling executor docker images · 978cd5f1
      Philipp Hoffmann authored
      ## What changes were proposed in this pull request?
      
      Mesos agents by default will not pull docker images which are already cached locally. In order to run Spark executors from mutable tags like `:latest`, this commit introduces a Spark setting `spark.mesos.executor.docker.forcePullImage`. Setting this flag to true will tell the Mesos agent to force pull the docker image; the default is `false`, which is consistent with the previous implementation and Mesos' default behaviour.
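      
      For illustration, a minimal `SparkConf` sketch of the new flag (the image name is a placeholder):
      
      ```Scala
      import org.apache.spark.SparkConf
      
      // Force the Mesos agent to re-pull the executor image on every launch,
      // which matters for mutable tags like :latest (default is false).
      val conf = new SparkConf()
        .set("spark.mesos.executor.docker.image", "example/spark:latest") // placeholder image
        .set("spark.mesos.executor.docker.forcePullImage", "true")
      ```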
      
      ## How was this patch tested?
      
      I ran a sample application including this change on a Mesos cluster and verified the correct behaviour for both, with and without, force pulling the executor image. As expected the image is being force pulled if the flag is set.
      
      Author: Philipp Hoffmann <mail@philipphoffmann.de>
      
      Closes #13051 from philipphoffmann/force-pull-image.
      978cd5f1
    • Reynold Xin's avatar
      [SPARK-16685] Remove audit-release scripts. · dd784a88
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch removes dev/audit-release. It was initially created to do basic release auditing, but it has been unused for over a year.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14342 from rxin/SPARK-16685.
      dd784a88
  18. Jul 19, 2016
    • Yanbo Liang's avatar
      [SPARK-16494][ML] Upgrade breeze version to 0.12 · 67089149
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      breeze 0.12 has been released for more than half a year, and it brings lots of new features, performance improvements and bug fixes.
      One of the biggest features is ```LBFGS-B```, an implementation of ```LBFGS``` with box constraints that is much faster for some special cases.
      We would like to implement the Huber loss function for ```LinearRegression``` ([SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181)), and it requires ```LBFGS-B``` as the optimization solver, so we should bump the breeze dependency to 0.12.
      For more features, improvements and bug fixes in breeze 0.12, you can refer to the following link:
      https://groups.google.com/forum/#!topic/scala-breeze/nEeRi_DcY5c
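      
      Not code from this patch, but a rough sketch of what ```LBFGS-B``` adds, assuming breeze 0.12's `LBFGSB` class takes lower and upper bound vectors:
      
      ```Scala
      import breeze.linalg.DenseVector
      import breeze.optimize.{DiffFunction, LBFGSB}
      
      // Minimize f(x) = ||x - 3||^2 subject to 0 <= x <= 2; the box constraints
      // push the solution onto the boundary at (2, 2).
      val f = new DiffFunction[DenseVector[Double]] {
        override def calculate(x: DenseVector[Double]): (Double, DenseVector[Double]) = {
          val diff = x - 3.0
          (diff dot diff, diff * 2.0)  // (value, gradient)
        }
      }
      val solver = new LBFGSB(DenseVector(0.0, 0.0), DenseVector(2.0, 2.0))
      val xOpt = solver.minimize(f, DenseVector(1.0, 1.0))
      ```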
      
      ## How was this patch tested?
      No new tests, should pass the existing ones.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14150 from yanboliang/spark-16494.
      67089149
  19. Jul 16, 2016
  20. Jul 10, 2016
  21. Jul 08, 2016
  22. Jun 16, 2016
  23. Jun 14, 2016
    • Shixiong Zhu's avatar
      [SPARK-15935][PYSPARK] Fix a wrong format tag in the error message · 0ee9fd9e
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      A follow up PR for #13655 to fix a wrong format tag.
      
      ## How was this patch tested?
      
      Jenkins unit tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #13665 from zsxwing/fix.
      0ee9fd9e
    • Adam Roberts's avatar
      [SPARK-15821][DOCS] Include parallel build info · a431e3f1
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      
      We should mention that users can build Spark using multiple threads to decrease build times; either here or in "Building Spark"
      
      ## How was this patch tested?
      
      Built on machines with between one and 192 cores using `mvn -T 1C` and observed faster build times with no loss in stability.
      
      In response to the question at https://issues.apache.org/jira/browse/SPARK-15821, I think we should suggest this option as we know it works for Spark and can result in faster builds.
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #13562 from a-roberts/patch-3.
      a431e3f1
    • Shixiong Zhu's avatar
      [SPARK-15935][PYSPARK] Enable test for sql/streaming.py and fix these tests · 96c3500c
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This PR just enables tests for sql/streaming.py and also fixes the failures.
      
      ## How was this patch tested?
      
      Existing unit tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #13655 from zsxwing/python-streaming-test.
      96c3500c
  24. Jun 09, 2016
    • Adam Roberts's avatar
      [SPARK-15818][BUILD] Upgrade to Hadoop 2.7.2 · 147c0208
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      
      Updating the Hadoop version from 2.7.0 to 2.7.2 when we use the hadoop-2.7 build profile.
      
      ## How was this patch tested?
      
      Existing tests
      
      I'd like us to use Hadoop 2.7.2 owing to the Hadoop release notes stating Hadoop 2.7.0 is not ready for production use
      
      https://hadoop.apache.org/docs/r2.7.0/ states
      
      "Apache Hadoop 2.7.0 is a minor release in the 2.x.y release line, building upon the previous stable release 2.6.0.
      This release is not yet ready for production use. Production users should use 2.7.1 release and beyond."
      
      Hadoop 2.7.1 release notes:
      "Apache Hadoop 2.7.1 is a minor release in the 2.x.y release line, building upon the previous release 2.7.0. This is the next stable release after Apache Hadoop 2.6.x."
      
      And then Hadoop 2.7.2 release notes:
      "Apache Hadoop 2.7.2 is a minor release in the 2.x.y release line, building upon the previous stable release 2.7.1."
      
      I've tested this is OK with Intel hardware and IBM Java 8 so let's test it with OpenJDK, ideally this will be pushed to branch-2.0 and master.
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #13556 from a-roberts/patch-2.
      147c0208
    • Josh Rosen's avatar
      [SPARK-12712] Fix failure in ./dev/test-dependencies when run against empty .m2 cache · 921fa40b
      Josh Rosen authored
      This patch fixes a bug in `./dev/test-dependencies.sh` which caused spurious failures when the script was run on a machine with an empty `.m2` cache. The problem was that extra log output from the dependency download was conflicting with the grep / regex used to identify the classpath in the Maven output. This patch fixes this issue by adjusting the regex pattern.
      
      Tested manually with the following reproduction of the bug:
      
      ```
      rm -rf ~/.m2/repository/org/apache/commons/
      ./dev/test-dependencies.sh
      ```
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #13568 from JoshRosen/SPARK-12712.
      921fa40b
  25. Jun 08, 2016
    • Sandeep Singh's avatar
      [MINOR] Fix Java Lint errors introduced by #13286 and #13280 · f958c1c3
      Sandeep Singh authored
      ## What changes were proposed in this pull request?
      
      revived #13464
      
      Fix Java Lint errors introduced by #13286 and #13280
      Before:
      ```
      Using `mvn` from path: /Users/pichu/Project/spark/build/apache-maven-3.3.9/bin/mvn
      Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
      Checkstyle checks failed at following occurrences:
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[340,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[341,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[342,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[343,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[41,28] (naming) MethodName: Method name 'Append' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
      [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[52,28] (naming) MethodName: Method name 'Complete' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
      [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[61,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.PrimitiveType.
      [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[62,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.Type.
      ```
      
      ## How was this patch tested?
      ran `dev/lint-java` locally
      
      Author: Sandeep Singh <sandeep@techaddict.me>
      
      Closes #13559 from techaddict/minor-3.
      f958c1c3
  26. May 31, 2016
  27. May 27, 2016
    • Ryan Blue's avatar
      [SPARK-9876][SQL] Update Parquet to 1.8.1. · 776d183c
      Ryan Blue authored
      ## What changes were proposed in this pull request?
      
      This includes minimal changes to get Spark using the current release of Parquet, 1.8.1.
      
      ## How was this patch tested?
      
      This uses the existing Parquet tests.
      
      Author: Ryan Blue <blue@apache.org>
      
      Closes #13280 from rdblue/SPARK-9876-update-parquet.
      776d183c
  28. May 26, 2016
    • Villu Ruusmann's avatar
      [SPARK-15523][ML][MLLIB] Update JPMML to 1.2.15 · 6d506c9a
      Villu Ruusmann authored
      ## What changes were proposed in this pull request?
      
      See https://issues.apache.org/jira/browse/SPARK-15523
      
      This PR replaces PR #13293. It's isolated to a new branch, and contains some more squashed changes.
      
      ## How was this patch tested?
      
      1. Executed `mvn clean package` in `mllib` directory
      2. Executed `dev/test-dependencies.sh --replace-manifest` in the root directory.
      
      Author: Villu Ruusmann <villu.ruusmann@gmail.com>
      
      Closes #13297 from vruusmann/update-jpmml.
      6d506c9a
  29. May 25, 2016
  30. May 24, 2016
    • Liang-Chi Hsieh's avatar
      [SPARK-11753][SQL][TEST-HADOOP2.2] Make allowNonNumericNumbers option work · c24b6b67
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      Jackson supports the `allowNonNumericNumbers` option to parse non-standard, non-numeric numbers such as "NaN", "Infinity" and "INF". The currently used Jackson version (2.5.3) doesn't fully support it. This patch upgrades the library and makes the two ignored tests in `JsonParsingOptionsSuite` pass.
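      
      Not part of this patch, but a minimal sketch of the option in action via Spark's JSON reader (assumes a `SparkSession` named `spark`; the sample records are made up):
      
      ```Scala
      // Records containing non-standard numbers that strict JSON would reject.
      val rdd = spark.sparkContext.parallelize(Seq("""{"x": NaN}""", """{"x": Infinity}"""))
      val df = spark.read
        .option("allowNonNumericNumbers", "true")
        .json(rdd)
      df.show()  // NaN and Infinity are parsed as doubles instead of failing
      ```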
      
      ## How was this patch tested?
      
      `JsonParsingOptionsSuite`.
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #9759 from viirya/fix-json-nonnumric.
      c24b6b67
  31. May 21, 2016
    • Reynold Xin's avatar
      [SPARK-15424][SPARK-15437][SPARK-14807][SQL] Revert Create a hivecontext-compatibility module · 45b7557e
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      I initially asked to create a hivecontext-compatibility module to put the HiveContext in. But we are so close to the Spark 2.0 release and there is only a single class in it. It seems overkill, and makes things more inconvenient, to have an entire package for a single class.
      
      ## How was this patch tested?
      Tests were moved.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #13207 from rxin/SPARK-15424.
      45b7557e
  32. May 20, 2016
    • Sameer Agarwal's avatar
      [SPARK-15078] [SQL] Add all TPCDS 1.4 benchmark queries for SparkSQL · a78d6ce3
      Sameer Agarwal authored
      ## What changes were proposed in this pull request?
      
      Now that SparkSQL supports all TPC-DS queries, this patch adds all 99 benchmark queries inside SparkSQL.
      
      ## How was this patch tested?
      
      Benchmark only
      
      Author: Sameer Agarwal <sameer@databricks.com>
      
      Closes #13188 from sameeragarwal/tpcds-all.
      a78d6ce3