  1. Nov 16, 2016
    • [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed · 6a3cbbc0
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      This PR aims to provide a pip-installable PySpark package. It does a bunch of work to copy the jars over and package them with the Python code (to prevent problems from mixing different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI, but that is the natural follow-up (SPARK-18129).
      
      Done:
      - pip installable on conda [manually tested]
      - installed via setup.py on a non-pip-managed system (RHEL) with YARN [manually tested]
      - Automated testing of this (virtualenv)
      - packaging and signing with release-build*
      
      Possible follow-up work:
      - release-build update to publish to PyPI (SPARK-18128)
      - figure out who owns the pyspark package name on production PyPI (is it someone within the project, should we ask PyPI, or should we choose a different name to publish under, like ApachePySpark?)
      - Windows support and/or testing (SPARK-18136)
      - investigate the details of wheel caching and see if we can avoid cleaning the wheel cache during our tests
      - consider how we want to number our dev/snapshot versions
      
      Explicitly out of scope:
      - Using pip installed PySpark to start a standalone cluster
      - Using pip installed PySpark for non-Python Spark programs
      
      *I've done some work to test release-build, but as a non-committer I've only been able to test it locally.
      ## How was this patch tested?
      
      Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.
      
      release-build changes tested locally as a non-committer (no testing of uploading artifacts to Apache staging websites)
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: Juliet Hougland <juliet@cloudera.com>
      Author: Juliet Hougland <not@myemail.com>
      
      Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
  2. Oct 05, 2016
    • [SPARK-17346][SQL] Add Kafka source for Structured Streaming · 9293734d
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This PR adds a new project `external/kafka-0-10-sql` for the Structured Streaming Kafka source.
      
      It's based on the design doc: https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit?usp=sharing
      
      tdas did most of the work, and parts of it were inspired by koeninger's work.
      
      ### Introduction
      
      The Kafka source is a Structured Streaming data source that polls data from Kafka. The schema of the data it reads is as follows:
      
      Column | Type
      ---- | ----
      key | binary
      value | binary
      topic | string
      partition | int
      offset | long
      timestamp | long
      timestampType | int
      
      The source can handle topic deletion. However, the user should make sure no Spark job is processing the data when a topic is deleted.
      
      ### Configuration
      
      The user can use `DataStreamReader.option` to set the following configurations.
      
      Kafka Source's options | value | default | meaning
      ------ | ------- | ------ | -----
      startingOffset | ["earliest", "latest"] | "latest" | The start point when a query is started, either "earliest" which is from the earliest offset, or "latest" which is just from the latest offset. Note: This only applies when a new Streaming query is started; resuming will always pick up from where the query left off.
      failOnDataLoss | [true, false] | true | Whether to fail the query when it's possible that data has been lost (e.g., topics are deleted, or offsets are out of range). This may be a false alarm; you can disable it when it doesn't work as expected.
      subscribe | A comma-separated list of topics | (none) | The list of topics to subscribe to. Only one of "subscribe" and "subscribePattern" can be specified for the Kafka source.
      subscribePattern | Java regex string | (none) | The pattern used to subscribe to topics. Only one of "subscribe" and "subscribePattern" can be specified for the Kafka source.
      kafka.consumer.poll.timeoutMs | long | 512 | The timeout in milliseconds to poll data from Kafka in executors
      fetchOffset.numRetries | int | 3 | Number of times to retry before giving up fetching the latest Kafka offsets.
      fetchOffset.retryIntervalMs | long | 10 | Milliseconds to wait before retrying to fetch Kafka offsets
      
      Kafka's own configurations can be set via `DataStreamReader.option` with the `kafka.` prefix, e.g., `stream.option("kafka.bootstrap.servers", "host:port")`.
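
      For example, options from the table above can be combined on a single reader. This is a sketch following the option names listed there (the broker address and topic name are illustrative):

      ```Scala
      spark
        .readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "host:port")
        .option("subscribe", "topic1")
        .option("startingOffset", "earliest")
        .option("failOnDataLoss", "false")
        .load()
      ```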
      
      ### Usage
      
      * Subscribe to 1 topic
      ```Scala
      spark
        .readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "host:port")
        .option("subscribe", "topic1")
        .load()
      ```
      
      * Subscribe to multiple topics
      ```Scala
      spark
        .readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "host:port")
        .option("subscribe", "topic1,topic2")
        .load()
      ```
      
      * Subscribe to a pattern
      ```Scala
      spark
        .readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "host:port")
        .option("subscribePattern", "topic.*")
        .load()
      ```
      
      ## How was this patch tested?
      
      The new unit tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Author: Shixiong Zhu <zsxwing@gmail.com>
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #15102 from zsxwing/kafka-source.
  3. Sep 01, 2016
    • [SPARK-17329][BUILD] Don't build PRs with -Pyarn unless YARN code changed · 536fa911
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Only build PRs with -Pyarn if YARN code was modified.
      
      ## How was this patch tested?
      
      Jenkins tests (will look to verify whether -Pyarn was included in the PR builder for this one).
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14892 from srowen/SPARK-17329.
  4. Jun 16, 2016
  5. Jun 14, 2016
  6. May 21, 2016
    • [SPARK-15424][SPARK-15437][SPARK-14807][SQL] Revert Create a hivecontext-compatibility module · 45b7557e
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      I initially asked to create a hivecontext-compatibility module to put HiveContext in. But we are so close to the Spark 2.0 release and there is only a single class in it. It seems overkill to have an entire module for a single class; it only makes things more inconvenient.
      
      ## How was this patch tested?
      Tests were moved.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #13207 from rxin/SPARK-15424.
  7. May 11, 2016
    • [SPARK-15085][STREAMING][KAFKA] Rename streaming-kafka artifact · 89e67d66
      cody koeninger authored
      ## What changes were proposed in this pull request?
      Rename the streaming-kafka artifact to include the Kafka version, in anticipation of needing a different artifact for later Kafka versions.
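
      For downstream builds, this means the dependency coordinate will carry the Kafka version. A hypothetical sbt line for illustration (the exact artifact name and version here are assumptions, not part of this change):

      ```scala
      // Illustrative coordinates only; use the artifact published for your Kafka and Spark versions
      libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"
      ```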
      
      ## How was this patch tested?
      Unit tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #12946 from koeninger/SPARK-15085.
  8. Apr 22, 2016
    • [SPARK-14807] Create a compatibility module · 7dde1da9
      Yin Huai authored
      ## What changes were proposed in this pull request?
      
      This PR creates a compatibility module in sql (called `hive-1-x-compatibility`), which will host HiveContext in Spark 2.0 (moving HiveContext there will be done separately). This module is not included in the assembly because only users who still want to access HiveContext need it.
      
      ## How was this patch tested?
      I manually tested `sbt/sbt -Phive package` and `mvn -Phive package -DskipTests`.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #12580 from yhuai/compatibility.
  9. Apr 04, 2016
    • [SPARK-13579][BUILD] Stop building the main Spark assembly. · 24d7d2e4
      Marcelo Vanzin authored
      This change modifies the "assembly/" module to just copy needed
      dependencies to its build directory, and modifies the packaging
      script to pick those up (and remove duplicate jars packaged in the
      examples module).
      
      I also made some minor adjustments to dependencies to remove some
      test jars from the final packaging, and remove jars that conflict with each
      other when packaged separately (e.g., the servlet API).
      
      Also note that this change restores guava in applications' classpaths, even
      though it's still shaded inside Spark. This is now needed for the Hadoop
      libraries that are packaged with Spark, which now are not processed by
      the shade plugin.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11796 from vanzin/SPARK-13579.
  10. Mar 25, 2016
    • [SPARK-14073][STREAMING][TEST-MAVEN] Move flume back to Spark · 24587ce4
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This PR moves Flume back to Spark, as per the discussion on the dev mailing list.
      
      ## How was this patch tested?
      
      Existing Jenkins tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #11895 from zsxwing/move-flume-back.
  11. Mar 14, 2016
    • [SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt,... · 06dec374
      Shixiong Zhu authored
      [SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages
      
      ## What changes were proposed in this pull request?
      
      Currently there are a few sub-projects, each integrating with a different external source for Streaming. Now that we have a better way to include external libraries (Spark packages), and with Spark 2.0 coming up, we can move the following projects out of Spark to https://github.com/spark-packages:
      
      - streaming-flume
      - streaming-akka
      - streaming-mqtt
      - streaming-zeromq
      - streaming-twitter
      
      They are ancillary packages, and considering the overhead of maintenance, running tests, and PR failures, it's better to maintain them outside of Spark. In addition, these projects can then have their own release cycles, and we can release them faster.
      
      I have already copied these projects to https://github.com/spark-packages
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #11672 from zsxwing/remove-external-pkg.
  12. Mar 11, 2016
    • [SPARK-13817][BUILD][SQL] Re-enable MiMA and removes object DataFrame · 6d37e1eb
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      PR #11443 temporarily disabled MiMA check, this PR re-enables it.
      
      One extra change is that `object DataFrame` is also removed. The only purpose of introducing `object DataFrame` was to use it as an internal factory for creating `Dataset[Row]`. By replacing this internal factory with `Dataset.newDataFrame`, both `DataFrame` and `DataFrame$` are entirely removed from the API, so that we can simply put a `MissingClassProblem` filter in `MimaExcludes.scala` for most DataFrame API changes.
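
      Such an entry in `MimaExcludes.scala` looks roughly like the sketch below (the excluded class names are illustrative):

      ```scala
      import com.typesafe.tools.mima.core._

      // Exclude classes that were removed on purpose, so MiMa does not report them as binary breaks
      val dataFrameExcludes = Seq(
        ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.DataFrame"),
        ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.DataFrame$")
      )
      ```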
      
      ## How was this patch tested?
      
      Tested by MiMA check triggered by Jenkins.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #11656 from liancheng/re-enable-mima.
    • [SPARK-13294][PROJECT INFRA] Remove MiMa's dependency on spark-class / Spark assembly · 6ca990fb
      Josh Rosen authored
      This patch removes the need to build a full Spark assembly before running the `dev/mima` script.
      
      - I modified the `tools` project to remove a direct dependency on Spark, so `sbt/sbt tools/fullClasspath` will now return the classpath for the `GenerateMIMAIgnore` class itself plus its own dependencies.
         - This required me to delete two classes full of dead code that we don't use anymore
      - `GenerateMIMAIgnore` now uses [ClassUtil](http://software.clapper.org/classutil/) to find all of the Spark classes rather than our homemade JAR traversal code. The problem in our own code was that it didn't handle folders of classes properly, which is necessary in order to generate excludes with an assembly-free Spark build.
      - `./dev/mima` no longer runs through `spark-class`, eliminating the need to reason about classpath ordering between `SPARK_CLASSPATH` and the assembly.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #11178 from JoshRosen/remove-assembly-in-run-tests.
  13. Mar 10, 2016
    • [SPARK-13244][SQL] Migrates DataFrame to Dataset · 1d542785
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      This PR unifies DataFrame and Dataset by migrating existing DataFrame operations to Dataset and making `DataFrame` a type alias of `Dataset[Row]`.

      Most Scala code changes are source compatible, but the Java API is broken, as Java knows nothing about Scala type aliases (the fix is mostly replacing `DataFrame` with `Dataset<Row>`).
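
      Conceptually, the alias itself is a one-liner; a rough sketch of how it reads in the `sql` package object:

      ```scala
      package org.apache.spark

      package object sql {
        // DataFrame is now just another name for Dataset[Row]
        type DataFrame = Dataset[Row]
      }
      ```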
      
      There are several noticeable API changes related to those returning arrays:
      
      1.  `collect`/`take`
      
          -   Old APIs in class `DataFrame`:
      
              ```scala
              def collect(): Array[Row]
              def take(n: Int): Array[Row]
              ```
      
          -   New APIs in class `Dataset[T]`:
      
              ```scala
              def collect(): Array[T]
              def take(n: Int): Array[T]
      
              def collectRows(): Array[Row]
              def takeRows(n: Int): Array[Row]
              ```
      
          Two specialized methods `collectRows` and `takeRows` are added because Java doesn't support returning generic arrays. Thus, for example, `DataFrame.collect(): Array[T]` actually returns `Object` instead of `Array<T>` from the Java side.

          Normally, Java users may fall back to `collectAsList` and `takeAsList`. The two new specialized versions are added to avoid performance regression in ML-related code (but maybe I'm wrong and they are not necessary here).
      
      1.  `randomSplit`
      
          -   Old APIs in class `DataFrame`:
      
              ```scala
              def randomSplit(weights: Array[Double], seed: Long): Array[DataFrame]
              def randomSplit(weights: Array[Double]): Array[DataFrame]
              ```
      
          -   New APIs in class `Dataset[T]`:
      
              ```scala
              def randomSplit(weights: Array[Double], seed: Long): Array[Dataset[T]]
              def randomSplit(weights: Array[Double]): Array[Dataset[T]]
              ```
      
          Similar problem as above, but it hasn't been addressed for the Java API yet. We can probably add `randomSplitAsList` to fix this one.
      
      1.  `groupBy`
      
          Some original `DataFrame.groupBy` methods have conflicting signatures with the original `Dataset.groupBy` methods. To distinguish the two, the typed `Dataset.groupBy` methods are renamed to `groupByKey`.
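
          For example, a short sketch against the unified API (SparkSession is used here for brevity; the data is illustrative):

          ```scala
          import org.apache.spark.sql.SparkSession

          def groupingExample(spark: SparkSession): Unit = {
            import spark.implicits._
            val ds = Seq("a", "b", "a").toDS()
            ds.groupBy($"value").count()     // untyped grouping keeps the groupBy name
            ds.groupByKey(identity).count()  // typed grouping is renamed to groupByKey
          }
          ```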
      
      Other noticeable changes:
      
      1.  Dataset always does eager analysis now
      
          We used to support disabling DataFrame eager analysis to help report the partially analyzed, malformed logical plan on analysis failure. However, Dataset encoders require eager analysis during Dataset construction. To preserve the error-reporting feature, `AnalysisException` now takes an extra `Option[LogicalPlan]` argument to hold the partially analyzed plan, so that we can check the plan tree when reporting test failures. This plan is passed in by `QueryExecution.assertAnalyzed`.
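
          A minimal sketch of that error-reporting shape, with purely illustrative types rather than Spark's actual signatures:

          ```scala
          // Illustrative only: Spark's real AnalysisException and QueryExecution differ in detail.
          final case class AnalysisFailure(message: String, partialPlan: Option[String] = None)
            extends Exception(message)

          def assertAnalyzed(plan: String, check: String => Unit): Unit =
            try check(plan) catch {
              case e: AnalysisFailure =>
                // Re-throw with the partially analyzed plan attached so failures can report the plan tree
                throw e.copy(partialPlan = Some(plan))
            }
          ```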
      
      ## How was this patch tested?
      
      Existing tests do the work.
      
      ## TODO
      
      - [ ] Fix all tests
      - [ ] Re-enable MiMA check
      - [ ] Update ScalaDoc (`since`, `group`, and example code)
      
      Author: Cheng Lian <lian@databricks.com>
      Author: Yin Huai <yhuai@databricks.com>
      Author: Wenchen Fan <wenchen@databricks.com>
      Author: Cheng Lian <liancheng@users.noreply.github.com>
      
      Closes #11443 from liancheng/ds-to-df.
  14. Mar 03, 2016
    • [MINOR] Fix typos in comments and testcase name of code · 941b270b
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes typos in comments and test case names in the code.
      
      ## How was this patch tested?
      
      manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11481 from dongjoon-hyun/minor_fix_typos_in_code.
  15. Mar 02, 2016
    • Fix run-tests.py typos · 75e618de
      Wojciech Jurczyk authored
      ## What changes were proposed in this pull request?
      
      The PR fixes typos in an error message in dev/run-tests.py.
      
      Author: Wojciech Jurczyk <wojciech.jurczyk@codilime.com>
      
      Closes #11467 from wjur/wjur/typos_run_tests.
  16. Jan 27, 2016
    • [SPARK-13023][PROJECT INFRA] Fix handling of root module in modules_to_test() · 41f0c85f
      Josh Rosen authored
      There's a minor bug in how we handle the `root` module in the `modules_to_test()` function in `dev/run-tests.py`: since `root` now depends on `build` (because any build change means every test needs to run), we now need to check for the presence of `root` in `modules_to_test` instead of `changed_modules`.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10933 from JoshRosen/build-module-fix.
  17. Jan 26, 2016
    • [SPARK-8725][PROJECT-INFRA] Test modules in topologically-sorted order in dev/run-tests · ee74498d
      Josh Rosen authored
      This patch improves our `dev/run-tests` script to test modules in a topologically-sorted order based on modules' dependencies. This will help to ensure that bugs in upstream projects are not misattributed to downstream projects because those projects' tests were the first ones to exhibit the failure.
      
      Topological sorting is also useful for shortening the feedback loop when testing pull requests: if I make a change in SQL then the SQL tests should run before MLlib, not after.
      
      In addition, this patch also updates our test module definitions to split `sql` into `catalyst`, `sql`, and `hive` in order to allow more tests to be skipped when changing only `hive/` files.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10885 from JoshRosen/SPARK-8725.
  18. Jan 18, 2016
  19. Jan 15, 2016
  20. Jan 14, 2016
    • [SPARK-12829] Turn Java style checker on · 591c88c9
      Reynold Xin authored
      It was previously turned off because there was a problem with a pull request. We should turn it on now.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10763 from rxin/SPARK-12829.
    • [SPARK-12821][BUILD] Style checker should run when some configuration files... · bcc7373f
      Kousuke Saruta authored
      [SPARK-12821][BUILD] Style checker should run when some configuration files for style are modified but no source files are.
      
      When running the `run-tests` script, style checkers run only when source files are modified, but they should also run when style-related configuration files are modified.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10754 from sarutak/SPARK-12821.
  21. Jan 05, 2016
    • [SPARK-12625][SPARKR][SQL] replace R usage of Spark SQL deprecated API · cc4d5229
      felixcheung authored
      rxin davies shivaram
      Took the save mode from my PR #10480 and moved everything to writer methods. This is related to PR #10559

      - [x] it seems jsonRDD() is broken and needs investigation - this is not a public API though; will look into it some more tonight. (fixed)
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #10584 from felixcheung/rremovedeprecated.
  22. Jan 04, 2016
    • [SPARK-12600][SQL] Remove deprecated methods in Spark SQL · 77ab49b8
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10559 from rxin/remove-deprecated-sql.
    • [SPARK-10359][PROJECT-INFRA] Use more random number in... · 9fd7a2f0
      Josh Rosen authored
      [SPARK-10359][PROJECT-INFRA] Use a more random number in dev/test-dependencies.sh; fix version switching
      
      This patch aims to fix another potential source of flakiness in the `dev/test-dependencies.sh` script.
      
      pwendell's original patch and my version used `$(date +%s | tail -c6)` to generate a suffix to use when installing temporary Spark versions into the local Maven cache, but this value only changes once per second and thus is highly collision-prone when concurrent builds launch on AMPLab Jenkins. In order to reduce the potential for conflicts, this patch updates the script to call Python's random number generator instead.
      
      I also fixed a bug in how we captured the original project version; the bug was causing the exit handler code to fail.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10558 from JoshRosen/build-dep-tests-round-3.
    • [SPARK-12612][PROJECT-INFRA] Add missing Hadoop profiles to dev/run-tests-*.py scripts and dev/deps · 0d165ec2
      Josh Rosen authored
      There are a couple of places in the `dev/run-tests-*.py` scripts which deal with Hadoop profiles, but the set of profiles that they handle does not include all Hadoop profiles defined in our POM. Similarly, the `hadoop-2.2` and `hadoop-2.6` profiles were missing from `dev/deps`.
      
      This patch updates these scripts to include all four Hadoop profiles defined in our POM.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10565 from JoshRosen/add-missing-hadoop-profiles-in-test-scripts.
  23. Jan 01, 2016
  24. Dec 30, 2015
    • [SPARK-10359] Enumerate dependencies in a file and diff against it for new pull requests · 27a42c71
      Josh Rosen authored
      This patch adds a new build check which enumerates Spark's resolved runtime classpath and saves it to a file, then diffs against that file to detect whether pull requests have introduced dependency changes. The aim of this check is to make it simpler to reason about whether pull requests which modify the build have introduced new dependencies or changed transitive dependencies in a way that affects the final classpath.
      
      This supplants the checks added in SPARK-4123 / #5093, which are currently disabled due to bugs.
      
      This patch is based on pwendell's work in #8531.
      
      Closes #8531.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #10461 from JoshRosen/SPARK-10359.
  25. Dec 24, 2015
  26. Dec 22, 2015
  27. Dec 19, 2015
  28. Dec 04, 2015
    • [SPARK-6990][BUILD] Add Java linting script; fix minor warnings · d0d82227
      Dmitry Erastov authored
      This replaces https://github.com/apache/spark/pull/9696
      
      Invoke Checkstyle and print any errors to the console, failing the step.
      Use Google's style rules modified according to
      https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
      Some important checks are disabled (see TODOs in `checkstyle.xml`) due to
      multiple violations being present in the codebase.
      
      Suggest fixing those TODOs in separate PR(s).
      
      More on Checkstyle can be found on the [official website](http://checkstyle.sourceforge.net/).
      
      Sample output (from [build 46345](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46345/consoleFull)) (duplicated because I ran the build twice with different profiles):
      
      > Checkstyle checks failed at the following occurrences:
      > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:[217,7] (coding) MissingSwitchDefault: switch without "default" clause.
      > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[198,10] (modifier) ModifierOrder: 'protected' modifier out of order with the JLS suggestions.
      > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:[217,7] (coding) MissingSwitchDefault: switch without "default" clause.
      > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[198,10] (modifier) ModifierOrder: 'protected' modifier out of order with the JLS suggestions.
      > [error] running /home/jenkins/workspace/SparkPullRequestBuilder2/dev/lint-java ; received return code 1
      
      Also fix some of the minor violations that didn't require sweeping changes.
      
      Apologies for the previous botched PRs - I finally figured out the issue.
      
      cr: JoshRosen, pwendell
      
      > I state that the contribution is my original work, and I license the work to the project under the project's open source license.
      
      Author: Dmitry Erastov <derastov@gmail.com>
      
      Closes #9867 from dskrvk/master.
  29. Oct 30, 2015
  30. Oct 19, 2015
    • [SPARK-7018][BUILD] Refactor dev/run-tests-jenkins into Python · d3180c25
      Brennon York authored
      This commit refactors the `run-tests-jenkins` script into Python. This refactoring was done by brennonyork in #7401; this PR contains a few minor edits from joshrosen in order to bring it up to date with other recent changes.
      
      From the original PR description (by brennonyork):
      
      Currently a few things are left out that could, and I think should, be smaller JIRAs after this.
      
      1. There are still a few areas where we use environment variables where we don't need to (like `CURRENT_BLOCK`). I might get around to fixing this one in lieu of everything else, but wanted to point that out.
      2. The PR tests are still written in bash. I opted to not change those and just rewrite the runner into Python. This is a great follow-on JIRA IMO.
      3. All of the linting scripts are still in bash as well, and it would likely be best to add those as follow-on JIRAs too.
      
      Closes #7401.
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #9161 from JoshRosen/run-tests-jenkins-refactoring.
  31. Oct 16, 2015
    • [SPARK-11094] Strip extra strings from Java version in test runner · 08698ee1
      Jakob Odersky authored
      Removes any extra strings from the Java version, fixing subsequent integer parsing.
      This is required since some OpenJDK versions (specifically in Debian testing) append an extra "-internal" string to the version field.
      
      Author: Jakob Odersky <jodersky@gmail.com>
      
      Closes #9111 from jodersky/fixtestrunner.
  32. Oct 07, 2015
  33. Sep 15, 2015
  34. Aug 30, 2015
    • SPARK-9545, SPARK-9547: Use Maven in PRB if title contains "[test-maven]" · 35e896a7
      Patrick Wendell authored
      This is just some small glue code to actually make use of the
      AMPLAB_JENKINS_BUILD_TOOL switch. As far as I can tell, we
      don't currently use the Maven support in the tool even though it exists.
      This patch switches to Maven when the PR title contains "test-maven".
      
      There are a few small other pieces of cleanup in the patch as well.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #7878 from pwendell/maven-tests.