  1. Jul 24, 2015
  2. Jul 22, 2015
    • Yu ISHIKAWA's avatar
      [SPARK-9121] [SPARKR] Get rid of the warnings about `no visible global function definition` in SparkR · 63f4bcc7
      Yu ISHIKAWA authored
      [SPARK-9121] [SPARKR] Get rid of the warnings about `no visible global function definition` in SparkR
      
      [SPARK-9121: Get rid of the warnings about `no visible global function definition` in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9121)
      
      ## The Result of `dev/lint-r`
      [The result of lint-r for SPARK-9121 at revision 1ddd0f2f when I sent the PR](https://gist.github.com/yu-iskw/6f55953425901725edf6)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #7567 from yu-iskw/SPARK-9121 and squashes the following commits:
      
      c8cfd63 [Yu ISHIKAWA] Fix the typo
      b1f19ed [Yu ISHIKAWA] Add a validate statement for local SparkR
      1a03987 [Yu ISHIKAWA] Load the `testthat` package in `dev/lint-r.R`, instead of using the full path of function.
      3a5e0ab [Yu ISHIKAWA] [SPARK-9121][SparkR] Get rid of the warnings about `no visible global function definition` in SparkR
      63f4bcc7
  3. Jul 21, 2015
    • Michael Allman's avatar
      [SPARK-8401] [BUILD] Scala version switching build enhancements · f5b6dc5e
      Michael Allman authored
      These commits address a few minor issues in the Scala cross-version support in the build:
      
        1. Correct two missing `${scala.binary.version}` pom file substitutions.
        2. Don't update `scala.binary.version` in parent POM. This property is set through profiles.
        3. Update the source of the generated scaladocs in `docs/_plugins/copy_api_dirs.rb`.
        4. Factor common code out of `dev/change-version-to-*.sh` and add some validation. We also test `sed` to see if it's GNU sed and try `gsed` as an alternative if not. This prevents the script from running with a non-GNU sed.
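The GNU-sed check in item 4 can be sketched in Python (a hypothetical helper for illustration; the actual script performs this check in shell):

```python
import shutil
import subprocess

def find_gnu_sed():
    """Return the name of a GNU sed binary: `sed` if it is GNU sed,
    else `gsed` (the common Homebrew name on OS X), else None.
    GNU sed prints "GNU" in its `--version` output; BSD sed does not
    support that flag at all."""
    for candidate in ("sed", "gsed"):
        if shutil.which(candidate) is None:
            continue
        result = subprocess.run(
            [candidate, "--version"],
            capture_output=True, text=True, check=False,
        )
        if "GNU" in result.stdout:
            return candidate
    return None
```

This mirrors the abort-early behavior described above: when the probe finds no GNU sed, the caller can fail fast instead of silently producing wrong substitutions.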
      
      This is my original work and I license this work to the Spark project under the Apache License.
      
      Author: Michael Allman <michael@videoamp.com>
      
      Closes #6832 from mallman/scala-versions and squashes the following commits:
      
      cde2f17 [Michael Allman] Delete dev/change-version-to-*.sh, replacing them with single dev/change-scala-version.sh script that takes a version as argument
      02296f2 [Michael Allman] Make the scala version change scripts cross-platform by restricting ourselves to POSIX sed syntax instead of looking for GNU sed
      ad9b40a [Michael Allman] Factor change-scala-version.sh out of change-version-to-*.sh, adding command line argument validation and testing for GNU sed
      bdd20bf [Michael Allman] Update source of scaladocs when changing Scala version
      475088e [Michael Allman] Replace jackson-module-scala_2.10 with jackson-module-scala_${scala.binary.version}
      f5b6dc5e
    • Shivaram Venkataraman's avatar
      [SPARK-9179] [BUILD] Use default primary author if unspecified · 228ab65a
      Shivaram Venkataraman authored
      Fixes the feature introduced in #7508 to use the default value if nothing is specified on the command line
      
      cc liancheng rxin pwendell
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #7558 from shivaram/merge-script-fix and squashes the following commits:
      
      7092141 [Shivaram Venkataraman] Use default primary author if unspecified
      228ab65a
  4. Jul 19, 2015
    • Cheng Lian's avatar
      [SPARK-9179] [BUILD] Allows committers to specify primary author of the PR to be merged · bc24289f
      Cheng Lian authored
      It's common for a contributor to submit an initial version of a feature/bugfix, and for other people (mostly committers) to later fork it and add improvements. When merging these PRs, we probably want to credit the original author as the primary author. Currently we can only do this by running
      
      ```
      $ git commit --amend --author="name <email>"
      ```
      
      manually right before the merge script pushes to the Apache Git repo. It would be nice if the script accepted user-specified primary author information.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #7508 from liancheng/spark-9179 and squashes the following commits:
      
      218d88e [Cheng Lian] Allows committers to specify primary author of the PR to be merged
      bc24289f
  5. Jul 17, 2015
    • Yu ISHIKAWA's avatar
      [SPARK-7879] [MLLIB] KMeans API for spark.ml Pipelines · 34a889db
      Yu ISHIKAWA authored
      I implemented the KMeans API for spark.ml Pipelines. It doesn't include the clustering abstractions for spark.ml (SPARK-7610); those fit better in a separate issue, and I'll try them later, since we are also adding hierarchical clustering algorithms in another issue. Thanks.
      
      [SPARK-7879: KMeans API for spark.ml Pipelines - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-7879)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6756 from yu-iskw/SPARK-7879 and squashes the following commits:
      
      be752de [Yu ISHIKAWA] Add assertions
      a14939b [Yu ISHIKAWA] Fix the dashed line's length in pyspark.ml.rst
      4c61693 [Yu ISHIKAWA] Remove the test about whether "features" and "prediction" columns exist or not in Python
      fb2417c [Yu ISHIKAWA] Use getInt, instead of get
      f397be4 [Yu ISHIKAWA] Switch the comparisons.
      ca78b7d [Yu ISHIKAWA] Add the Scala docs about the constraints of each parameter.
      effc650 [Yu ISHIKAWA] Using expertSetParam and expertGetParam
      c8dc6e6 [Yu ISHIKAWA] Remove an unnecessary test
      19a9d63 [Yu ISHIKAWA] Include spark.ml.clustering to python tests
      1abb19c [Yu ISHIKAWA] Add the statements about spark.ml.clustering into pyspark.ml.rst
      f8338bc [Yu ISHIKAWA] Add the placeholders in Python
      4a03003 [Yu ISHIKAWA] Test for contains in Python
      6566c8b [Yu ISHIKAWA] Use `get`, instead of `apply`
      288e8d5 [Yu ISHIKAWA] Using `contains` to check the column names
      5a7d574 [Yu ISHIKAWA] Rename `validateInitializationMode` to `validateInitMode` and remove throwing exception
      97cfae3 [Yu ISHIKAWA] Fix the type of return value of `KMeans.copy`
      e933723 [Yu ISHIKAWA] Remove the default value of seed from the Model class
      978ee2c [Yu ISHIKAWA] Modify the docs of KMeans, according to mllib's KMeans
      2ec80bc [Yu ISHIKAWA] Fit on 1 line
      e186be1 [Yu ISHIKAWA] Make a few variables, setters and getters be expert ones
      b2c205c [Yu ISHIKAWA] Rename the method `getInitializationSteps` to `getInitSteps` and `setInitializationSteps` to `setInitSteps` in Scala and Python
      f43f5b4 [Yu ISHIKAWA] Rename the method `getInitializationMode` to `getInitMode` and `setInitializationMode` to `setInitMode` in Scala and Python
      3cb5ba4 [Yu ISHIKAWA] Modify the description about epsilon and the validation
      4fa409b [Yu ISHIKAWA] Add a comment about the default value of epsilon
      2f392e1 [Yu ISHIKAWA] Make some variables `final` and Use `IntParam` and `DoubleParam`
      19326f8 [Yu ISHIKAWA] Use `udf`, instead of callUDF
      4d2ad1e [Yu ISHIKAWA] Modify the indentations
      0ae422f [Yu ISHIKAWA] Add a test for `setParams`
      4ff7913 [Yu ISHIKAWA] Add "ml.clustering" to `javacOptions` in SparkBuild.scala
      11ffdf1 [Yu ISHIKAWA] Use `===` and the variable
      220a176 [Yu ISHIKAWA] Set a random seed in the unit testing
      92c3efc [Yu ISHIKAWA] Make the points for a test be fewer
      c758692 [Yu ISHIKAWA] Modify the parameters of KMeans in Python
      6aca147 [Yu ISHIKAWA] Add some unit testings to validate the setter methods
      687cacc [Yu ISHIKAWA] Alias mllib.KMeans as MLlibKMeans in KMeansSuite.scala
      a4dfbef [Yu ISHIKAWA] Modify the last brace and indentations
      5bedc51 [Yu ISHIKAWA] Remove an extra new line
      444c289 [Yu ISHIKAWA] Add the validation for `runs`
      e41989c [Yu ISHIKAWA] Modify how to validate `initStep`
      7ea133a [Yu ISHIKAWA] Change how to validate `initMode`
      7991e15 [Yu ISHIKAWA] Add a validation for `k`
      c2df35d [Yu ISHIKAWA] Make `predict` private
      93aa2ff [Yu ISHIKAWA] Use `withColumn` in `transform`
      d3a79f7 [Yu ISHIKAWA] Remove the inherited docs
      e9532e1 [Yu ISHIKAWA] make `parentModel` of KMeansModel private
      8559772 [Yu ISHIKAWA] Remove the `paramMap` parameter of KMeans
      6684850 [Yu ISHIKAWA] Rename `initializationSteps` to `initSteps`
      99b1b96 [Yu ISHIKAWA] Rename `initializationMode` to `initMode`
      79ea82b [Yu ISHIKAWA] Modify the parameters of KMeans docs
      6569bcd [Yu ISHIKAWA] Change how to set the default values with `setDefault`
      20a795a [Yu ISHIKAWA] Change how to set the default values with `setDefault`
      11c2a12 [Yu ISHIKAWA] Limit the imports
      badb481 [Yu ISHIKAWA] Alias spark.mllib.{KMeans, KMeansModel}
      f80319a [Yu ISHIKAWA] Rebase master branch and add copy methods
      85d92b1 [Yu ISHIKAWA] Add `KMeans.setPredictionCol`
      aa9469d [Yu ISHIKAWA] Fix a python test suite error caused by python 3.x
      c2d6bcb [Yu ISHIKAWA] ADD Java test suites of the KMeans API for spark.ml Pipeline
      598ed2e [Yu ISHIKAWA] Implement the KMeans API for spark.ml Pipelines in Python
      63ad785 [Yu ISHIKAWA] Implement the KMeans API for spark.ml Pipelines in Scala
      34a889db
  6. Jul 15, 2015
    • MechCoder's avatar
      [SPARK-8706] [PYSPARK] [PROJECT INFRA] Add pylint checks to PySpark · 20bb10f8
      MechCoder authored
      This adds Pylint checks to PySpark.
      
      For now this lazily installs pylint using easy_install into `dev/pylint` (similar to the pep8 script).
      We still need to figure out which rules should be allowed.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #7241 from MechCoder/pylint and squashes the following commits:
      
      2fc7291 [MechCoder] Remove pylint test fail
      6d883a2 [MechCoder] Silence warnings and make pylint tests fail to check if it works in jenkins
      f3a5e17 [MechCoder] undefined-variable
      ca8b749 [MechCoder] Minor changes
      71629f8 [MechCoder] remove trailing whitespace
      8498ff9 [MechCoder] Remove blacklisted arguments and pointless statements check
      1dbd094 [MechCoder] Disable all checks for now
      8b8aa8a [MechCoder] Add pylint configuration file
      7871bb1 [MechCoder] [SPARK-8706] [PySpark] [Project infra] Add pylint checks to PySpark
      20bb10f8
  7. Jul 14, 2015
  8. Jul 13, 2015
    • Davies Liu's avatar
      79c35826
    • MechCoder's avatar
      [SPARK-8706] [PYSPARK] [PROJECT INFRA] Add pylint checks to PySpark · 9b62e937
      MechCoder authored
      This adds Pylint checks to PySpark.
      
      For now this lazily installs pylint using easy_install into `dev/pylint` (similar to the pep8 script).
      We still need to figure out which rules should be allowed.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #7241 from MechCoder/pylint and squashes the following commits:
      
      8496834 [MechCoder] Silence warnings and make pylint tests fail to check if it works in jenkins
      57393a3 [MechCoder] undefined-variable
      a8e2547 [MechCoder] Minor changes
      7753810 [MechCoder] remove trailing whitespace
      75c5d2b [MechCoder] Remove blacklisted arguments and pointless statements check
      6bde250 [MechCoder] Disable all checks for now
      3464666 [MechCoder] Add pylint configuration file
      d28109f [MechCoder] [SPARK-8706] [PySpark] [Project infra] Add pylint checks to PySpark
      9b62e937
  9. Jul 10, 2015
    • Jonathan Alter's avatar
      [SPARK-7977] [BUILD] Disallowing println · e14b545d
      Jonathan Alter authored
      Author: Jonathan Alter <jonalter@users.noreply.github.com>
      
      Closes #7093 from jonalter/SPARK-7977 and squashes the following commits:
      
      ccd44cc [Jonathan Alter] Changed println to log in ThreadingSuite
      7fcac3e [Jonathan Alter] Reverting to println in ThreadingSuite
      10724b6 [Jonathan Alter] Changing some printlns to logs in tests
      eeec1e7 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0b1dcb4 [Jonathan Alter] More println cleanup
      aedaf80 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      925fd98 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0c16fa3 [Jonathan Alter] Replacing some printlns with logs
      45c7e05 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      5c8e283 [Jonathan Alter] Allowing println in audit-release examples
      5b50da1 [Jonathan Alter] Allowing printlns in example files
      ca4b477 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      83ab635 [Jonathan Alter] Fixing new printlns
      54b131f [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      1cd8a81 [Jonathan Alter] Removing some unnecessary comments and printlns
      b837c3a [Jonathan Alter] Disallowing println
      e14b545d
  10. Jul 07, 2015
    • Patrick Wendell's avatar
      [HOTFIX] Rename release-profile to release · 1cb2629f
      Patrick Wendell authored
      This renames the profile used when publishing releases. We named it 'release-profile' because that is the Maven convention. However, it turns out this special name causes several other undesirable things to kick in when we are creating releases. For instance, it triggers the javadoc plugin to run, which actually fails in our current build setup.
      
      The fix is just to rename this to a different profile so that its use has no collateral damage.
      1cb2629f
  11. Jul 06, 2015
    • Andrew Or's avatar
      [SPARK-8819] Fix build for maven 3.3.x · 9eae5fa6
      Andrew Or authored
      This is a workaround for MSHADE-148, which leads to an infinite loop when building Spark with maven 3.3.x. This was originally caused by #6441, which added a bunch of test dependencies on the spark-core test module. Recently, it was revealed by #7193.
      
      This patch adds a `-Prelease` profile. If present, it will set `createDependencyReducedPom` to true. The consequences are:
      - If you are releasing Spark with this profile, you are fine as long as you use maven 3.2.x or before.
      - If you are releasing Spark without this profile, you will run into SPARK-8781.
      - If you are not releasing Spark but you are using this profile, you may run into SPARK-8819.
      - If you are not releasing Spark and you did not include this profile, you are fine.
      
      This is all documented in `pom.xml` and tested locally with both versions of maven.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #7219 from andrewor14/fix-maven-build and squashes the following commits:
      
      1d37e87 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-maven-build
      3574ae4 [Andrew Or] Review comments
      f39199c [Andrew Or] Create a -Prelease profile that flags `createDependencyReducedPom`
      9eae5fa6
  12. Jul 02, 2015
    • Josh Rosen's avatar
      [SPARK-8740] [PROJECT INFRA] Support GitHub OAuth tokens in dev/merge_spark_pr.py · 377ff4c9
      Josh Rosen authored
      This commit allows `dev/merge_spark_pr.py` to use personal GitHub OAuth tokens in order to make authenticated requests. This is necessary to work around per-IP rate limiting issues.
      
      To use a token, just set the `GITHUB_OAUTH_KEY` environment variable.  You can create a personal token at https://github.com/settings/tokens; we only require `public_repo` scope.
      
      If the script fails due to a rate-limit issue, it now logs a useful message directing the user to the OAuth token instructions.
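The token handling can be sketched as follows (a simplified, hypothetical Python 3 version; the actual script is Python 2 and structured differently):

```python
import os
import urllib.request

def build_github_request(url):
    """Build a GitHub API request, attaching the OAuth token from the
    GITHUB_OAUTH_KEY environment variable when it is set.
    Authenticated requests get a much higher rate limit than
    anonymous, per-IP-limited ones."""
    request = urllib.request.Request(url)
    token = os.environ.get("GITHUB_OAUTH_KEY")
    if token:
        # GitHub's v3 API accepts the token in an Authorization header.
        request.add_header("Authorization", "token %s" % token)
    return request
```

Passing the result to `urllib.request.urlopen` then issues the (possibly authenticated) call.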
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7136 from JoshRosen/pr-merge-script-oauth-authentication and squashes the following commits:
      
      4d011bd [Josh Rosen] Fix error message
      23d92ff [Josh Rosen] Support GitHub OAuth tokens in dev/merge_spark_pr.py
      377ff4c9
  13. Jul 01, 2015
    • zsxwing's avatar
      [SPARK-8378] [STREAMING] Add the Python API for Flume · 75b9fe4c
      zsxwing authored
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6830 from zsxwing/flume-python and squashes the following commits:
      
      78dfdac [zsxwing] Fix the compile error in the test code
      f1bf3c0 [zsxwing] Address TD's comments
      0449723 [zsxwing] Add sbt goal streaming-flume-assembly/assembly
      e93736b [zsxwing] Fix the test case for determine_modules_to_test
      9d5821e [zsxwing] Fix pyspark_core dependencies
      f9ee681 [zsxwing] Merge branch 'master' into flume-python
      7a55837 [zsxwing] Add streaming_flume_assembly to run-tests.py
      b96b0de [zsxwing] Merge branch 'master' into flume-python
      ce85e83 [zsxwing] Fix incompatible issues for Python 3
      01cbb3d [zsxwing] Add import sys
      152364c [zsxwing] Fix the issue that StringIO doesn't work in Python 3
      14ba0ff [zsxwing] Add flume-assembly for sbt building
      b8d5551 [zsxwing] Merge branch 'master' into flume-python
      4762c34 [zsxwing] Fix the doc
      0336579 [zsxwing] Refactor Flume unit tests and also add tests for Python API
      9f33873 [zsxwing] Add the Python API for Flume
      75b9fe4c
  14. Jun 29, 2015
    • Josh Rosen's avatar
      [SPARK-5161] Parallelize Python test execution · 7bbbe380
      Josh Rosen authored
      This commit parallelizes the Python unit test execution, significantly reducing Jenkins build times.  Parallelism is now configurable by passing the `-p` or `--parallelism` flags to either `dev/run-tests` or `python/run-tests` (the default parallelism is 4, but I've successfully tested with higher parallelism).
      
      To avoid flakiness, I've disabled the Spark Web UI for the Python tests, similar to what we've done for the JVM tests.
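The parallel-runner pattern can be sketched as (illustrative only, not the actual `python/run-tests` implementation):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_test_commands(commands, parallelism=4):
    """Run shell test commands with bounded concurrency (the -p /
    --parallelism knob described above) and return (command, output)
    pairs for the failures, so that only failing runs need to be
    logged."""
    def run_one(cmd):
        proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        return cmd, proc.returncode, proc.stdout + proc.stderr
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(pool.map(run_one, commands))
    return [(cmd, output) for cmd, code, output in results if code != 0]
```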
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7031 from JoshRosen/parallelize-python-tests and squashes the following commits:
      
      feb3763 [Josh Rosen] Re-enable other tests
      f87ea81 [Josh Rosen] Only log output from failed tests
      d4ded73 [Josh Rosen] Logging improvements
      a2717e1 [Josh Rosen] Make parallelism configurable via dev/run-tests
      1bacf1b [Josh Rosen] Merge remote-tracking branch 'origin/master' into parallelize-python-tests
      110cd9d [Josh Rosen] Fix universal_newlines for Python 3
      cd13db8 [Josh Rosen] Also log python_implementation
      9e31127 [Josh Rosen] Log Python --version output for each executable.
      a2b9094 [Josh Rosen] Bump up parallelism.
      5552380 [Josh Rosen] Python 3 fix
      866b5b9 [Josh Rosen] Fix lazy logging warnings in Prospector checks
      87cb988 [Josh Rosen] Skip MLLib tests for PyPy
      8309bfe [Josh Rosen] Temporarily disable parallelism to debug a failure
      9129027 [Josh Rosen] Disable Spark UI in Python tests
      037b686 [Josh Rosen] Temporarily disable JVM tests so we can test Python speedup in Jenkins.
      af4cef4 [Josh Rosen] Initial attempt at parallelizing Python test execution
      7bbbe380
    • Brennon York's avatar
      [SPARK-8693] [PROJECT INFRA] profiles and goals are not printed in a nice way · 5c796d57
      Brennon York authored
      Hotfix to correct formatting errors in print statements within the dev and Jenkins builds. The error looks like:
      
      ```
      -Phadoop-1[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Dhadoop.version=1.0.4[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Pkinesis-asl[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Phive-thriftserver[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Phive[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  package[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  assembly/assembly[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  streaming-kafka-assembly/assembly
      ```
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #7085 from brennonyork/SPARK-8693 and squashes the following commits:
      
      c5575f1 [Brennon York] added commas to end of print statements for proper printing
      5c796d57
  15. Jun 28, 2015
  16. Jun 27, 2015
    • Josh Rosen's avatar
      [SPARK-8583] [SPARK-5482] [BUILD] Refactor python/run-tests to integrate with dev/run-tests module system · 40648c56
      Josh Rosen authored
      [SPARK-8583] [SPARK-5482] [BUILD] Refactor python/run-tests to integrate with dev/run-tests module system
      
      This patch refactors the `python/run-tests` script:
      
      - It's now written in Python instead of Bash.
      - The descriptions of the tests to run are now stored in `dev/run-tests`'s modules.  This allows the pull request builder to skip Python tests suites that were not affected by the pull request's changes.  For example, we can now skip the PySpark Streaming test cases when only SQL files are changed.
      - `python/run-tests` now supports command-line flags to make it easier to run individual test suites (this addresses SPARK-5482):
      
        ```
      Usage: run-tests [options]
      
      Options:
        -h, --help            show this help message and exit
        --python-executables=PYTHON_EXECUTABLES
                              A comma-separated list of Python executables to test
                              against (default: python2.6,python3.4,pypy)
        --modules=MODULES     A comma-separated list of Python modules to test
                              (default: pyspark-core,pyspark-ml,pyspark-mllib
                              ,pyspark-sql,pyspark-streaming)
         ```
      - `dev/run-tests` has been split into multiple files: the module definitions and test utility functions are now stored inside of a `dev/sparktestsupport` Python module, allowing them to be re-used from the Python test runner script.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6967 from JoshRosen/run-tests-python-modules and squashes the following commits:
      
      f578d6d [Josh Rosen] Fix print for Python 2.x
      8233d61 [Josh Rosen] Add python/run-tests.py to Python lint checks
      34c98d2 [Josh Rosen] Fix universal_newlines for Python 3
      8f65ed0 [Josh Rosen] Fix handling of module in python/run-tests
      37aff00 [Josh Rosen] Python 3 fix
      27a389f [Josh Rosen] Skip MLLib tests for PyPy
      c364ccf [Josh Rosen] Use which() to convert PYSPARK_PYTHON to an absolute path before shelling out to run tests
      568a3fd [Josh Rosen] Fix hashbang
      3b852ae [Josh Rosen] Fall back to PYSPARK_PYTHON when sys.executable is None (fixes a test)
      f53db55 [Josh Rosen] Remove python2 flag, since the test runner script also works fine under Python 3
      9c80469 [Josh Rosen] Fix passing of PYSPARK_PYTHON
      d33e525 [Josh Rosen] Merge remote-tracking branch 'origin/master' into run-tests-python-modules
      4f8902c [Josh Rosen] Python lint fixes.
      8f3244c [Josh Rosen] Use universal_newlines to fix dev/run-tests doctest failures on Python 3.
      f542ac5 [Josh Rosen] Fix lint check for Python 3
      fff4d09 [Josh Rosen] Add dev/sparktestsupport to pep8 checks
      2efd594 [Josh Rosen] Update dev/run-tests to use new Python test runner flags
      b2ab027 [Josh Rosen] Add command-line options for running individual suites in python/run-tests
      caeb040 [Josh Rosen] Fixes to PySpark test module definitions
      d6a77d3 [Josh Rosen] Fix the tests of dev/run-tests
      def2d8a [Josh Rosen] Two minor fixes
      aec0b8f [Josh Rosen] Actually get the Kafka stuff to run properly
      04015b9 [Josh Rosen] First attempt at getting PySpark Kafka test to work in new runner script
      4c97136 [Josh Rosen] PYTHONPATH fixes
      dcc9c09 [Josh Rosen] Fix time division
      32660fc [Josh Rosen] Initial cut at Python test runner refactoring
      311c6a9 [Josh Rosen] Move shell utility functions to own module.
      1bdeb87 [Josh Rosen] Move module definitions to separate file.
      40648c56
  17. Jun 26, 2015
    • Josh Rosen's avatar
      [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod() · 41afa165
      Josh Rosen authored
      This patch addresses a critical issue in the PySpark tests:
      
      Several of our Python modules' `__main__` methods call `doctest.testmod()` in order to run doctests but forget to check and handle its return value. As a result, some PySpark test failures can go unnoticed because they will not fail the build.
      
      Fortunately, there was only one test failure which was masked by this bug: a `pyspark.profiler` doctest was failing due to changes in RDD pipelining.
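The fix follows the standard pattern for doctest-driven `__main__` blocks; a generic sketch (not the actual PySpark module code):

```python
import doctest
import sys

def add(a, b):
    """Add two numbers.

    >>> add(2, 3)
    5
    """
    return a + b

if __name__ == "__main__":
    # testmod() returns a TestResults(failed, attempted) named tuple.
    # Ignoring it means failing doctests still exit with status 0,
    # which is exactly how the failures were being masked.
    failure_count, test_count = doctest.testmod()
    if failure_count:
        sys.exit(-1)
```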
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7032 from JoshRosen/testmod-fix and squashes the following commits:
      
      60dbdc0 [Josh Rosen] Account for int vs. long formatting change in Python 3
      8b8d80a [Josh Rosen] Fix failing test.
      e6423f9 [Josh Rosen] Check return code for all uses of doctest.testmod().
      41afa165
  18. Jun 24, 2015
    • fe2s's avatar
      [SPARK-8558] [BUILD] Script /dev/run-tests fails when _JAVA_OPTIONS env var set · dca21a83
      fe2s authored
      Author: fe2s <aka.fe2s@gmail.com>
      Author: Oleksiy Dyagilev <oleksiy_dyagilev@epam.com>
      
      Closes #6956 from fe2s/fix-run-tests and squashes the following commits:
      
      31b6edc [fe2s] str is a built-in function, so using it as a variable name will lead to spurious warnings in some Python linters
      7d781a0 [fe2s] fixing for openjdk/IBM, seems like they have slightly different wording, but all have 'version' word. Surrounding with spaces for the case if version word appears in _JAVA_OPTIONS
      cd455ef [fe2s] address comment, looking for java version string rather than expecting to have on a certain line number
      ad577d7 [Oleksiy Dyagilev] [SPARK-8558][BUILD] Script /dev/run-tests fails when _JAVA_OPTIONS env var set
      dca21a83
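The detection strategy described in the commits above, searching for the line containing the word 'version' rather than assuming a fixed line number, can be sketched as (a hypothetical helper for illustration):

```python
import re

def parse_java_version(version_output):
    """Extract the quoted JDK version from `java -version` output.
    When _JAVA_OPTIONS is set, the JVM prepends a line like
    'Picked up _JAVA_OPTIONS: ...', so we search for the line
    containing the word 'version' (Oracle, OpenJDK, and IBM JDKs all
    include it) rather than reading a fixed line."""
    for line in version_output.splitlines():
        if re.search(r"\bversion\b", line):
            match = re.search(r'"([^"]+)"', line)
            return match.group(1) if match else None
    return None
```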
  19. Jun 22, 2015
  20. Jun 20, 2015
    • Yu ISHIKAWA's avatar
      [SPARK-8495] [SPARKR] Add a `.lintr` file to validate the SparkR files and the `lint-r` script · 004f5737
      Yu ISHIKAWA authored
      Thanks to Shivaram Venkataraman for the support. This is a prototype script to validate the R files.
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6922 from yu-iskw/SPARK-6813 and squashes the following commits:
      
      c1ffe6b [Yu ISHIKAWA] Modify to save result to a log file and add a rule to validate
      5520806 [Yu ISHIKAWA] Exclude the .lintr file from the Apache license check
      8f94680 [Yu ISHIKAWA] [SPARK-8495][SparkR] Add a `.lintr` file to validate the SparkR files and the `lint-r` script
      004f5737
    • Josh Rosen's avatar
      [SPARK-8422] [BUILD] [PROJECT INFRA] Add a module abstraction to dev/run-tests · 7a3c424e
      Josh Rosen authored
      This patch builds upon #5694 to add a 'module' abstraction to the `dev/run-tests` script which groups together the per-module test logic, including the mapping from file paths to modules, the mapping from modules to test goals and build profiles, and the dependencies / relationships between modules.
      
      This refactoring makes it much easier to increase the granularity of test modules, which will let us skip even more tests.  It's also a prerequisite for other changes that will reduce test time, such as running subsets of the Python tests based on which files / modules have changed.
      
      This patch also adds doctests for the new graph traversal / change mapping code.
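A toy sketch of such a module abstraction (the names are illustrative; the actual `dev/sparktestsupport` definitions differ):

```python
class Module:
    """A group of tests: the source prefixes that belong to it, the
    test goals to run when it changes, and the modules it depends on."""
    def __init__(self, name, source_prefixes, test_goals, dependencies=()):
        self.name = name
        self.source_prefixes = tuple(source_prefixes)
        self.test_goals = tuple(test_goals)
        self.dependencies = tuple(dependencies)

def modules_to_test(changed_files, modules):
    """Map changed file paths to the modules they belong to, then walk
    the dependency relationships so that every module building on a
    changed module is retested as well."""
    affected = {
        module for module in modules
        if any(path.startswith(prefix)
               for path in changed_files
               for prefix in module.source_prefixes)
    }
    # Fixed-point traversal: a module depending on an affected module
    # is itself affected.
    grew = True
    while grew:
        grew = False
        for module in modules:
            if module not in affected and any(
                    dep in affected for dep in module.dependencies):
                affected.add(module)
                grew = True
    return affected
```

With a `core` module and an `sql` module that depends on it, a change under `core/` selects both modules, while a change under `sql/` selects only `sql`.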
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6866 from JoshRosen/more-dev-run-tests-refactoring and squashes the following commits:
      
      75de450 [Josh Rosen] Use module system to determine which build profiles to enable.
      4224da5 [Josh Rosen] Add documentation to Module.
      a86a953 [Josh Rosen] Clean up modules; add new modules for streaming external projects
      e46539f [Josh Rosen] Fix camel-cased endswith()
      35a3052 [Josh Rosen] Enable Hive tests when running all tests
      df10e23 [Josh Rosen] update to reflect fact that no module depends on root
      3670d50 [Josh Rosen] mllib should depend on streaming
      dc6f1c6 [Josh Rosen] Use changed files' extensions to decide whether to run style checks
      7092d3e [Josh Rosen] Skip SBT tests if no test goals are specified
      43a0ced [Josh Rosen] Minor fixes
      3371441 [Josh Rosen] Test everything if nothing has changed (needed for non-PRB builds)
      37f3fb3 [Josh Rosen] Remove doc profiles option, since it's not actually needed (see #6865)
      f53864b [Josh Rosen] Finish integrating module changes
      f0249bd [Josh Rosen] WIP
      7a3c424e
  21. Jun 17, 2015
    • Brennon York's avatar
      [SPARK-7017] [BUILD] [PROJECT INFRA] Refactor dev/run-tests into Python · 50a0496a
      Brennon York authored
      All, this is a first attempt at refactoring `dev/run-tests` into Python. Initially I merely converted all Bash calls over to Python, then moved to a much more modular approach (more functions, moved the calls around, etc.). What is here is the initial culmination and should provide a solid base for various downstream issues (e.g. SPARK-7016, modularizing / parallelizing testing, etc.). Would love comments / suggestions on this first step!
      
      /cc srowen pwendell nchammas
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #5694 from brennonyork/SPARK-7017 and squashes the following commits:
      
      154ed73 [Brennon York] updated finding java binary if JAVA_HOME not set
      3922a85 [Brennon York] removed necessary passed in variable
      f9fbe54 [Brennon York] reverted doc test change
      8135518 [Brennon York] removed the test check for documentation changes until jenkins can get updated
      05d435b [Brennon York] added check for jekyll install
      22edb78 [Brennon York] add check if jekyll isn't installed on the path
      2dff136 [Brennon York] fixed pep8 whitespace errors
      767a668 [Brennon York] fixed path joining issues, ensured docs actually build on doc changes
      c42cf9a [Brennon York] unpack set operations with splat (*)
      fb85a41 [Brennon York] fixed minor set bug
      0379833 [Brennon York] minor doc addition to print the changed modules
      aa03d9e [Brennon York] added documentation builds as a top level test component, altered high level project changes to properly execute core tests only when necessary, changed variable names for simplicity
      ec1ae78 [Brennon York] minor name changes, bug fixes
      b7c72b9 [Brennon York] reverting streaming context
      03fdd7b [Brennon York] fixed the tuple () wraps around example lambda
      705d12e [Brennon York] changed example to comply with pep3113 supporting python3
      60b3d51 [Brennon York] prepend rather than append onto PATH
      7d2f5e2 [Brennon York] updated python tests to remove unused variable
      2898717 [Brennon York] added a change to streaming test to check if it only runs streaming tests
      eb684b6 [Brennon York] fixed sbt_test_goals reference error
      db7ae6f [Brennon York] reverted SPARK_HOME from start of command
      1ecca26 [Brennon York] fixed merge conflicts
      2fcdfc0 [Brennon York] testing target branch dump on jenkins
      1f607b1 [Brennon York] finalizing revisions to modular tests
      8afbe93 [Brennon York] made error codes a global
      0629de8 [Brennon York] updated to refactor and remove various small bugs, removed pep8 complaints
      d90ab2d [Brennon York] fixed merge conflicts, ensured that for regular builds both core and sql tests always run
      b1248dc [Brennon York] exec python rather than running python and exiting with return code
      f9deba1 [Brennon York] python to python2 and removed newline
      6d0a052 [Brennon York] incorporated merge conflicts with SPARK-7249
      f950010 [Brennon York] removed building hive-0.12.0 per SPARK-6908
      703f095 [Brennon York] fixed merge conflicts
      b1ca593 [Brennon York] reverted the sparkR test
      afeb093 [Brennon York] updated to make sparkR test fail
      1dada6b [Brennon York] reverted pyspark test failure
      9a592ec [Brennon York] reverted mima exclude issue, added pyspark test failure
      d825aa4 [Brennon York] revert build break, add mima break
      f041d8a [Brennon York] added space from commented import to now test build breaking
      983f2a2 [Brennon York] comment out import to fail build test
      2386785 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-7017
      76335fb [Brennon York] reverted rat license issue for sparkconf
      e4a96cc [Brennon York] removed the import error and added license error, fixed the way run-tests and run-tests.py report their error codes
      56d3cb9 [Brennon York] changed test back and commented out import to break compile
      b37328c [Brennon York] fixed typo and added default return is no error block was found in the environment
      7613558 [Brennon York] updated to return the proper env variable for return codes
      a5bd445 [Brennon York] reverted license, changed test in shuffle to fail
      803143a [Brennon York] removed license file for SparkContext
      b0b2604 [Brennon York] comment out import to see if build fails and returns properly
      83e80ef [Brennon York] attempt at better python output when called from bash
      c095fa6 [Brennon York] removed another wait() call
      26e18e8 [Brennon York] removed unnecessary wait()
      07210a9 [Brennon York] minor doc string change for java version with namedtuple update
      ec03bf3 [Brennon York] added namedtuple for java version to add readability
      2cb413b [Brennon York] upcased global variables, changes various calling methods from check_output to check_call
      639f1e9 [Brennon York] updated with pep8 rules, fixed minor bugs, added run-tests file in bash to call the run-tests.py script
      3c53a1a [Brennon York] uncomment the scala tests :)
      6126c4f [Brennon York] refactored run-tests into python
      50a0496a
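
      A minimal sketch (not the actual dev/run-tests wrapper) of the bash-to-python handoff mentioned in the commits above ("exec python rather than running python and exiting with return code"): using `exec` replaces the shell process with the Python interpreter, so the Python script's exit code reaches the caller directly, with no manual exit-code plumbing. The demo script path and exit code are illustrative.

      ```shell
      # Stand-in for run-tests.py: a script that fails with a known exit code.
      cat > /tmp/run-tests-demo.py <<'EOF'
      import sys
      sys.exit(7)  # pretend the test suite failed with code 7
      EOF

      # With `exec`, the wrapper shell is replaced by python3, so the
      # script's exit code becomes the wrapper's exit code.
      rc=0
      bash -c 'exec python3 /tmp/run-tests-demo.py' || rc=$?
      echo "wrapper exit code: $rc"   # prints: wrapper exit code: 7
      ```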
  22. Jun 11, 2015
  23. Jun 10, 2015
  24. Jun 05, 2015
    • Marcelo Vanzin's avatar
      [MINOR] [BUILD] Change link to jenkins builds on github. · da20c8ca
      Marcelo Vanzin authored
      Link to the tail of the console log, instead of the full log. That's
      bound to have the info the user is looking for, and at the same time
      loads way more quickly than the (huge) full log, which is just one click
      away if needed.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6664 from vanzin/jenkins-link and squashes the following commits:
      
      ba07ed8 [Marcelo Vanzin] [minor] [build] Change link to jenkins builds on github.
      da20c8ca
  25. Jun 04, 2015
    • Shivaram Venkataraman's avatar
      [SPARK-8027] [SPARKR] Move man pages creation to install-dev.sh · 3dc00528
      Shivaram Venkataraman authored
      This also helps us get rid of the sparkr-docs maven profile, as docs are now built just by using -Psparkr when the roxygen2 package is available
      
      Related to discussion in #6567
      
      cc pwendell srowen -- Let me know if this looks better
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6593 from shivaram/sparkr-pom-cleanup and squashes the following commits:
      
      b282241 [Shivaram Venkataraman] Remove sparkr-docs from release script as well
      8f100a5 [Shivaram Venkataraman] Move man pages creation to install-dev.sh This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available
      3dc00528
  26. Jun 03, 2015
    • Andrew Or's avatar
      [BUILD] Use right branch when checking against Hive · 9cf740f3
      Andrew Or authored
      Right now we always run Hive tests in branch-1.4 PRs because we check whether the diff against master involves Hive changes. Really we should be comparing against the target branch itself.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6629 from andrewor14/build-check-hive and squashes the following commits:
      
      450fbbd [Andrew Or] [BUILD] Use right branch when checking against Hive
      9cf740f3
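
      A hedged sketch of the check this commit fixes: whether to run Hive tests should be decided by diffing against the PR's target branch (e.g. branch-1.4), not master. The file list below is hard-coded for illustration; in the real script it would come from something like `git diff --name-only "origin/${target_branch}...HEAD"`.

      ```shell
      target_branch="branch-1.4"   # illustrative; taken from PR metadata in practice
      changed_files="sql/hive/src/Foo.scala
      core/src/main/Bar.scala"

      # Run Hive tests only when the diff touches Hive paths.
      if printf '%s\n' "$changed_files" | grep -q '^sql/hive'; then
        run_hive_tests="yes"
      else
        run_hive_tests="no"
      fi
      echo "run hive tests: $run_hive_tests"   # prints: run hive tests: yes
      ```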
    • Andrew Or's avatar
      [BUILD] Increase Jenkins test timeout · e35cd36e
      Andrew Or authored
      Currently hive tests alone take 40m. The right thing to do is
      to reduce the test time. However, that is a bigger project and
      we currently have PRs blocking on tests not timing out.
      e35cd36e
  27. Jun 01, 2015
    • Shivaram Venkataraman's avatar
      [SPARK-8027] [SPARKR] Add maven profile to build R package docs · cae9306c
      Shivaram Venkataraman authored
      Also use that profile in create-release.sh
      
      cc pwendell -- Note that this means that we need `knitr` and `roxygen` installed on the machines used for building the release. Let me know if you need help with that.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6567 from shivaram/SPARK-8027 and squashes the following commits:
      
      8dc8ecf [Shivaram Venkataraman] Add maven profile to build R package docs Also use that profile in create-release.sh
      cae9306c
  28. May 28, 2015
    • Kay Ousterhout's avatar
      [SPARK-7933] Remove Patrick's username/pw from merge script · 66c49ed6
      Kay Ousterhout authored
      Looks like this was added by accident when pwendell merged a commit back in September: fe2b1d6a
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #6485 from kayousterhout/SPARK-7933 and squashes the following commits:
      
      7c6164a [Kay Ousterhout] [SPARK-7933] Remove Patrick's username/pw from merge script
      66c49ed6
  29. May 25, 2015
  30. May 24, 2015
  31. May 23, 2015
    • Davies Liu's avatar
      Fix install jira-python · a4df0f2d
      Davies Liu authored
      The jira-python package should be installed by
      
        sudo pip install jira
      
      cc pwendell
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6367 from davies/fix_jira_python2 and squashes the following commits:
      
      fbb3c8e [Davies Liu] Fix install jira-python
      a4df0f2d
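
      A small sketch related to the fix above: the JIRA client is published on PyPI as `jira` (hence `sudo pip install jira`), even though the project itself is commonly called jira-python. This only tests importability; it does not install anything.

      ```shell
      # Check whether the "jira" module is importable with the local python3.
      if python3 -c 'import jira' 2>/dev/null; then
        jira_installed="yes"
      else
        jira_installed="no"
      fi
      echo "jira importable: $jira_installed"
      ```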
    • Shivaram Venkataraman's avatar
      [MINOR] Add SparkR to create-release script · 017b3404
      Shivaram Venkataraman authored
      Enables the SparkR profiles for all the binary builds we create
      
      cc pwendell
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6371 from shivaram/sparkr-create-release and squashes the following commits:
      
      ca5a0b2 [Shivaram Venkataraman] Add -Psparkr to create-release.sh
      017b3404
  32. May 22, 2015