  1. Jul 13, 2015
    • [SPARK-8706] [PYSPARK] [PROJECT INFRA] Add pylint checks to PySpark · 9b62e937
      MechCoder authored
      This adds Pylint checks to PySpark.
      
      For now, this lazily installs Pylint using easy_install into dev/pylint (similar to the pep8 script).
      We still need to figure out which rules should be allowed.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #7241 from MechCoder/pylint and squashes the following commits:
      
      8496834 [MechCoder] Silence warnings and make pylint tests fail to check if it works in jenkins
      57393a3 [MechCoder] undefined-variable
      a8e2547 [MechCoder] Minor changes
      7753810 [MechCoder] remove trailing whitespace
      75c5d2b [MechCoder] Remove blacklisted arguments and pointless statements check
      6bde250 [MechCoder] Disable all checks for now
      3464666 [MechCoder] Add pylint configuration file
      d28109f [MechCoder] [SPARK-8706] [PySpark] [Project infra] Add pylint checks to PySpark
  2. Jul 10, 2015
    • [SPARK-7977] [BUILD] Disallowing println · e14b545d
      Jonathan Alter authored
      Author: Jonathan Alter <jonalter@users.noreply.github.com>
      
      Closes #7093 from jonalter/SPARK-7977 and squashes the following commits:
      
      ccd44cc [Jonathan Alter] Changed println to log in ThreadingSuite
      7fcac3e [Jonathan Alter] Reverting to println in ThreadingSuite
      10724b6 [Jonathan Alter] Changing some printlns to logs in tests
      eeec1e7 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0b1dcb4 [Jonathan Alter] More println cleanup
      aedaf80 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      925fd98 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0c16fa3 [Jonathan Alter] Replacing some printlns with logs
      45c7e05 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      5c8e283 [Jonathan Alter] Allowing println in audit-release examples
      5b50da1 [Jonathan Alter] Allowing printlns in example files
      ca4b477 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      83ab635 [Jonathan Alter] Fixing new printlns
      54b131f [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      1cd8a81 [Jonathan Alter] Removing some unnecessary comments and printlns
      b837c3a [Jonathan Alter] Disallowing println
  3. Jul 07, 2015
    • [HOTFIX] Rename release-profile to release · 1cb2629f
      Patrick Wendell authored
      This renames the profile used when publishing releases. We named it
      'release-profile' because that is the Maven convention. However, it turns
      out this special name causes several other things to kick in when we are
      creating releases that are not desirable. For instance, it triggers the
      javadoc plugin to run, which actually fails in our current build set-up.
      
      The fix is just to rename this to a different profile to have no
      collateral damage associated with its use.
  4. Jul 06, 2015
    • [SPARK-8819] Fix build for maven 3.3.x · 9eae5fa6
      Andrew Or authored
      This is a workaround for MSHADE-148, which leads to an infinite loop when building Spark with maven 3.3.x. This was originally caused by #6441, which added a bunch of test dependencies on the spark-core test module. Recently, it was revealed by #7193.
      
      This patch adds a `-Prelease` profile. If present, it will set `createDependencyReducedPom` to true. The consequences are:
      - If you are releasing Spark with this profile, you are fine as long as you use maven 3.2.x or before.
      - If you are releasing Spark without this profile, you will run into SPARK-8781.
      - If you are not releasing Spark but you are using this profile, you may run into SPARK-8819.
      - If you are not releasing Spark and you did not include this profile, you are fine.
      
      This is all documented in `pom.xml` and tested locally with both versions of maven.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #7219 from andrewor14/fix-maven-build and squashes the following commits:
      
      1d37e87 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-maven-build
      3574ae4 [Andrew Or] Review comments
      f39199c [Andrew Or] Create a -Prelease profile that flags `createDependencyReducedPom`
  5. Jul 02, 2015
    • [SPARK-8740] [PROJECT INFRA] Support GitHub OAuth tokens in dev/merge_spark_pr.py · 377ff4c9
      Josh Rosen authored
      This commit allows `dev/merge_spark_pr.py` to use personal GitHub OAuth tokens in order to make authenticated requests. This is necessary to work around per-IP rate limiting issues.
      
      To use a token, just set the `GITHUB_OAUTH_KEY` environment variable.  You can create a personal token at https://github.com/settings/tokens; we only require `public_repo` scope.
      
      If the script fails due to a rate-limit issue, it now logs a useful message directing the user to the OAuth token instructions.
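      The token handling described above can be sketched as follows; `github_request_headers` is a hypothetical helper for illustration, not the script's actual code:

      ```python
      import os

      def github_request_headers():
          # Hypothetical helper: build HTTP headers for GitHub API calls,
          # attaching the personal OAuth token from GITHUB_OAUTH_KEY when it
          # is set. Authenticated requests are rate-limited per token rather
          # than per IP, which works around the per-IP limits.
          headers = {"Accept": "application/vnd.github.v3+json"}
          token = os.environ.get("GITHUB_OAUTH_KEY")
          if token:
              headers["Authorization"] = "token %s" % token
          return headers
      ```

      When the environment variable is unset, the headers carry no credentials and requests fall back to unauthenticated rate limits.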
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7136 from JoshRosen/pr-merge-script-oauth-authentication and squashes the following commits:
      
      4d011bd [Josh Rosen] Fix error message
      23d92ff [Josh Rosen] Support GitHub OAuth tokens in dev/merge_spark_pr.py
  6. Jul 01, 2015
    • [SPARK-8378] [STREAMING] Add the Python API for Flume · 75b9fe4c
      zsxwing authored
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6830 from zsxwing/flume-python and squashes the following commits:
      
      78dfdac [zsxwing] Fix the compile error in the test code
      f1bf3c0 [zsxwing] Address TD's comments
      0449723 [zsxwing] Add sbt goal streaming-flume-assembly/assembly
      e93736b [zsxwing] Fix the test case for determine_modules_to_test
      9d5821e [zsxwing] Fix pyspark_core dependencies
      f9ee681 [zsxwing] Merge branch 'master' into flume-python
      7a55837 [zsxwing] Add streaming_flume_assembly to run-tests.py
      b96b0de [zsxwing] Merge branch 'master' into flume-python
      ce85e83 [zsxwing] Fix incompatible issues for Python 3
      01cbb3d [zsxwing] Add import sys
      152364c [zsxwing] Fix the issue that StringIO doesn't work in Python 3
      14ba0ff [zsxwing] Add flume-assembly for sbt building
      b8d5551 [zsxwing] Merge branch 'master' into flume-python
      4762c34 [zsxwing] Fix the doc
      0336579 [zsxwing] Refactor Flume unit tests and also add tests for Python API
      9f33873 [zsxwing] Add the Python API for Flume
  7. Jun 29, 2015
    • [SPARK-5161] Parallelize Python test execution · 7bbbe380
      Josh Rosen authored
      This commit parallelizes the Python unit test execution, significantly reducing Jenkins build times.  Parallelism is now configurable by passing the `-p` or `--parallelism` flags to either `dev/run-tests` or `python/run-tests` (the default parallelism is 4, but I've successfully tested with higher parallelism).
      
      To avoid flakiness, I've disabled the Spark Web UI for the Python tests, similar to what we've done for the JVM tests.
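      The parallel execution model can be sketched as follows (a simplified illustration, assuming each suite is an independent zero-argument callable; the real runner shells out to per-suite subprocesses):

      ```python
      from multiprocessing.pool import ThreadPool

      def run_suites_in_parallel(suites, parallelism=4):
          # Run independent test suites concurrently on a thread pool and
          # collect their results in order. `suites` is a list of
          # zero-argument callables standing in for subprocess invocations.
          pool = ThreadPool(parallelism)
          try:
              return pool.map(lambda suite: suite(), suites)
          finally:
              pool.close()
              pool.join()
      ```

      A thread pool (rather than a process pool) suffices here because each worker mostly waits on an external subprocess rather than doing CPU-bound Python work.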
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7031 from JoshRosen/parallelize-python-tests and squashes the following commits:
      
      feb3763 [Josh Rosen] Re-enable other tests
      f87ea81 [Josh Rosen] Only log output from failed tests
      d4ded73 [Josh Rosen] Logging improvements
      a2717e1 [Josh Rosen] Make parallelism configurable via dev/run-tests
      1bacf1b [Josh Rosen] Merge remote-tracking branch 'origin/master' into parallelize-python-tests
      110cd9d [Josh Rosen] Fix universal_newlines for Python 3
      cd13db8 [Josh Rosen] Also log python_implementation
      9e31127 [Josh Rosen] Log Python --version output for each executable.
      a2b9094 [Josh Rosen] Bump up parallelism.
      5552380 [Josh Rosen] Python 3 fix
      866b5b9 [Josh Rosen] Fix lazy logging warnings in Prospector checks
      87cb988 [Josh Rosen] Skip MLLib tests for PyPy
      8309bfe [Josh Rosen] Temporarily disable parallelism to debug a failure
      9129027 [Josh Rosen] Disable Spark UI in Python tests
      037b686 [Josh Rosen] Temporarily disable JVM tests so we can test Python speedup in Jenkins.
      af4cef4 [Josh Rosen] Initial attempt at parallelizing Python test execution
    • [SPARK-8693] [PROJECT INFRA] profiles and goals are not printed in a nice way · 5c796d57
      Brennon York authored
      Hotfix to correct formatting errors in print statements within the dev and Jenkins builds. The error looks like:
      
      ```
      -Phadoop-1[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Dhadoop.version=1.0.4[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Pkinesis-asl[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Phive-thriftserver[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Phive[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  package[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  assembly/assembly[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  streaming-kafka-assembly/assembly
      ```
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #7085 from brennonyork/SPARK-8693 and squashes the following commits:
      
      c5575f1 [Brennon York] added commas to end of print statements for proper printing
  8. Jun 28, 2015
  9. Jun 27, 2015
    • [SPARK-8583] [SPARK-5482] [BUILD] Refactor python/run-tests to integrate with dev/run-tests module system · 40648c56
      Josh Rosen authored
      
      This patch refactors the `python/run-tests` script:
      
      - It's now written in Python instead of Bash.
      - The descriptions of the tests to run are now stored in `dev/run-tests`'s modules.  This allows the pull request builder to skip Python tests suites that were not affected by the pull request's changes.  For example, we can now skip the PySpark Streaming test cases when only SQL files are changed.
      - `python/run-tests` now supports command-line flags to make it easier to run individual test suites (this addresses SPARK-5482):
      
        ```
      Usage: run-tests [options]
      
      Options:
        -h, --help            show this help message and exit
        --python-executables=PYTHON_EXECUTABLES
                              A comma-separated list of Python executables to test
                              against (default: python2.6,python3.4,pypy)
        --modules=MODULES     A comma-separated list of Python modules to test
                              (default: pyspark-core,pyspark-ml,pyspark-mllib
                              ,pyspark-sql,pyspark-streaming)
         ```
      - `dev/run-tests` has been split into multiple files: the module definitions and test utility functions are now stored inside of a `dev/sparktestsupport` Python module, allowing them to be re-used from the Python test runner script.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6967 from JoshRosen/run-tests-python-modules and squashes the following commits:
      
      f578d6d [Josh Rosen] Fix print for Python 2.x
      8233d61 [Josh Rosen] Add python/run-tests.py to Python lint checks
      34c98d2 [Josh Rosen] Fix universal_newlines for Python 3
      8f65ed0 [Josh Rosen] Fix handling of  module in python/run-tests
      37aff00 [Josh Rosen] Python 3 fix
      27a389f [Josh Rosen] Skip MLLib tests for PyPy
      c364ccf [Josh Rosen] Use which() to convert PYSPARK_PYTHON to an absolute path before shelling out to run tests
      568a3fd [Josh Rosen] Fix hashbang
      3b852ae [Josh Rosen] Fall back to PYSPARK_PYTHON when sys.executable is None (fixes a test)
      f53db55 [Josh Rosen] Remove python2 flag, since the test runner script also works fine under Python 3
      9c80469 [Josh Rosen] Fix passing of PYSPARK_PYTHON
      d33e525 [Josh Rosen] Merge remote-tracking branch 'origin/master' into run-tests-python-modules
      4f8902c [Josh Rosen] Python lint fixes.
      8f3244c [Josh Rosen] Use universal_newlines to fix dev/run-tests doctest failures on Python 3.
      f542ac5 [Josh Rosen] Fix lint check for Python 3
      fff4d09 [Josh Rosen] Add dev/sparktestsupport to pep8 checks
      2efd594 [Josh Rosen] Update dev/run-tests to use new Python test runner flags
      b2ab027 [Josh Rosen] Add command-line options for running individual suites in python/run-tests
      caeb040 [Josh Rosen] Fixes to PySpark test module definitions
      d6a77d3 [Josh Rosen] Fix the tests of dev/run-tests
      def2d8a [Josh Rosen] Two minor fixes
      aec0b8f [Josh Rosen] Actually get the Kafka stuff to run properly
      04015b9 [Josh Rosen] First attempt at getting PySpark Kafka test to work in new runner script
      4c97136 [Josh Rosen] PYTHONPATH fixes
      dcc9c09 [Josh Rosen] Fix time division
      32660fc [Josh Rosen] Initial cut at Python test runner refactoring
      311c6a9 [Josh Rosen] Move shell utility functions to own module.
      1bdeb87 [Josh Rosen] Move module definitions to separate file.
  10. Jun 26, 2015
    • [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod() · 41afa165
      Josh Rosen authored
      This patch addresses a critical issue in the PySpark tests:
      
      Several of our Python modules' `__main__` methods call `doctest.testmod()` in order to run doctests but forget to check and handle its return value. As a result, some PySpark test failures can go unnoticed because they will not fail the build.
      
      Fortunately, there was only one test failure which was masked by this bug: a `pyspark.profiler` doctest was failing due to changes in RDD pipelining.
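      The fix pattern is roughly the following: check the result of `doctest.testmod()` and exit non-zero on failure (a minimal sketch; the `square` function is illustrative):

      ```python
      import doctest
      import sys

      def square(x):
          """Illustrative function with a doctest.

          >>> square(3)
          9
          """
          return x * x

      if __name__ == "__main__":
          # testmod() returns (failure_count, test_count); ignoring it lets
          # failing doctests pass silently, so propagate failures as a
          # non-zero exit code to make the build fail.
          (failure_count, test_count) = doctest.testmod()
          if failure_count:
              sys.exit(-1)
      ```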
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7032 from JoshRosen/testmod-fix and squashes the following commits:
      
      60dbdc0 [Josh Rosen] Account for int vs. long formatting change in Python 3
      8b8d80a [Josh Rosen] Fix failing test.
      e6423f9 [Josh Rosen] Check return code for all uses of doctest.testmod().
  11. Jun 24, 2015
    • [SPARK-8558] [BUILD] Script /dev/run-tests fails when _JAVA_OPTIONS env var set · dca21a83
      fe2s authored
      Author: fe2s <aka.fe2s@gmail.com>
      Author: Oleksiy Dyagilev <oleksiy_dyagilev@epam.com>
      
      Closes #6956 from fe2s/fix-run-tests and squashes the following commits:
      
      31b6edc [fe2s] str is a built-in function, so using it as a variable name will lead to spurious warnings in some Python linters
      7d781a0 [fe2s] fixing for openjdk/IBM, seems like they have slightly different wording, but all have 'version' word. Surrounding with spaces for the case if version word appears in _JAVA_OPTIONS
      cd455ef [fe2s] address comment, looking for java version string rather than expecting to have on a certain line number
      ad577d7 [Oleksiy Dyagilev] [SPARK-8558][BUILD] Script /dev/run-tests fails when _JAVA_OPTIONS env var set
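      The detection strategy from the commits above can be sketched like this (an illustration of the approach, not the script's exact code):

      ```python
      def find_java_version_line(java_version_output):
          # `java -version` output can gain extra leading lines (e.g.
          # "Picked up _JAVA_OPTIONS: ...") when _JAVA_OPTIONS is set, so
          # search for the line containing " version " (surrounded by
          # spaces, so a 'version' substring inside _JAVA_OPTIONS itself
          # does not match) instead of assuming a fixed line number.
          for line in java_version_output.splitlines():
              if " version " in line:
                  return line.strip()
          return None
      ```

      Matching on the keyword also copes with the slightly different wording used by OpenJDK and IBM JDKs, which all include the word "version".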
  12. Jun 22, 2015
  13. Jun 20, 2015
    • [SPARK-8495] [SPARKR] Add a `.lintr` file to validate the SparkR files and the `lint-r` script · 004f5737
      Yu ISHIKAWA authored
      Thanks to Shivaram Venkataraman for the support. This is a prototype script to validate the R files.
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6922 from yu-iskw/SPARK-6813 and squashes the following commits:
      
      c1ffe6b [Yu ISHIKAWA] Modify to save result to a log file and add a rule to validate
      5520806 [Yu ISHIKAWA] Exclude the .lintr file not to check Apache lincence
      8f94680 [Yu ISHIKAWA] [SPARK-8495][SparkR] Add a `.lintr` file to validate the SparkR files and the `lint-r` script
    • [SPARK-8422] [BUILD] [PROJECT INFRA] Add a module abstraction to dev/run-tests · 7a3c424e
      Josh Rosen authored
      This patch builds upon #5694 to add a 'module' abstraction to the `dev/run-tests` script which groups together the per-module test logic, including the mapping from file paths to modules, the mapping from modules to test goals and build profiles, and the dependencies / relationships between modules.
      
      This refactoring makes it much easier to increase the granularity of test modules, which will let us skip even more tests.  It's also a prerequisite for other changes that will reduce test time, such as running subsets of the Python tests based on which files / modules have changed.
      
      This patch also adds doctests for the new graph traversal / change mapping code.
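      The dependency traversal at the heart of the module abstraction can be sketched as follows (a simplified illustration; the module names and the `dependents` map are hypothetical):

      ```python
      def modules_to_test(changed_modules, dependents):
          # Walk the module graph: a change to a module means its tests and
          # the tests of everything that (transitively) depends on it must
          # run. `dependents` maps a module name to the modules that
          # depend on it.
          to_test = set(changed_modules)
          stack = list(changed_modules)
          while stack:
              module = stack.pop()
              for dependent in dependents.get(module, []):
                  if dependent not in to_test:
                      to_test.add(dependent)
                      stack.append(dependent)
          return to_test
      ```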
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6866 from JoshRosen/more-dev-run-tests-refactoring and squashes the following commits:
      
      75de450 [Josh Rosen] Use module system to determine which build profiles to enable.
      4224da5 [Josh Rosen] Add documentation to Module.
      a86a953 [Josh Rosen] Clean up modules; add new modules for streaming external projects
      e46539f [Josh Rosen] Fix camel-cased endswith()
      35a3052 [Josh Rosen] Enable Hive tests when running all tests
      df10e23 [Josh Rosen] update to reflect fact that no module depends on root
      3670d50 [Josh Rosen] mllib should depend on streaming
      dc6f1c6 [Josh Rosen] Use changed files' extensions to decide whether to run style checks
      7092d3e [Josh Rosen] Skip SBT tests if no test goals are specified
      43a0ced [Josh Rosen] Minor fixes
      3371441 [Josh Rosen] Test everything if nothing has changed (needed for non-PRB builds)
      37f3fb3 [Josh Rosen] Remove doc profiles option, since it's not actually needed (see #6865)
      f53864b [Josh Rosen] Finish integrating module changes
      f0249bd [Josh Rosen] WIP
  14. Jun 17, 2015
    • [SPARK-7017] [BUILD] [PROJECT INFRA] Refactor dev/run-tests into Python · 50a0496a
      Brennon York authored
      All, this is a first attempt at refactoring `dev/run-tests` into Python. Initially I merely converted all Bash calls over to Python, then moved to a much more modular approach (more functions, moved the calls around, etc.). What is here is the initial culmination and should provide a great base for various downstream issues (e.g. SPARK-7016, modularize / parallelize testing, etc.). Would love comments / suggestions for this initial first step!
      
      /cc srowen pwendell nchammas
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #5694 from brennonyork/SPARK-7017 and squashes the following commits:
      
      154ed73 [Brennon York] updated finding java binary if JAVA_HOME not set
      3922a85 [Brennon York] removed necessary passed in variable
      f9fbe54 [Brennon York] reverted doc test change
      8135518 [Brennon York] removed the test check for documentation changes until jenkins can get updated
      05d435b [Brennon York] added check for jekyll install
      22edb78 [Brennon York] add check if jekyll isn't installed on the path
      2dff136 [Brennon York] fixed pep8 whitespace errors
      767a668 [Brennon York] fixed path joining issues, ensured docs actually build on doc changes
      c42cf9a [Brennon York] unpack set operations with splat (*)
      fb85a41 [Brennon York] fixed minor set bug
      0379833 [Brennon York] minor doc addition to print the changed modules
      aa03d9e [Brennon York] added documentation builds as a top level test component, altered high level project changes to properly execute core tests only when necessary, changed variable names for simplicity
      ec1ae78 [Brennon York] minor name changes, bug fixes
      b7c72b9 [Brennon York] reverting streaming context
      03fdd7b [Brennon York] fixed the tuple () wraps around example lambda
      705d12e [Brennon York] changed example to comply with pep3113 supporting python3
      60b3d51 [Brennon York] prepend rather than append onto PATH
      7d2f5e2 [Brennon York] updated python tests to remove unused variable
      2898717 [Brennon York] added a change to streaming test to check if it only runs streaming tests
      eb684b6 [Brennon York] fixed sbt_test_goals reference error
      db7ae6f [Brennon York] reverted SPARK_HOME from start of command
      1ecca26 [Brennon York] fixed merge conflicts
      2fcdfc0 [Brennon York] testing targte branch dump on jenkins
      1f607b1 [Brennon York] finalizing revisions to modular tests
      8afbe93 [Brennon York] made error codes a global
      0629de8 [Brennon York] updated to refactor and remove various small bugs, removed pep8 complaints
      d90ab2d [Brennon York] fixed merge conflicts, ensured that for regular builds both core and sql tests always run
      b1248dc [Brennon York] exec python rather than running python and exiting with return code
      f9deba1 [Brennon York] python to python2 and removed newline
      6d0a052 [Brennon York] incorporated merge conflicts with SPARK-7249
      f950010 [Brennon York] removed building hive-0.12.0 per SPARK-6908
      703f095 [Brennon York] fixed merge conflicts
      b1ca593 [Brennon York] reverted the sparkR test
      afeb093 [Brennon York] updated to make sparkR test fail
      1dada6b [Brennon York] reverted pyspark test failure
      9a592ec [Brennon York] reverted mima exclude issue, added pyspark test failure
      d825aa4 [Brennon York] revert build break, add mima break
      f041d8a [Brennon York] added space from commented import to now test build breaking
      983f2a2 [Brennon York] comment out import to fail build test
      2386785 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-7017
      76335fb [Brennon York] reverted rat license issue for sparkconf
      e4a96cc [Brennon York] removed the import error and added license error, fixed the way run-tests and run-tests.py report their error codes
      56d3cb9 [Brennon York] changed test back and commented out import to break compile
      b37328c [Brennon York] fixed typo and added default return is no error block was found in the environment
      7613558 [Brennon York] updated to return the proper env variable for return codes
      a5bd445 [Brennon York] reverted license, changed test in shuffle to fail
      803143a [Brennon York] removed license file for SparkContext
      b0b2604 [Brennon York] comment out import to see if build fails and returns properly
      83e80ef [Brennon York] attempt at better python output when called from bash
      c095fa6 [Brennon York] removed another wait() call
      26e18e8 [Brennon York] removed unnecessary wait()
      07210a9 [Brennon York] minor doc string change for java version with namedtuple update
      ec03bf3 [Brennon York] added namedtuple for java version to add readability
      2cb413b [Brennon York] upcased global variables, changes various calling methods from check_output to check_call
      639f1e9 [Brennon York] updated with pep8 rules, fixed minor bugs, added run-tests file in bash to call the run-tests.py script
      3c53a1a [Brennon York] uncomment the scala tests :)
      6126c4f [Brennon York] refactored run-tests into python
  15. Jun 11, 2015
  16. Jun 10, 2015
  17. Jun 05, 2015
    • [MINOR] [BUILD] Change link to jenkins builds on github. · da20c8ca
      Marcelo Vanzin authored
      Link to the tail of the console log, instead of the full log. That's
      bound to have the info the user is looking for, and at the same time
      loads way more quickly than the (huge) full log, which is just one click
      away if needed.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6664 from vanzin/jenkins-link and squashes the following commits:
      
      ba07ed8 [Marcelo Vanzin] [minor] [build] Change link to jenkins builds on github.
  18. Jun 04, 2015
    • [SPARK-8027] [SPARKR] Move man pages creation to install-dev.sh · 3dc00528
      Shivaram Venkataraman authored
      This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available
      
      Related to discussion in #6567
      
      cc pwendell srowen -- Let me know if this looks better
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6593 from shivaram/sparkr-pom-cleanup and squashes the following commits:
      
      b282241 [Shivaram Venkataraman] Remove sparkr-docs from release script as well
      8f100a5 [Shivaram Venkataraman] Move man pages creation to install-dev.sh This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available
  19. Jun 03, 2015
    • [BUILD] Use right branch when checking against Hive · 9cf740f3
      Andrew Or authored
      Right now we always run hive tests in branch-1.4 PRs because we compare whether the diff against master involves hive changes. Really we should be comparing against the target branch itself.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6629 from andrewor14/build-check-hive and squashes the following commits:
      
      450fbbd [Andrew Or] [BUILD] Use right branch when checking against Hive
    • [BUILD] Increase Jenkins test timeout · e35cd36e
      Andrew Or authored
      Currently hive tests alone take 40m. The right thing to do is
      to reduce the test time. However, that is a bigger project and
      we currently have PRs blocking on tests not timing out.
  20. Jun 01, 2015
    • [SPARK-8027] [SPARKR] Add maven profile to build R package docs · cae9306c
      Shivaram Venkataraman authored
      Also use that profile in create-release.sh
      
      cc pwendell -- Note that this means that we need `knitr` and `roxygen` installed on the machines used for building the release. Let me know if you need help with that.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6567 from shivaram/SPARK-8027 and squashes the following commits:
      
      8dc8ecf [Shivaram Venkataraman] Add maven profile to build R package docs Also use that profile in create-release.sh
  21. May 28, 2015
    • [SPARK-7933] Remove Patrick's username/pw from merge script · 66c49ed6
      Kay Ousterhout authored
      Looks like this was added by accident when pwendell merged a commit back in September: fe2b1d6a
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #6485 from kayousterhout/SPARK-7933 and squashes the following commits:
      
      7c6164a [Kay Ousterhout] [SPARK-7933] Remove Patrick's username/pw from merge script
  22. May 25, 2015
  23. May 24, 2015
  24. May 23, 2015
    • Fix install jira-python · a4df0f2d
      Davies Liu authored
      The jira-python package should be installed by
      
        sudo pip install jira
      
      cc pwendell
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6367 from davies/fix_jira_python2 and squashes the following commits:
      
      fbb3c8e [Davies Liu] Fix install jira-python
    • [MINOR] Add SparkR to create-release script · 017b3404
      Shivaram Venkataraman authored
      Enables the SparkR profiles for all the binary builds we create
      
      cc pwendell
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6371 from shivaram/sparkr-create-release and squashes the following commits:
      
      ca5a0b2 [Shivaram Venkataraman] Add -Psparkr to create-release.sh
  25. May 22, 2015
  26. May 21, 2015
    • [BUILD] Always run SQL tests in master build. · 147b6be3
      Yin Huai authored
      It seems our master build does not run HiveCompatibilitySuite (because _RUN_SQL_TESTS is not set). This PR introduces a property, `AMP_JENKINS_PRB`, to differentiate a PR build from a regular build. If a build is a regular one, we always set _RUN_SQL_TESTS to true.
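      The decision logic can be sketched as follows (a simplified illustration; the `sql/` path check stands in for the real change-detection logic):

      ```python
      import os

      def should_run_sql_tests(changed_files):
          # Regular (non-PR) builds always run the SQL tests; pull-request
          # builds, marked by the AMP_JENKINS_PRB environment variable, run
          # them only when the diff touches SQL files.
          is_pr_build = "AMP_JENKINS_PRB" in os.environ
          if not is_pr_build:
              return True
          return any(path.startswith("sql/") for path in changed_files)
      ```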
      
      cc JoshRosen nchammas
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #5955 from yhuai/runSQLTests and squashes the following commits:
      
      3d399bc [Yin Huai] Always run SQL tests in master build.
    • [SPARK-7722] [STREAMING] Added Kinesis to style checker · 311fab6f
      Tathagata Das authored
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6325 from tdas/SPARK-7722 and squashes the following commits:
      
      9ab35b2 [Tathagata Das] Fixed styles in Kinesis
  27. May 14, 2015
    • Make SPARK prefix a variable · 11a1a135
      tedyu authored
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #6153 from ted-yu/master and squashes the following commits:
      
      4e0bac5 [tedyu] Use JIRA_PROJECT_NAME as variable name
      ab982aa [tedyu] Make SPARK prefix a variable
    • [SPARK-7249] Updated Hadoop dependencies due to inconsistency in the versions · 7fb715de
      FavioVazquez authored
      Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons.
      
      The changes were proposed by vanzin following the previous pull request https://github.com/apache/spark/pull/5783, which did not fix the problem correctly.
      
      Please let me know if this is the correct way of doing this; the comments from vanzin are in the pull request mentioned above.
      
      Author: FavioVazquez <favio.vazquezp@gmail.com>
      
      Closes #5786 from FavioVazquez/update-hadoop-dependencies and squashes the following commits:
      
      11670e5 [FavioVazquez] - Added missing instance of -Phadoop-2.2 in create-release.sh
      379f50d [FavioVazquez] - Added instances of -Phadoop-2.2 in create-release.sh, run-tests, scalastyle and building-spark.md - Reconstructed docs to not ask users to rely on default behavior
      3f9249d [FavioVazquez] Merge branch 'master' of https://github.com/apache/spark into update-hadoop-dependencies
      31bdafa [FavioVazquez] - Added missing instances in -Phadoop-1 in create-release.sh, run-tests and in the building-spark documentation
      cbb93e8 [FavioVazquez] - Added comment related to SPARK-3710 about  hadoop-yarn-server-tests in Hadoop 2.2 that fails to pull some needed dependencies
      83dc332 [FavioVazquez] - Cleaned up the main POM concerning the yarn profile - Erased hadoop-2.2 profile from yarn/pom.xml and its content was integrated into yarn/pom.xml
      93f7624 [FavioVazquez] - Deleted unnecessary comments and <activation> tag on the YARN profile in the main POM
      668d126 [FavioVazquez] - Moved <dependencies> <activation> and <properties> sections of the hadoop-2.2 profile in the YARN POM to the YARN profile in the root POM - Erased unnecessary hadoop-2.2 profile from the YARN POM
      fda6a51 [FavioVazquez] - Updated hadoop1 releases in create-release.sh  due to changes in the default hadoop version set - Erased unnecessary instance of -Dyarn.version=2.2.0 in create-release.sh - Prettify comment in yarn/pom.xml
      0470587 [FavioVazquez] - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in create-release.sh - Updated how the releases are made in create-release.sh now that the default hadoop version is 2.2.0 - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in scalastyle - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in run-tests - Better example given in hadoop-third-party-distributions.md now that the default hadoop version is 2.2.0
      a650779 [FavioVazquez] - Default value of avro.mapred.classifier has been set to hadoop2 in pom.xml - Cleaned up hadoop-2.3 and 2.4 profiles due to change in the default set in avro.mapred.classifier in pom.xml
      199f40b [FavioVazquez] - Erased unnecessary CDH5-specific note in docs/building-spark.md - Removed example of instance -Phadoop-2.2 -Dhadoop.version=2.2.0 in docs/building-spark.md - Enabled hadoop-2.2 profile when the Hadoop version is 2.2.0, which is now the default. Added a comment in yarn/pom.xml to specify that.
      88a8b88 [FavioVazquez] - Simplified Hadoop profiles due to new setting of global properties in the pom.xml file - Added comment to specify that the hadoop-2.2 profile is now the default hadoop profile in the pom.xml file - Erased hadoop-2.2 from related hadoop profiles now that is a no-op in the make-distribution.sh file
      70b8344 [FavioVazquez] - Fixed typo in the make-distribution.sh file and added hadoop-1 in the Related profiles
      287fa2f [FavioVazquez] - Updated documentation about specifying the hadoop version in building-spark. Now is clear that Spark will build against Hadoop 2.2.0 by default. - Added Cloudera CDH 5.3.3 without MapReduce example in the building-spark doc.
      1354292 [FavioVazquez] - Fixed hadoop-1 version to match jenkins build profile in hadoop1.0 tests and documentation
      6b4bfaf [FavioVazquez] - Cleanup in hadoop-2.x profiles since they contained mostly redundant stuff.
      7e9955d [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
      660decc [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
      ec91ce3 [FavioVazquez] - Updated protobuf-java version of com.google.protobuf dependency to fix blocking error when connecting to HDFS via the Hadoop Cloudera HDFS CDH5 (fix for 2.5.0-cdh5.3.3 version)
      7fb715de
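
      The restructuring this commit describes can be illustrated with a minimal POM sketch. This is a simplified, illustrative fragment — not the actual Spark root POM (which has many more properties and profiles): the Hadoop 2.2 coordinates become the global defaults, and the `hadoop-2.2` profile is kept only as an empty no-op for backwards compatibility.

      ```xml
      <!-- Illustrative sketch only; property values mirror the commit log above. -->
      <properties>
        <!-- hadoop-2.2 values are now the global defaults -->
        <hadoop.version>2.2.0</hadoop.version>
        <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
      </properties>

      <profiles>
        <!-- Emptied, but kept so -Phadoop-2.2 remains a valid (no-op) flag -->
        <profile>
          <id>hadoop-2.2</id>
        </profile>
        <!-- Other hadoop-2.x profiles only override what differs from the defaults -->
        <profile>
          <id>hadoop-2.4</id>
          <properties>
            <hadoop.version>2.4.0</hadoop.version>
          </properties>
        </profile>
      </profiles>
      ```

      With this layout, a plain `mvn package` builds against Hadoop 2.2.0, and passing `-Phadoop-2.2` changes nothing — which is exactly the backwards-compatibility behavior the commit message describes.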
  28. May 13, 2015
  29. May 12, 2015
    • Patrick Wendell's avatar
      [SPARK-7592] Always set resolution to "Fixed" in PR merge script. · 1b9e434b
      Patrick Wendell authored
      The issue is that the behavior of the ASF JIRA silently
      changed. Now when the "Resolve Issue" transition occurs,
      the default resolution is "Pending Closed". We used to
      count on the default behavior being to set the
      resolution as "Fixed".
      
      The solution is to explicitly set the resolution as "Fixed" and not
      count on default behavior.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #6103 from pwendell/merge-script-fix and squashes the following commits:
      
      dcc16a6 [Patrick Wendell] Always set resolution to "Fixed" in PR merge script.
      1b9e434b
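
      The fix above can be sketched in a few lines. This is an illustrative sketch, not the actual merge-script code: the function name `resolve_fields` is hypothetical, but it shows the essential change — building the JIRA transition payload with an explicit `resolution` instead of relying on the server default.

      ```python
      def resolve_fields(fix_versions):
          """Build the fields for a JIRA 'Resolve Issue' transition.

          The resolution is pinned to 'Fixed' explicitly, because the ASF
          JIRA default silently changed to 'Pending Closed'.
          """
          return {
              "fixVersions": [{"name": v} for v in fix_versions],
              "resolution": {"name": "Fixed"},  # never rely on the server default
          }

      # With the python 'jira' client this would be used roughly as:
      #   jira.transition_issue(issue, transition_id,
      #                         fields=resolve_fields(["1.4.0"]))
      ```

      The design point is that any behavior the script depends on should be stated explicitly in the request rather than inherited from server-side defaults, which can change without notice.
      
      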
  30. May 07, 2015
    • Michael Armbrust's avatar
      [SPARK-6908] [SQL] Use isolated Hive client · cd1d4110
      Michael Armbrust authored
      This PR switches Spark SQL's Hive support to use the isolated hive client interface introduced by #5851, instead of directly interacting with the client.  By using this isolated client we can now allow users to dynamically configure the version of Hive that they are connecting to by setting `spark.sql.hive.metastore.version`, without the need to recompile.  This also greatly reduces the surface area for our interaction with the hive libraries, hopefully making it easier to support other versions in the future.
      
      Jars for the desired hive version can be configured using `spark.sql.hive.metastore.jars`, which accepts the following options:
       - a colon-separated list of jar files or directories for Hive and Hadoop.
       - `builtin` - attempt to discover the jars that were used to load Spark SQL and use those. This
                  option is only valid when using the execution version of Hive.
       - `maven` - download the correct version of Hive on demand from Maven.
      
      By default, `builtin` is used for Hive 13.
      
      This PR also removes the test step for building against Hive 12, as this will no longer be required to talk to Hive 12 metastores.  However, the full removal of the Shim is deferred until a later PR.
      
      Remaining TODOs:
       - Remove the Hive Shims and inline code for Hive 13.
       - Several HiveCompatibility tests are not yet passing.
        - `nullformatCTAS` - As detailed below, we are now handling CTAS parsing ourselves instead of hacking into the Hive semantic analyzer.  However, we currently only handle the common cases and not things like CTAS where the null format is specified.
        - `combine1` now leaks state about compression somehow, breaking all subsequent tests.  As such we currently add it to the blacklist
        - `part_inherit_tbl_props` and `part_inherit_tbl_props_with_star` do not work anymore.  We are correctly propagating the information
        - "load_dyn_part14.*" - These tests pass when run on their own, but fail when run with all other tests.  It seems our `RESET` mechanism may not be as robust as it used to be?
      
      Other required changes:
       -  `CreateTableAsSelect` no longer carries parts of the HiveQL AST with it through the query execution pipeline.  Instead, we parse CTAS during the HiveQL conversion and construct a `HiveTable`.  The full parsing here is not yet complete as detailed above in the remaining TODOs.  Since the operator is Hive specific, it is moved to the hive package.
       - `Command` is simplified to be a trait that simply acts as a marker for a LogicalPlan that should be eagerly evaluated.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #5876 from marmbrus/useIsolatedClient and squashes the following commits:
      
      258d000 [Michael Armbrust] really really correct path handling
      e56fd4a [Michael Armbrust] getAbsolutePath
      5a259f5 [Michael Armbrust] fix typos
      81bb366 [Michael Armbrust] comments from vanzin
      5f3945e [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
      4b5cd41 [Michael Armbrust] yin's comments
      f5de7de [Michael Armbrust] cleanup
      11e9c72 [Michael Armbrust] better coverage in versions suite
      7e8f010 [Michael Armbrust] better error messages and jar handling
      e7b3941 [Michael Armbrust] more permissive checking for function registration
      da91ba7 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
      5fe5894 [Michael Armbrust] fix serialization suite
      81711c4 [Michael Armbrust] Initial support for running without maven
      1d8ae44 [Michael Armbrust] fix final tests?
      1c50813 [Michael Armbrust] more comments
      a3bee70 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
      a6f5df1 [Michael Armbrust] style
      ab07f7e [Michael Armbrust] WIP
      4d8bf02 [Michael Armbrust] Remove hive 12 compilation
      8843a25 [Michael Armbrust] [SPARK-6908] [SQL] Use isolated Hive client
      cd1d4110
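
      Putting the two options from the PR description together, pointing Spark SQL at a specific metastore version might look like the following in `spark-defaults.conf`. The property names (`spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars`) come from the PR text above; the version value is illustrative:

      ```
      # Connect to a Hive 0.12 metastore, fetching matching jars from Maven
      spark.sql.hive.metastore.version   0.12.0
      spark.sql.hive.metastore.jars      maven
      ```

      Leaving `spark.sql.hive.metastore.jars` unset keeps the `builtin` default described above, i.e. the Hive 13 jars that Spark SQL itself was loaded with.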
  31. May 04, 2015
    • Andrew Or's avatar
      [MINOR] Fix python test typo? · 5a1a1075
      Andrew Or authored
      I suspect we haven't been using anaconda in tests in a while. I wonder if this change actually does anything, but this line as it stands looks strictly less correct.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #5883 from andrewor14/fix-run-tests-typo and squashes the following commits:
      
      a3ad720 [Andrew Or] Fix typo?
      5a1a1075