  1. Jun 29, 2015
    • [SPARK-5161] Parallelize Python test execution · 7bbbe380
      Josh Rosen authored
      This commit parallelizes the Python unit test execution, significantly reducing Jenkins build times.  Parallelism is now configurable by passing the `-p` or `--parallelism` flags to either `dev/run-tests` or `python/run-tests` (the default parallelism is 4, but I've successfully tested with higher parallelism).
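
      The runner's approach can be sketched as follows (an assumption based on the description above, not the actual `python/run-tests` code): each test module runs in its own subprocess, a thread pool bounds the parallelism, and output is surfaced only for failures.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_test(cmd):
    """Run one test command in a subprocess; print output only on failure."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        print(proc.stdout, proc.stderr)
    return proc.returncode

# Hypothetical stand-ins for the per-module PySpark test commands.
commands = [[sys.executable, "-c", f"print('module {i} ok')"] for i in range(4)]

# Default parallelism of 4, overridable as with the --parallelism flag.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_test, commands))
```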
      
      To avoid flakiness, I've disabled the Spark Web UI for the Python tests, similar to what we've done for the JVM tests.
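
      A plausible mechanism (an assumption; the commit text does not quote the setting) is Spark's `spark.ui.enabled` flag, which avoids port conflicts when many suites start concurrently:

```
spark.ui.enabled    false
```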
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7031 from JoshRosen/parallelize-python-tests and squashes the following commits:
      
      feb3763 [Josh Rosen] Re-enable other tests
      f87ea81 [Josh Rosen] Only log output from failed tests
      d4ded73 [Josh Rosen] Logging improvements
      a2717e1 [Josh Rosen] Make parallelism configurable via dev/run-tests
      1bacf1b [Josh Rosen] Merge remote-tracking branch 'origin/master' into parallelize-python-tests
      110cd9d [Josh Rosen] Fix universal_newlines for Python 3
      cd13db8 [Josh Rosen] Also log python_implementation
      9e31127 [Josh Rosen] Log Python --version output for each executable.
      a2b9094 [Josh Rosen] Bump up parallelism.
      5552380 [Josh Rosen] Python 3 fix
      866b5b9 [Josh Rosen] Fix lazy logging warnings in Prospector checks
      87cb988 [Josh Rosen] Skip MLLib tests for PyPy
      8309bfe [Josh Rosen] Temporarily disable parallelism to debug a failure
      9129027 [Josh Rosen] Disable Spark UI in Python tests
      037b686 [Josh Rosen] Temporarily disable JVM tests so we can test Python speedup in Jenkins.
      af4cef4 [Josh Rosen] Initial attempt at parallelizing Python test execution
  2. Jun 17, 2015
    • [SPARK-7017] [BUILD] [PROJECT INFRA] Refactor dev/run-tests into Python · 50a0496a
      Brennon York authored
      All, this is a first attempt at refactoring `dev/run-tests` into Python. Initially I merely converted all Bash calls over to Python, then moved to a much more modular approach (more functions, moving the calls around, etc.). What is here is the initial culmination and should provide a solid base for various downstream issues (e.g. SPARK-7016, modularizing/parallelizing testing, etc.). Would love comments/suggestions on this initial first step!
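
      A minimal sketch of the modular idea (hypothetical names, not the actual script): map changed file paths to test modules so only the affected suites need to run.

```python
def determine_modules(changed_files):
    """Map changed file paths to the test modules that must run (sketch)."""
    # Hypothetical prefix-to-module mapping.
    mapping = {"python/": "pyspark", "sql/": "sql", "streaming/": "streaming"}
    modules = set()
    for path in changed_files:
        for prefix, module in mapping.items():
            if path.startswith(prefix):
                modules.add(module)
    # Fall back to the core tests when no specific module matched.
    return modules or {"core"}
```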
      
      /cc srowen pwendell nchammas
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #5694 from brennonyork/SPARK-7017 and squashes the following commits:
      
      154ed73 [Brennon York] updated finding java binary if JAVA_HOME not set
      3922a85 [Brennon York] removed necessary passed in variable
      f9fbe54 [Brennon York] reverted doc test change
      8135518 [Brennon York] removed the test check for documentation changes until jenkins can get updated
      05d435b [Brennon York] added check for jekyll install
      22edb78 [Brennon York] add check if jekyll isn't installed on the path
      2dff136 [Brennon York] fixed pep8 whitespace errors
      767a668 [Brennon York] fixed path joining issues, ensured docs actually build on doc changes
      c42cf9a [Brennon York] unpack set operations with splat (*)
      fb85a41 [Brennon York] fixed minor set bug
      0379833 [Brennon York] minor doc addition to print the changed modules
      aa03d9e [Brennon York] added documentation builds as a top level test component, altered high level project changes to properly execute core tests only when necessary, changed variable names for simplicity
      ec1ae78 [Brennon York] minor name changes, bug fixes
      b7c72b9 [Brennon York] reverting streaming context
      03fdd7b [Brennon York] fixed the tuple () wraps around example lambda
      705d12e [Brennon York] changed example to comply with pep3113 supporting python3
      60b3d51 [Brennon York] prepend rather than append onto PATH
      7d2f5e2 [Brennon York] updated python tests to remove unused variable
      2898717 [Brennon York] added a change to streaming test to check if it only runs streaming tests
      eb684b6 [Brennon York] fixed sbt_test_goals reference error
      db7ae6f [Brennon York] reverted SPARK_HOME from start of command
      1ecca26 [Brennon York] fixed merge conflicts
      2fcdfc0 [Brennon York] testing targte branch dump on jenkins
      1f607b1 [Brennon York] finalizing revisions to modular tests
      8afbe93 [Brennon York] made error codes a global
      0629de8 [Brennon York] updated to refactor and remove various small bugs, removed pep8 complaints
      d90ab2d [Brennon York] fixed merge conflicts, ensured that for regular builds both core and sql tests always run
      b1248dc [Brennon York] exec python rather than running python and exiting with return code
      f9deba1 [Brennon York] python to python2 and removed newline
      6d0a052 [Brennon York] incorporated merge conflicts with SPARK-7249
      f950010 [Brennon York] removed building hive-0.12.0 per SPARK-6908
      703f095 [Brennon York] fixed merge conflicts
      b1ca593 [Brennon York] reverted the sparkR test
      afeb093 [Brennon York] updated to make sparkR test fail
      1dada6b [Brennon York] reverted pyspark test failure
      9a592ec [Brennon York] reverted mima exclude issue, added pyspark test failure
      d825aa4 [Brennon York] revert build break, add mima break
      f041d8a [Brennon York] added space from commented import to now test build breaking
      983f2a2 [Brennon York] comment out import to fail build test
      2386785 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-7017
      76335fb [Brennon York] reverted rat license issue for sparkconf
      e4a96cc [Brennon York] removed the import error and added license error, fixed the way run-tests and run-tests.py report their error codes
      56d3cb9 [Brennon York] changed test back and commented out import to break compile
      b37328c [Brennon York] fixed typo and added default return is no error block was found in the environment
      7613558 [Brennon York] updated to return the proper env variable for return codes
      a5bd445 [Brennon York] reverted license, changed test in shuffle to fail
      803143a [Brennon York] removed license file for SparkContext
      b0b2604 [Brennon York] comment out import to see if build fails and returns properly
      83e80ef [Brennon York] attempt at better python output when called from bash
      c095fa6 [Brennon York] removed another wait() call
      26e18e8 [Brennon York] removed unnecessary wait()
      07210a9 [Brennon York] minor doc string change for java version with namedtuple update
      ec03bf3 [Brennon York] added namedtuple for java version to add readability
      2cb413b [Brennon York] upcased global variables, changes various calling methods from check_output to check_call
      639f1e9 [Brennon York] updated with pep8 rules, fixed minor bugs, added run-tests file in bash to call the run-tests.py script
      3c53a1a [Brennon York] uncomment the scala tests :)
      6126c4f [Brennon York] refactored run-tests into python
  3. Jun 03, 2015
    • [BUILD] Use right branch when checking against Hive · 9cf740f3
      Andrew Or authored
      Right now we always run Hive tests in branch-1.4 PRs because we check whether the diff against master involves Hive changes. Really we should be comparing against the target branch itself.
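
      The fix can be sketched like this (hypothetical helper names; the pure path check is separated out so it can be exercised without a git checkout):

```python
import subprocess

def touches_hive(paths):
    """True if any changed path lies under Spark SQL's Hive module."""
    return any(p.startswith("sql/hive") for p in paths)

def hive_changed(target_branch):
    """Diff against the PR's target branch, not master, before deciding."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"origin/{target_branch}...HEAD"],
        capture_output=True, text=True).stdout
    return touches_hive(out.splitlines())
```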
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6629 from andrewor14/build-check-hive and squashes the following commits:
      
      450fbbd [Andrew Or] [BUILD] Use right branch when checking against Hive
  7. May 21, 2015
    • [BUILD] Always run SQL tests in master build. · 147b6be3
      Yin Huai authored
      It seems our master build does not run HiveCompatibilitySuite (because _RUN_SQL_TESTS is not set). This PR introduces a property `AMP_JENKINS_PRB` to differentiate a PR build from a regular build. If a build is a regular one, we always set _RUN_SQL_TESTS to true.
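
      The logic amounts to something like the following sketch (assumed shape, not the actual script):

```python
import os

def should_run_sql_tests(changed_modules):
    """Regular (non-PR) builds always run the SQL tests."""
    # AMP_JENKINS_PRB is set only on pull-request builds (per the description).
    if "AMP_JENKINS_PRB" not in os.environ:
        return True
    return "sql" in changed_modules
```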
      
      cc JoshRosen nchammas
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #5955 from yhuai/runSQLTests and squashes the following commits:
      
      3d399bc [Yin Huai] Always run SQL tests in master build.
  8. May 14, 2015
    • [SPARK-7249] Updated Hadoop dependencies due to inconsistency in the versions · 7fb715de
      FavioVazquez authored
      Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons.
      
      These changes were proposed by vanzin following the previous pull request https://github.com/apache/spark/pull/5783, which did not fix the problem correctly.
      
      Please let me know if this is the correct way of doing this; vanzin's comments are in the pull request mentioned above.
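
      The resulting POM shape might look roughly like this (an illustrative fragment, not the actual pom.xml): the global properties carry the hadoop-2.2 values, and the profile itself remains as a no-op for backwards compatibility.

```xml
<properties>
  <!-- Defaults now match what the hadoop-2.2 profile used to set. -->
  <hadoop.version>2.2.0</hadoop.version>
</properties>
<profiles>
  <profile>
    <id>hadoop-2.2</id>
    <!-- Intentionally empty: kept only for backwards compatibility. -->
  </profile>
</profiles>
```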
      
      Author: FavioVazquez <favio.vazquezp@gmail.com>
      
      Closes #5786 from FavioVazquez/update-hadoop-dependencies and squashes the following commits:
      
      11670e5 [FavioVazquez] - Added missing instance of -Phadoop-2.2 in create-release.sh
      379f50d [FavioVazquez] - Added instances of -Phadoop-2.2 in create-release.sh, run-tests, scalastyle and building-spark.md - Reconstructed docs to not ask users to rely on default behavior
      3f9249d [FavioVazquez] Merge branch 'master' of https://github.com/apache/spark into update-hadoop-dependencies
      31bdafa [FavioVazquez] - Added missing instances in -Phadoop-1 in create-release.sh, run-tests and in the building-spark documentation
      cbb93e8 [FavioVazquez] - Added comment related to SPARK-3710 about  hadoop-yarn-server-tests in Hadoop 2.2 that fails to pull some needed dependencies
      83dc332 [FavioVazquez] - Cleaned up the main POM concerning the yarn profile - Erased hadoop-2.2 profile from yarn/pom.xml and its content was integrated into yarn/pom.xml
      93f7624 [FavioVazquez] - Deleted unnecessary comments and <activation> tag on the YARN profile in the main POM
      668d126 [FavioVazquez] - Moved <dependencies> <activation> and <properties> sections of the hadoop-2.2 profile in the YARN POM to the YARN profile in the root POM - Erased unnecessary hadoop-2.2 profile from the YARN POM
      fda6a51 [FavioVazquez] - Updated hadoop1 releases in create-release.sh  due to changes in the default hadoop version set - Erased unnecessary instance of -Dyarn.version=2.2.0 in create-release.sh - Prettify comment in yarn/pom.xml
      0470587 [FavioVazquez] - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in create-release.sh - Updated how the releases are made in the create-release.sh no that the default hadoop version is the 2.2.0 - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in scalastyle - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in run-tests - Better example given in the hadoop-third-party-distributions.md now that the default hadoop version is 2.2.0
      a650779 [FavioVazquez] - Default value of avro.mapred.classifier has been set to hadoop2 in pom.xml - Cleaned up hadoop-2.3 and 2.4 profiles due to change in the default set in avro.mapred.classifier in pom.xml
      199f40b [FavioVazquez] - Erased unnecessary CDH5-specific note in docs/building-spark.md - Remove example of instance -Phadoop-2.2 -Dhadoop.version=2.2.0 in docs/building-spark.md - Enabled hadoop-2.2 profile when the Hadoop version is 2.2.0, which is now the default .Added comment in the yarn/pom.xml to specify that.
      88a8b88 [FavioVazquez] - Simplified Hadoop profiles due to new setting of global properties in the pom.xml file - Added comment to specify that the hadoop-2.2 profile is now the default hadoop profile in the pom.xml file - Erased hadoop-2.2 from related hadoop profiles now that is a no-op in the make-distribution.sh file
      70b8344 [FavioVazquez] - Fixed typo in the make-distribution.sh file and added hadoop-1 in the Related profiles
      287fa2f [FavioVazquez] - Updated documentation about specifying the hadoop version in building-spark. Now is clear that Spark will build against Hadoop 2.2.0 by default. - Added Cloudera CDH 5.3.3 without MapReduce example in the building-spark doc.
      1354292 [FavioVazquez] - Fixed hadoop-1 version to match jenkins build profile in hadoop1.0 tests and documentation
      6b4bfaf [FavioVazquez] - Cleanup in hadoop-2.x profiles since they contained mostly redundant stuff.
      7e9955d [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
      660decc [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
      ec91ce3 [FavioVazquez] - Updated protobuf-java version of com.google.protobuf dependancy to fix blocking error when connecting to HDFS via the Hadoop Cloudera HDFS CDH5 (fix for 2.5.0-cdh5.3.3 version)
  9. May 07, 2015
    • [SPARK-6908] [SQL] Use isolated Hive client · cd1d4110
      Michael Armbrust authored
      This PR switches Spark SQL's Hive support to use the isolated Hive client interface introduced by #5851, instead of directly interacting with the client.  By using this isolated client we can now allow users to dynamically configure the version of Hive that they are connecting to by setting `spark.sql.hive.metastore.version`, without the need to recompile.  This also greatly reduces the surface area of our interaction with the Hive libraries, hopefully making it easier to support other versions in the future.
      
      Jars for the desired hive version can be configured using `spark.sql.hive.metastore.jars`, which accepts the following options:
       - a colon-separated list of jar files or directories for hive and hadoop.
       - `builtin` - attempt to discover the jars that were used to load Spark SQL and use those. This
                  option is only valid when using the execution version of Hive.
       - `maven` - download the correct version of hive on demand from maven.
      
      By default, `builtin` is used for Hive 13.
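
       For example, a `spark-defaults.conf`-style configuration pointing at a Maven-resolved metastore client (the version value is illustrative):

```
spark.sql.hive.metastore.version  0.13.1
spark.sql.hive.metastore.jars     maven
```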
      
      This PR also removes the test step for building against Hive 12, as this will no longer be required to talk to Hive 12 metastores.  However, the full removal of the Shim is deferred until a later PR.
      
      Remaining TODOs:
       - Remove the Hive Shims and inline code for Hive 13.
       - Several HiveCompatibility tests are not yet passing.
        - `nullformatCTAS` - As detailed below, we now are handling CTAS parsing ourselves instead of hacking into the Hive semantic analyzer.  However, we currently only handle the common cases and not things like CTAS where the null format is specified.
        - `combine1` now leaks state about compression somehow, breaking all subsequent tests.  As such we currently add it to the blacklist.
        - `part_inherit_tbl_props` and `part_inherit_tbl_props_with_star` do not work anymore.  We are correctly propagating the information
        - "load_dyn_part14.*" - These tests pass when run on their own, but fail when run with all other tests.  It seems our `RESET` mechanism may not be as robust as it used to be?
      
      Other required changes:
       -  `CreateTableAsSelect` no longer carries parts of the HiveQL AST with it through the query execution pipeline.  Instead, we parse CTAS during the HiveQL conversion and construct a `HiveTable`.  The full parsing here is not yet complete as detailed above in the remaining TODOs.  Since the operator is Hive specific, it is moved to the hive package.
       - `Command` is simplified to be a trait that simply acts as a marker for a LogicalPlan that should be eagerly evaluated.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #5876 from marmbrus/useIsolatedClient and squashes the following commits:
      
      258d000 [Michael Armbrust] really really correct path handling
      e56fd4a [Michael Armbrust] getAbsolutePath
      5a259f5 [Michael Armbrust] fix typos
      81bb366 [Michael Armbrust] comments from vanzin
      5f3945e [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
      4b5cd41 [Michael Armbrust] yin's comments
      f5de7de [Michael Armbrust] cleanup
      11e9c72 [Michael Armbrust] better coverage in versions suite
      7e8f010 [Michael Armbrust] better error messages and jar handling
      e7b3941 [Michael Armbrust] more permisive checking for function registration
      da91ba7 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
      5fe5894 [Michael Armbrust] fix serialization suite
      81711c4 [Michael Armbrust] Initial support for running without maven
      1d8ae44 [Michael Armbrust] fix final tests?
      1c50813 [Michael Armbrust] more comments
      a3bee70 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
      a6f5df1 [Michael Armbrust] style
      ab07f7e [Michael Armbrust] WIP
      4d8bf02 [Michael Armbrust] Remove hive 12 compilation
      8843a25 [Michael Armbrust] [SPARK-6908] [SQL] Use isolated Hive client
  10. May 04, 2015
    • [MINOR] Fix python test typo? · 5a1a1075
      Andrew Or authored
      I suspect we haven't been using Anaconda in tests in a while. I wonder if this change actually does anything, but this line as it stands looks strictly less correct.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #5883 from andrewor14/fix-run-tests-typo and squashes the following commits:
      
      a3ad720 [Andrew Or] Fix typo?
  11. Apr 16, 2015
    • [SPARK-4897] [PySpark] Python 3 support · 04e44b37
      Davies Liu authored
      This PR updates PySpark to support Python 3 (tested with 3.4).
      
      Known issue: unpickling arrays from Pyrolite is broken in Python 3, so those tests are skipped.
      
      TODO: ec2/spark-ec2.py is not fully tested with python3.
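
      Much of a port like this consists of small compatibility shims (an illustrative sketch, not code from the PR), e.g. for the removed `xrange` and `itertools.imap`:

```python
import sys

if sys.version_info[0] >= 3:
    xrange = range  # Python 3 dropped xrange; range is already lazy
    imap = map      # itertools.imap is gone; builtin map is lazy in Python 3
else:
    from itertools import imap  # noqa: F401

# Shared code can now use xrange/imap on both major versions.
squares = list(imap(lambda x: x * x, xrange(4)))
```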
      
      Author: Davies Liu <davies@databricks.com>
      Author: twneale <twneale@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5173 from davies/python3 and squashes the following commits:
      
      d7d6323 [Davies Liu] fix tests
      6c52a98 [Davies Liu] fix mllib test
      99e334f [Davies Liu] update timeout
      b716610 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      cafd5ec [Davies Liu] adddress comments from @mengxr
      bf225d7 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      179fc8d [Davies Liu] tuning flaky tests
      8c8b957 [Davies Liu] fix ResourceWarning in Python 3
      5c57c95 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      4006829 [Davies Liu] fix test
      2fc0066 [Davies Liu] add python3 path
      71535e9 [Davies Liu] fix xrange and divide
      5a55ab4 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      125f12c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ed498c8 [Davies Liu] fix compatibility with python 3
      820e649 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      e8ce8c9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ad7c374 [Davies Liu] fix mllib test and warning
      ef1fc2f [Davies Liu] fix tests
      4eee14a [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      20112ff [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      59bb492 [Davies Liu] fix tests
      1da268c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ca0fdd3 [Davies Liu] fix code style
      9563a15 [Davies Liu] add imap back for python 2
      0b1ec04 [Davies Liu] make python examples work with Python 3
      d2fd566 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      a716d34 [Davies Liu] test with python 3.4
      f1700e8 [Davies Liu] fix test in python3
      671b1db [Davies Liu] fix test in python3
      692ff47 [Davies Liu] fix flaky test
      7b9699f [Davies Liu] invalidate import cache for Python 3.3+
      9c58497 [Davies Liu] fix kill worker
      309bfbf [Davies Liu] keep compatibility
      5707476 [Davies Liu] cleanup, fix hash of string in 3.3+
      8662d5b [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      f53e1f0 [Davies Liu] fix tests
      70b6b73 [Davies Liu] compile ec2/spark_ec2.py in python 3
      a39167e [Davies Liu] support customize class in __main__
      814c77b [Davies Liu] run unittests with python 3
      7f4476e [Davies Liu] mllib tests passed
      d737924 [Davies Liu] pass ml tests
      375ea17 [Davies Liu] SQL tests pass
      6cc42a9 [Davies Liu] rename
      431a8de [Davies Liu] streaming tests pass
      78901a7 [Davies Liu] fix hash of serializer in Python 3
      24b2f2e [Davies Liu] pass all RDD tests
      35f48fe [Davies Liu] run future again
      1eebac2 [Davies Liu] fix conflict in ec2/spark_ec2.py
      6e3c21d [Davies Liu] make cloudpickle work with Python3
      2fb2db3 [Josh Rosen] Guard more changes behind sys.version; still doesn't run
      1aa5e8f [twneale] Turned out `pickle.DictionaryType is dict` == True, so swapped it out
      7354371 [twneale] buffer --> memoryview  I'm not super sure if this a valid change, but the 2.7 docs recommend using memoryview over buffer where possible, so hoping it'll work.
      b69ccdf [twneale] Uses the pure python pickle._Pickler instead of c-extension _pickle.Pickler. It appears pyspark 2.7 uses the pure python pickler as well, so this shouldn't degrade pickling performance (?).
      f40d925 [twneale] xrange --> range
      e104215 [twneale] Replaces 2.7 types.InstsanceType with 3.4 `object`....could be horribly wrong depending on how types.InstanceType is used elsewhere in the package--see http://bugs.python.org/issue8206
      79de9d0 [twneale] Replaces python2.7 `file` with 3.4 _io.TextIOWrapper
      2adb42d [Josh Rosen] Fix up some import differences between Python 2 and 3
      854be27 [Josh Rosen] Run `futurize` on Python code:
      7c5b4ce [Josh Rosen] Remove Python 3 check in shell.py.
  12. Apr 10, 2015
    • [SPARK-6211][Streaming] Add Python Kafka API unit test · 3290d2d1
      jerryshao authored
      Refactor the Kafka unit test and add Python API support. CC tdas davies please help to review, thanks a lot.
      
      Author: jerryshao <saisai.shao@intel.com>
      Author: Saisai Shao <saisai.shao@intel.com>
      
      Closes #4961 from jerryshao/SPARK-6211 and squashes the following commits:
      
      ee4b919 [jerryshao] Fixed newly merged issue
      82c756e [jerryshao] Address the comments
      92912d1 [jerryshao] Address the commits
      0708bb1 [jerryshao] Fix rebase issue
      40b47a3 [Saisai Shao] Style fix
      f889657 [Saisai Shao] Update the code according
      8a2f3e2 [jerryshao] Address the issues
      0f1b7ce [jerryshao] Still fix the bug
      61a04f0 [jerryshao] Fix bugs and address the issues
      64d9877 [jerryshao] Fix rebase bugs
      8ad442f [jerryshao] Add kafka-assembly in run-tests
      6020b00 [jerryshao] Add more debug info in Shell
      8102d6e [jerryshao] Fix bug in Jenkins test
      fde1213 [jerryshao] Code style changes
      5536f95 [jerryshao] Refactor the Kafka unit test and add Python Kafka unittest support
  13. Apr 09, 2015
    • [SPARK-5654] Integrate SparkR · 2fe0a1aa
      Shivaram Venkataraman authored
      This pull requests integrates SparkR, an R frontend for Spark. The SparkR package contains both RDD and DataFrame APIs in R and is integrated with Spark's submission scripts to work on different cluster managers.
      
      Some integration points that would be great to get feedback on:
      
      1. Build procedure: building SparkR requires R to be installed on the build machine. Right now we have a new Maven profile `-PsparkR` that can be used to enable SparkR builds
      
      2. YARN cluster mode: The R package that is built needs to be present on the driver and all the worker nodes during execution. The R package location is currently set using SPARK_HOME, but this might not work on YARN cluster mode.
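
      Assuming the profile name given above, enabling SparkR in a build would look something like this (illustrative invocation, not taken from the PR):

```shell
# Requires R on the build machine; -PsparkR enables the SparkR build.
build/mvn -PsparkR -DskipTests clean package
```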
      
      The SparkR package represents the work of many contributors and attached below is a list of people along with areas they worked on
      
      edwardt (edwart) - Documentation improvements
      Felix Cheung (felixcheung) - Documentation improvements
      Hossein Falaki (falaki)  - Documentation improvements
      Chris Freeman (cafreeman) - DataFrame API, Programming Guide
      Todd Gao (7c00) - R worker Internals
      Ryan Hafen (hafen) - SparkR Internals
      Qian Huang (hqzizania) - RDD API
      Hao Lin (hlin09) - RDD API, Closure cleaner
      Evert Lammerts (evertlammerts) - DataFrame API
      Davies Liu (davies) - DataFrame API, R worker internals, Merging with Spark
      Yi Lu (lythesia) - RDD API, Worker internals
      Matt Massie (massie) - Jenkins build
      Harihar Nahak (hnahak87) - SparkR examples
      Oscar Olmedo (oscaroboto) - Spark configuration
      Antonio Piccolboni (piccolbo) - SparkR examples, Namespace bug fixes
      Dan Putler (dputler) - Dataframe API, SparkR Install Guide
      Ashutosh Raina (ashutoshraina) - Build improvements
      Josh Rosen (joshrosen) - Travis CI build
      Sun Rui (sun-rui)- RDD API, JVM Backend, Shuffle improvements
      Shivaram Venkataraman (shivaram) - RDD API, JVM Backend, Worker Internals
      Zongheng Yang (concretevitamin) - RDD API, Pipelined RDDs, Examples and EC2 guide
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
      Author: Zongheng Yang <zongheng.y@gmail.com>
      Author: cafreeman <cfreeman@alteryx.com>
      Author: Shivaram Venkataraman <shivaram@eecs.berkeley.edu>
      Author: Davies Liu <davies@databricks.com>
      Author: Davies Liu <davies.liu@gmail.com>
      Author: hlin09 <hlin09pu@gmail.com>
      Author: Sun Rui <rui.sun@intel.com>
      Author: lythesia <iranaikimi@gmail.com>
      Author: oscaroboto <oscarjr@gmail.com>
      Author: Antonio Piccolboni <antonio@piccolboni.info>
      Author: root <edward>
      Author: edwardt <edwardt.tril@gmail.com>
      Author: hqzizania <qian.huang@intel.com>
      Author: dputler <dan.putler@gmail.com>
      Author: Todd Gao <todd.gao.2013@gmail.com>
      Author: Chris Freeman <cfreeman@alteryx.com>
      Author: Felix Cheung <fcheung@AVVOMAC-119.local>
      Author: Hossein <hossein@databricks.com>
      Author: Evert Lammerts <evert@apache.org>
      Author: Felix Cheung <fcheung@avvomac-119.t-mobile.com>
      Author: felixcheung <felixcheung_m@hotmail.com>
      Author: Ryan Hafen <rhafen@gmail.com>
      Author: Ashutosh Raina <ashutoshraina@users.noreply.github.com>
      Author: Oscar Olmedo <oscarjr@gmail.com>
      Author: Josh Rosen <rosenville@gmail.com>
      Author: Yi Lu <iranaikimi@gmail.com>
      Author: Harihar Nahak <hnahak87@users.noreply.github.com>
      
      Closes #5096 from shivaram/R and squashes the following commits:
      
      da64742 [Davies Liu] fix Date serialization
      59266d1 [Davies Liu] check exclusive of primary-py-file and primary-r-file
      55808e4 [Davies Liu] fix tests
      5581c75 [Davies Liu] update author of SparkR
      f731b48 [Shivaram Venkataraman] Only run SparkR tests if R is installed
      64eda24 [Shivaram Venkataraman] Merge branch 'R' of https://github.com/amplab-extras/spark into R
      d7c3f22 [Shivaram Venkataraman] Address code review comments Changes include 1. Adding SparkR docs to API docs generated 2. Style fixes in SparkR scala files 3. Clean up of shell scripts and explanation of install-dev.sh
      377151f [Shivaram Venkataraman] Merge remote-tracking branch 'apache/master' into R
      eb5da53 [Shivaram Venkataraman] Merge pull request #3 from davies/R2
      a18ff5c [Davies Liu] Update sparkR.R
      5133f3a [Shivaram Venkataraman] Merge pull request #7 from hqzizania/R3
      940b631 [hqzizania] [SPARKR-92] Phase 2: implement sum(rdd)
      0e788c0 [Shivaram Venkataraman] Merge pull request #5 from hlin09/doc-fix
      3487461 [hlin09] Add tests log in .gitignore.
      1d1802e [Shivaram Venkataraman] Merge pull request #4 from felixcheung/r-require
      11981b7 [felixcheung] Update R to fail early if SparkR package is missing
      c300e08 [Davies Liu] remove duplicated file
      b045701 [Davies Liu] Merge branch 'remote_r' into R
      19c9368 [Davies Liu] Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into remote_r
      f8fa8af [Davies Liu] mute logging when start/stop context
      e7104b6 [Davies Liu] remove ::: in SparkR
      a1777eb [Davies Liu] move rules into R/.gitignore
      e88b649 [Davies Liu] Merge branch 'R' of github.com:amplab-extras/spark into R
      6e20e71 [Davies Liu] address comments
      b433817 [Davies Liu] Merge branch 'master' of github.com:apache/spark into R
      a1cedad [Shivaram Venkataraman] Merge pull request #228 from felixcheung/doc
      e089151 [Davies Liu] Merge pull request #225 from sun-rui/SPARKR-154_2
      463e28c [Davies Liu] Merge pull request #2 from shivaram/doc-fixes
      bc2d6d8 [Shivaram Venkataraman] Remove arg from sparkR.stop and update docs
      d425363 [Shivaram Venkataraman] Some doc fixes for column, generics, group
      1f1a7e0 [Shivaram Venkataraman] Some fixes to DataFrame, RDD, SQLContext docs
      104ad4e [Shivaram Venkataraman] Check the right env in exists
      cf5cd99 [Shivaram Venkataraman] Remove unused numCols argument
      85a50ec [Shivaram Venkataraman] Merge pull request #226 from RevolutionAnalytics/master
      3eacfc0 [Davies Liu] fix flaky test
      733380d [Davies Liu] update R examples (remove master from args)
      b21a0da [Davies Liu] Merge pull request #1 from shivaram/log4j-tests
      a1493d7 [Shivaram Venkataraman] Address comments
      e1f83ab [Shivaram Venkataraman] Send Spark INFO logs to a file in SparkR tests
      58276f5 [Shivaram Venkataraman] Merge branch 'R' of https://github.com/amplab-extras/spark into R
      52cc92d [Shivaram Venkataraman] Add license to create-docs.sh
      6ff5ea2 [Shivaram Venkataraman] Add instructions to generate docs
      1f478c5 [Shivaram Venkataraman] Merge branch 'R' of https://github.com/amplab-extras/spark into R
      02b4833 [Shivaram Venkataraman] Add a script to generate R docs (Rd, html) Also fix some issues with our documentation
      d6d3729 [Davies Liu] enable spark and pyspark tests
      0e5a83f [Davies Liu] fix code style
      afd8a77 [Davies Liu] Merge branch 'R' of github.com:amplab-extras/spark into R
      d87a181 [Davies Liu] fix flaky tests
      7100fb9 [Shivaram Venkataraman] Fix libPaths in README
      bdf3a14 [Davies Liu] Merge branch 'R' of github.com:amplab-extras/spark into R
      05e7375 [Davies Liu] sort generics
      b44e371 [Shivaram Venkataraman] Include RStudio instructions in README
      855537f [Davies Liu] Merge branch 'R' of github.com:amplab-extras/spark into R
      9fb6af3 [Davies Liu] mark R classes/objects are private
      423ea3c [Shivaram Venkataraman] Ignore unknown jobj in cleanup
      974e4ea [Davies Liu] fix flaky test
      410ec18 [Davies Liu] fix zipRDD() tests
      d8b24fc [Davies Liu] disable spark and python tests temporarily
      ce3ca62 [Davies Liu] fix license check
      7da0049 [Davies Liu] fix build
      2892e29 [Davies Liu] support R in YARN cluster
      ebd4d07 [Davies Liu] Merge branch 'R' of github.com:amplab-extras/spark into R
      38cbf59 [Davies Liu] fix test of zipRDD()
      756ece0 [Shivaram Venkataraman] Update README remove outdated TODO
      d436f26 [Davies Liu] add missing files
      40d193a [Shivaram Venkataraman] Merge pull request #224 from sun-rui/SPARKR-224-new
      1a16cd6 [Davies Liu] rm PROJECT_HOME
      56670ef [Davies Liu] rm man page
      ba4b80b [Davies Liu] Merge branch 'remote_r' into R
      f04080c [Davies Liu] Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into remote_r
      028cbfb [Davies Liu] fix exit code of sparkr unit test
      42d8b4c [Davies Liu] Merge branch 'R' of github.com:amplab-extras/spark into R
      ef26015 [Davies Liu] Merge branch 'R' of github.com:amplab-extras/spark into R
      a1870e8 [Shivaram Venkataraman] Merge pull request #214 from sun-rui/SPARKR-156_3
      cb6e5e3 [Shivaram Venkataraman] Add scripts to start SparkR on windows
      8030847 [Shivaram Venkataraman] Set windows file separators, install dirs
      05afef0 [Shivaram Venkataraman] Only stop backend JVM if R launched it
      95d2de3 [Davies Liu] fix spark-submit with R script
      baefd9e [Shivaram Venkataraman] Make bin/sparkR use spark-submit As a part of this move the R initialization functions into first.R and first-submit.R
      d6f2bdd [Shivaram Venkataraman] Fix run-tests path
      ea90fab [Davies Liu] fix spark-submit with R path and sparkR -h
      0e2412c [Davies Liu] fix bin/sparkR
      9f6aa1f [Davies Liu] Merge branch 'R' of github.com:amplab-extras/spark into R
      479e3fe [Davies Liu] change println() to logging
      52ca6e5 [Shivaram Venkataraman] Add missing comma
      716b16f [Shivaram Venkataraman] Merge branch 'R' of https://github.com/amplab-extras/spark into R
      2d235d4 [Shivaram Venkataraman] Build SparkR with Maven profile
      aae881b [Davies Liu] fix rat
      ff776aa [Shivaram Venkataraman] Fix style
      e4f1937 [Shivaram Venkataraman] Remove DFC example
      f7b6936 [Davies Liu] remove Spark prefix for class
      043959e [Davies Liu] cleanup
      ba53b09 [Davies Liu] support R in spark-submit
      f403b4a [Davies Liu] rm .travis.yml
      c4a5bdf [Davies Liu] run sparkr tests in Spark
      e8fc7ca [Davies Liu] fix .gitignore
      35e5755 [Davies Liu] reduce size of example data
      50bff63 [Davies Liu] add LICENSE header for R sources
      facb6e0 [Davies Liu] add .gitignore for .o, .so, .Rd
      18e5eed [Davies Liu] update docs
      0a0e632 [Davies Liu] move sparkR into bin/
      a76472f [Davies Liu] fix path of assembly jar
      df3eeea [Davies Liu] move R/examples into examples/src/main/r
      3415cc7 [Davies Liu] move Scala source into core/ and sql/
      180fc9c [Davies Liu] move scala
      014d253 [Davies Liu] delete man pages
      49a8133 [Davies Liu] Merge branch 'remote_r' into R
      44994c2 [Davies Liu] Moved files to R/
      2fc553f [Shivaram Venkataraman] Merge pull request #222 from davies/column2
      b043876 [Davies Liu] fix test
      5e610cb [Davies Liu] add more API for Column
      6f95d49 [Shivaram Venkataraman] Merge pull request #221 from shivaram/sparkr-stop-start
      3214c6d [Shivaram Venkataraman] Merge pull request #217 from hlin09/cleanClosureFix
      f5d3355 [Shivaram Venkataraman] Merge pull request #218 from davies/merge
      70f620c [Davies Liu] address comments
      4b1628d [Davies Liu] Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into merge
      3139325 [Shivaram Venkataraman] Merge pull request #212 from davies/toDF
      6122e0e [Davies Liu] handle NULL
      bc2ff38 [Davies Liu] handle NULL
      7f5e70c [Davies Liu] Update SerDe.scala
      46454e4 [Davies Liu] address comments
      dd52cbc [Shivaram Venkataraman] Merge pull request #220 from shivaram/sparkr-utils-include
      662938a [Shivaram Venkataraman] Include utils before SparkR for `head` to work Before this change calling `head` on a DataFrame would not work from the sparkR script as utils would be loaded after SparkR and placed ahead in the search list. This change requires utils to be loaded before SparkR
      1bc2998 [Shivaram Venkataraman] Merge pull request #179 from evertlammerts/sparkr-sql
      7695d36 [Evert Lammerts] added tests
      8190127 [Evert Lammerts] fixed parquetFile signature
      d8c8fcc [Shivaram Venkataraman] Merge pull request #219 from shivaram/sparkr-build-final
      963c7ee [Davies Liu] Merge branch 'master' into merge
      8bff523 [Shivaram Venkataraman] Remove staging repo now that 1.3 is released
      e52258f [Davies Liu] Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into toDF
      05b9126 [Shivaram Venkataraman] Merge pull request #215 from davies/agg
      8e1497d [Davies Liu] Update DataFrame.R
      72adb14 [Davies Liu] Update SQLContext.R
      66cc92a [Davies Liu] address comments
      55c38bc [Shivaram Venkataraman] Merge pull request #216 from davies/select2
      3e0555d [Shivaram Venkataraman] Merge pull request #193 from davies/daemon
      0467474 [Davies Liu] add more selectors for DataFrame
      9a6be74 [Davies Liu] include grouping columns in agg()
      e87bb98 [Davies Liu] improve comment and logging
      a6dc435 [Davies Liu] remove dependency of jsonlite
      26a3621 [Davies Liu] support data.frame and Date/Time
      4e4908a [Davies Liu] createDataFrame from rdd
      5757b95 [Shivaram Venkataraman] Merge pull request #196 from davies/die
      90f2692 [Shivaram Venkataraman] Merge pull request #211 from hlin09/generics
      8583968 [Davies Liu] readFully()
      46cea3d [Davies Liu] retry
      01aa5ee [Davies Liu] add config for using daemon, refactor
      ff948db [hlin09] Remove missingOrInteger.
      ecdfda1 [hlin09] Remove duplication.
      411b751 [Davies Liu] make RStudio happy
      8f8813f [Davies Liu] switch back to use parallel
      6bccbbf [hlin09] Move roxygen doc back to implementation.
      ffd6e8e [Shivaram Venkataraman] Merge pull request #210 from hlin09/hlin09
      471c794 [hlin09] Move getJRDD and broadcast's value to 00-generic.R.
      89b886d [hlin09] Move setGeneric() to 00-generics.R.
      97dde1a [hlin09] Add a test for access operators.
      09ff163 [Shivaram Venkataraman] Merge pull request #204 from cafreeman/sparkr-sql
      15a713f [cafreeman] Fix example for `dropTempTable`
      dc1291b [hlin09] Add checks for namespace access operators in cleanClosure.
      b4c0b2e [Davies Liu] use fork package
      3db5649 [cafreeman] Merge branch 'sparkr-sql' of https://github.com/amplab-extras/SparkR-pkg into sparkr-sql
      789be97 [Shivaram Venkataraman] Merge pull request #207 from shivaram/err-remove
      e60578a [cafreeman] update tests to guarantee row order
      5eec6fc [Shivaram Venkataraman] Merge pull request #206 from sun-rui/SPARKR-156_2
      3f7aed6 [Sun Rui] Fix minor typos in the function description.
      a8cebf0 [Shivaram Venkataraman] Remove print statement in SparkRBackendHandler This print statement is noisy for SQL methods which have multiple APIs (like loadDF). We already have a better error message when no valid methods are found
      5e3a576 [Sun Rui] Fix indentation.
      f3d99a6 [Sun Rui] [SPARKR-156] phase 2: implement zipWithIndex() of the RDD class.
      a582810 [cafreeman] Merge branch 'dfMethods' into sparkr-sql
      7a5d6fd [cafreeman] `withColumn` and `withColumnRenamed`
      c5fa3b9 [cafreeman] New `select` method
      bcb0bf5 [Shivaram Venkataraman] Merge pull request #180 from davies/group
      9dd6a5a [Davies Liu] Update SparkRBackendHandler.scala
      e6fb8d8 [Davies Liu] improve logging
      428a99a [Davies Liu] remove test, catch exception
      fef99de [cafreeman] `intersect`, `subtract`, `unionAll`
      befbd32 [cafreeman] `insertInto`
      9d01bcd [cafreeman] `dropTempTable`
      d8c1c09 [Davies Liu] add test to start and stop context multiple times
      18c6004 [Shivaram Venkataraman] Merge pull request #201 from sun-rui/SPARKR-156_1
      dfb399a [Davies Liu] address comments
      f06ccec [Sun Rui] Use mapply() instead of for statement.
      3c7674f [Davies Liu] Merge branch 'die' of github.com:davies/SparkR-pkg into die
      ac8a852 [Davies Liu] close monitor connection in sparkR.stop()
      4d0fb56 [Shivaram Venkataraman] Merge pull request #203 from shivaram/sparkr-hive-fix
      62b0760 [Shivaram Venkataraman] Fix test hive context package name
      47a613f [Shivaram Venkataraman] Fix HiveContext package name
      fb3b139 [Davies Liu] fix tests
      d0d4626 [Shivaram Venkataraman] Merge pull request #199 from davies/load
      8b7fb67 [Davies Liu] fix HiveContext
      bb46832 [Davies Liu] Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into load
      e9e2a03 [Davies Liu] Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into group
      b875b4f [Davies Liu] fix style
      de2abfa [Shivaram Venkataraman] Merge pull request #202 from cafreeman/sparkr-sql
      3675fcf [cafreeman] Update `explain` and fix doc for `toJSON`
      5fd9575 [Davies Liu] Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into load
      6fac596 [Davies Liu] support Column expression in agg()
      f10a24e [Davies Liu] address comments
      ff8b005 [cafreeman] `saveAsParquetFile`
      a5c2887 [cafreeman] fix test
      3fab0f8 [cafreeman] `showDF`
      779c102 [cafreeman] `isLocal`
      68b11cf [cafreeman] `toJSON`
      0ac4abc [cafreeman] `explain`
      20242c4 [cafreeman] clean up docs
      6a1fe64 [Shivaram Venkataraman] Merge pull request #198 from cafreeman/sparkr-sql
      198c130 [Shivaram Venkataraman] Merge pull request #200 from shivaram/sparkr-sql-build
      870acd4 [Shivaram Venkataraman] Use rc2 explicitly
      8b9a963 [cafreeman] Merge branch 'sparkr-sql' of https://github.com/amplab-extras/SparkR-pkg into sparkr-sql
      bc90115 [cafreeman] Fixed docs
      3865f39 [Sun Rui] [SPARKR-156] phase 1: implement zipWithUniqueId() of the RDD class.
      a37fd80 [Davies Liu] Update sparkR.R
      d18f9d3 [Shivaram Venkataraman] Remove SparkR snapshot build We now have 1.3.0 RC2 on Apache Staging
      8de958d [Davies Liu] Update SparkRBackend.scala
      4e0becc [Shivaram Venkataraman] Merge pull request #194 from davies/api
      197a79b [Davies Liu] add HiveContext (commented)
      32aa01d [Shivaram Venkataraman] Merge pull request #191 from felixcheung/doc
      5073e07 [Davies Liu] Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into load
      7918634 [cafreeman] Fix test
      acea146 [cafreeman] remove extra line
      74269f3 [cafreeman] Merge branch 'dfMethods' into sparkr-sql
      cd7ac8a [Shivaram Venkataraman] Merge pull request #197 from cafreeman/sparkr-sql
      494a4dd [cafreeman] update export
      e14c328 [cafreeman] `selectExpr`
      32b37d1 [cafreeman] Fixed indent in `join` test.
      2e7b190 [Felix Cheung] small update on yarn deploy mode.
      8ff29d6 [Davies Liu] fix tests
      12a6db2 [Davies Liu] Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into api
      294ca4a [cafreeman] `join`, `sort`, and `filter`
      4fa6343 [cafreeman] Refactor `join` generic for use with `DataFrame`
      3f22c8d [Shivaram Venkataraman] Merge pull request #195 from cafreeman/sparkr-sql
      2b6f980 [Davies Liu] shutdown the JVM after R process die
      e8639c3 [cafreeman] New 1.3 repo and updates to `column.R`
      ed9a89f [Davies Liu] address comments
      03bcf20 [Davies Liu] Merge branch 'group' of github.com:davies/SparkR-pkg into group
      39c253d [Davies Liu] Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into group
      98cc97a [Davies Liu] fix test and docs
      e2d144a [Felix Cheung] Fixed small typos
      3beadcf [Davies Liu] Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into api
      06cbc2d [Davies Liu] launch R worker by a daemon
      8a676b1 [Shivaram Venkataraman] Merge pull request #188 from davies/column
      524c122 [Davies Liu] Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into column
      f798402 [Davies Liu] Update column.R
      1d0f2ae [Davies Liu] Update DataFrame.R
      03402eb [Felix Cheung] Updates as per feedback on sparkR-submit
      76cf2e0 [Shivaram Venkataraman] Merge pull request #192 from cafreeman/sparkr-sql
      1955a09 [cafreeman] return object instead of a list of one object
      f585929 [cafreeman] Fix brackets
      e998356 [cafreeman] define generic for 'first' in RDD API
      71d66a1 [Davies Liu] fix first()
      8ec21af [Davies Liu] fix signature
      acae527 [Davies Liu] refactor
      d7b17a4 [Davies Liu] fix approxCountDistinct
      7dfe27d [Davies Liu] fix cyclic namespace dependency
      8caf5bb [Davies Liu] use S4 methods
      5c0bb24 [Felix Cheung] Doc updates: build and running on YARN
      773baf0 [Zongheng Yang] Merge pull request #178 from davies/random
      862f07c [Shivaram Venkataraman] Merge pull request #190 from shivaram/SPARKR-79
      b457833 [Shivaram Venkataraman] Merge pull request #189 from shivaram/stdErrFix
      f7caeb8 [Davies Liu] Update SparkRBackend.scala
      8c4deae [Shivaram Venkataraman] Remove unused function
      6e51c7f [Shivaram Venkataraman] Fix stderr redirection on executors
      7afa4c9 [Shivaram Venkataraman] Merge pull request #186 from hlin09/funcDep3
      4d36ab1 [hlin09] Add tests for broadcast variables.
      3f57e56 [hlin09] Fix comments.
      7b72487 [hlin09] Fix comments.
      ae05bf1 [Davies Liu] Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into column
      abb4bb9 [Davies Liu] add Column and expression
      eb8ac11 [Shivaram Venkataraman] Set Spark version 1.3.0 in Windows build
      5c72e73 [Davies Liu] wait at most 100 seconds
      e425437 [Shivaram Venkataraman] Merge pull request #177 from lythesia/master
      a00f502 [lythesia] fix indents
      0346e5f [Davies Liu] address comment
      6134649 [Shivaram Venkataraman] Merge pull request #187 from cafreeman/sparkr-sql
      ad0935e [lythesia] minor fixes
      b0e7f73 [cafreeman] Update `sampleDF` test
      7b0d070 [lythesia] keep partitions check
      889c265 [cafreeman] numToInt utility function
      27dd3a0 [lythesia] modify tests for repartition
      cad0f0c [cafreeman] Fix docs and indents
      2808dcf [cafreeman] Three more DataFrame methods
      5ef66fb [Davies Liu] send back the port via temporary file
      3b46429 [Davies Liu] Merge branch 'master' of github.com:amplab-extras/SparkR-pkg into random
      798f453 [cafreeman] Merge branch 'sparkr-sql' into dev
      9aa4acf [Shivaram Venkataraman] Merge pull request #184 from davies/socket
      020bce8 [Shivaram Venkataraman] Merge pull request #183 from cafreeman/sparkr-sql
      222e06b [cafreeman] Lazy evaluation and formatting changes
      e776324 [Davies Liu] fix import
      211cc15 [cafreeman] Merge branch 'sparkr-sql' into dev
      3351afd [hlin09] Replaces getDependencies with cleanClosure, to serialize UDFs to workers.
      e7c56d6 [lythesia] fix random partition key
      50c74b1 [Davies Liu] address comments
      083c89f [cafreeman] Remove commented lines and unused import
      dfa119b [hlin09] Improve the coverage of processClosure.
      a41c9b9 [cafreeman] Merge branch 'wrapper' into sparkr-sql
      1cd714f [cafreeman] Wrapper function docs.
      db0cd9e [cafreeman] Clean up for wrapper functions
      818c19f [cafreeman] Update schema-related functions
      a57884e [cafreeman] Remove unused import
      d72e830 [cafreeman] Add wrapper for `StructField` and `StructType`
      2ea2ecf [lythesia] use generic arg
      09b9512 [hlin09] add docs
      f4f077c [hlin09] Add recursive cleanClosure for function access.
      f84ad27 [hlin09] Merge remote-tracking branch 'upstream/master' into funcDep2
      5300766 [Shivaram Venkataraman] Merge pull request #185 from hlin09/hlin09
      07aa7c0 [hlin09] Unifies the implementation of lapply with lapplyPartitionsWithIndex.
      f4dbb0b [Davies Liu] use socket in worker
      8282c59 [Davies Liu] Update DataFrame.R
      ba495a8 [Davies Liu] Update NAMESPACE
      36dffb3 [cafreeman] Add `head` and `first`
      534a95f [cafreeman] Schema-related methods
      64f488d [cafreeman] Cache and Persist Methods
      30d71fd [cafreeman] Standardize method arguments for DataFrame methods
      785898b [Shivaram Venkataraman] Merge pull request #182 from cafreeman/sparkr-sql
      2619003 [Shivaram Venkataraman] Merge pull request #181 from cafreeman/master
      a9bbe0b [cafreeman] Update existing SparkSQL functions
      8c241a3 [cafreeman] Merge with master, include changes to method args
      68d6de4 [cafreeman] Fix typos
      8d2ec6e [Davies Liu] add sum/max/min/avg/mean
      774e687 [Davies Liu] add missing API in SQLContext
      1e72b4b [Davies Liu] missing API in SQLContext
      3294949 [Chris Freeman] Restore `rdd` argument to `getJRDD`
      3a58ebc [Davies Liu] rm unrelated file
      8bd93b5 [Davies Liu] fix signature
      c652b4c [cafreeman] Update method signatures to use generic arg
      48c8827 [Davies Liu] update NAMESPACE
      84e2d8c [Davies Liu] groupBy and agg()
      7c3ddbd [Davies Liu] create jmode in JVM
      9465426 [Davies Liu] load and save
      982f342 [lythesia] fix numeric issue
      7651d84 [lythesia] fix coalesce
      4e712e1 [Davies Liu] use random port in backend
      041d22b [Shivaram Venkataraman] Merge pull request #172 from cafreeman/sparkr-sql
      0d07770 [cafreeman] Added `limit` and updated `take`
      301d8e5 [cafreeman] Remove extraneous map functions
      0387db2 [cafreeman] Remove colNames
      04c4b65 [lythesia] add repartition/coalesce
      231deab [cafreeman] Change reserialize to serializeToBytes
      acf7e1a [cafreeman] Rework the Scala to R DataFrame Conversion
      481ae37 [cafreeman] Updated stale comments and standardized arg names
      21d4a97 [hlin09] Adds cleanClosure to capture the function closures.
      d24ffb4 [hlin09] Merge remote-tracking branch 'upstream/master' into funcDep2
      8be02de [hlin09] Revert "loop 1-12 test pass."
      fddb9cc [hlin09] Revert "add docs"
      f8ef0ab [hlin09] Revert "More docs"
      8e4b3da [hlin09] Revert "More docs"
      57e005b [hlin09] Revert "fix tests."
      c10148e [Shivaram Venkataraman] Merge pull request #174 from shivaram/sparkr-runner
      910e3be [Shivaram Venkataraman] Add a timeout for initialization Also move sparkRBackend.stop into a finally block
      bf52b17 [Shivaram Venkataraman] Merge remote-tracking branch 'amplab-sparkr/master' into sparkr-runner
      08102b0 [Shivaram Venkataraman] Merge pull request #176 from lythesia/master
      9c77b20 [Chris Freeman] Merge pull request #2 from shivaram/sparkr-sql
      179ab38 [lythesia] add try counts and increase time interval
      71a73b2 [Shivaram Venkataraman] Use a getter for serialization mode This change encapsulates the semantics of serialization mode for RDDs inside a getter function. For PipelinedRDDs if a backing JavaRDD is available we use that else we fall back to a default serialization mode
      06bf250 [Shivaram Venkataraman] Merge pull request #173 from shivaram/windows-space-fix
      88bf97f [Shivaram Venkataraman] Create SparkContext for R shell launch
      f9268d9 [Shivaram Venkataraman] Fix code review comments
      e6ad12d [Shivaram Venkataraman] Update comment describing sparkR-submit
      17eda4c [Shivaram Venkataraman] Merge pull request #175 from falaki/docfix
      ba2b72b [Hossein] Spark 1.1.0 is default
      4cd7d3f [lythesia] retry backend connection
      749e2d0 [Hossein] Updated README
      bc04cf4 [Shivaram Venkataraman] Use SPARKR_BACKEND_PORT in sparkR.R as default Change SparkRRunner to use EXISTING_SPARKR_BACKEND_PORT to differentiate between the two
      22a19ac [Shivaram Venkataraman] Use a semaphore to wait for backend to initialize Also pick a random port to avoid collisions
      7f1f0f8 [cafreeman] Move comments to fit 100 char line length
      8b84e4e [cafreeman] Make if statements more explicit
      ce5d5ab [cafreeman] New tests for Union and Object File
      b063320 [cafreeman] Changed 'serialized' to 'serializedMode'
      0981dff [Zongheng Yang] Merge pull request #168 from sun-rui/SPARKR-153_2
      86fc639 [Shivaram Venkataraman] Move sparkR-submit into pkg/inst
      fd8f8a9 [Shivaram Venkataraman] Merge branch 'hqzizania-master'
      a33dbea [Shivaram Venkataraman] Merge branch 'master' of https://github.com/hqzizania/SparkR-pkg into hqzizania-master
      384e6e2 [Shivaram Venkataraman] Merge pull request #171 from hlin09/hlin09
      1f5a6ac [hlin09] fixed comments
      7f7596a [cafreeman] Additional handling for "row" serialization
      8c3b8c5 [cafreeman] Add test for UnionRDD on "row" serialization
      b1141f8 [cafreeman] Fixed formatting issues.
      5db30bf [cafreeman] Changed serialized from bool to string
      2f0c0b8 [cafreeman] Add check for serialized type
      d243dfb [cafreeman] Clean up code
      5ff63a2 [cafreeman] Change test from boolean to string
      77fec1a [cafreeman] Updated .Rd files
      9224989 [cafreeman] Various updates for DataFrame to RRDD
      26af62b [cafreeman] DataFrame to RRDD
      e004481 [cafreeman] Update UnionRDD test
      5292be7 [hlin09] Adds support of pipeRDD().
      e2a7560 [Shivaram Venkataraman] Merge pull request #170 from cafreeman/sparkr-sql
      5d537f4 [cafreeman] Add pairRDD to Description
      b6fa88e [cafreeman] Updating to current master
      0cda231 [Sun Rui] [SPARKR-153] phase 2: implement aggregateByKey() and foldByKey().
      95ee6b4 [Shivaram Venkataraman] Merge remote-tracking branch 'amplab-sparkr/master' into sparkr-runner
      67fbc60 [Shivaram Venkataraman] Add support for SparkR shell to use spark-submit This ensures that SparkConf options are read in both in batch and interactive modes
      2271030 [Shivaram Venkataraman] Merge pull request #167 from sun-rui/removePartionByInRDD
      7fcb46a [Sun Rui] Remove partitionBy() in RDD.
      52f94c4 [Shivaram Venkataraman] Merge pull request #160 from lythesia/master
      59e2d54 [lythesia] merge with upstream
      5836650 [Zongheng Yang] Merge pull request #163 from sun-rui/SPARKR-153_1
      141723e [Sun Rui] fix comments.
      f73a07e [Shivaram Venkataraman] Merge pull request #165 from shivaram/sparkr-sql-build
      10ffc6d [Shivaram Venkataraman] Set Spark version to 1.3 using staging dependency Also fix the maven build
      c91ede2 [Shivaram Venkataraman] Merge pull request #164 from hlin09/hlin09
      9d335a9 [hlin09] Makes git to ignore Eclipse meta files.
      94066bf [Sun Rui] [SPARKR-153] phase 1: implement fold() and aggregate().
      9c391c7 [hqzizania] Merge remote-tracking branch 'upstream/master'
      5f29551 [hqzizania] 	modified:   pkg/R/RDD.R 	modified:   pkg/R/context.R
      d968664 [lythesia] fix comment
      7972858 [Shivaram Venkataraman] Merge pull request #159 from sun-rui/SPARKR-150_2
      7690878 [lythesia] separate out pair RDD functions
      f4573c1 [Sun Rui] Use reduce() instead of sortBy().take() to get the ordered elements.
      63e62ed [Sun Rui] [SPARKR-150] phase 2: implement takeOrdered() and top().
      050390b [Shivaram Venkataraman] Fix bugs in inferring R file
      8398f2e [Shivaram Venkataraman] Add sparkR-submit helper script Also adjust R file path for YARN cluster mode
      bd6705b [Zongheng Yang] Merge pull request #154 from sun-rui/SPARKR-150
      c7964c9 [Sun Rui] Merge with upstream master.
      7feac38 [Sun Rui] Use default arguments for sortBy() and sortKeyBy().
      de2bfb3 [Sun Rui] Fix minor comments and add more test cases.
      0c6e071 [Zongheng Yang] Merge pull request #157 from lythesia/master
      f5038c0 [lythesia] pull out anonymous functions in groupByKey
      ba6f044 [lythesia] fixes for reduceByKeyLocally
      343b6ab [Oscar Olmedo] Export sparkR.stop Closes #156 from oscaroboto/master
      25639cf [Shivaram Venkataraman] Replace tabs with spaces
      bb25920 [Shivaram Venkataraman] Merge branch 'dputler-master'
      fd836db [hlin09] fix tests.
      24a7f13 [hlin09] More docs
      a465165 [hlin09] More docs
      6ad4fc3 [hlin09] add docs
      b082a35 [lythesia] add reduceByKeyLocally
      7ca6512 [Shivaram Venkataraman] First cut of SparkRRunner
      193f5fe [hlin09] loop 1-12 test pass.
      345f1b8 [dputler] [SPARKR-195] Implemented project style guidelines for if-else statements
      8043559 [Sun Rui] Add a TODO to use binary search in the range partitioner.
      91b2fd6 [Sun Rui] Add more test cases.
      e8ebbe4 [Shivaram Venkataraman] Merge pull request #152 from cafreeman/sparkr-sql
      0c53d6c [dputler] Data frames now coerced to lists, and messages issued for a data frame or matrix on how they are parallelized
      6d57ec0 [cafreeman] Remove json test file since we're using a temp
      ac1ef09 [cafreeman] Update registerTempTable test
      d9da451 [Sun Rui] [SPARKR-150] phase 1: implement sortBy() and sortByKey().
      08ff30b [Shivaram Venkataraman] Merge pull request #153 from hqzizania/master
      9767e8e [hqzizania] 	modified:   pkg/man/collect-methods.Rd
      5d69f0a [hqzizania] 	modified:   pkg/R/RDD.R
      4914091 [hqzizania] 	modified:   pkg/inst/tests/test_rdd.R
      742a68b [cafreeman] Update test_sparkRSQL.R
      a95823e [hqzizania] 	modified:   pkg/R/RDD.R
      2d04526 [cafreeman] Formatting
      fae9bdd [cafreeman] Renamed to SQLUtils.scala
      39888ea [Chris Freeman] Update test_sparkSQL.R
      fce2453 [cafreeman] Updated documentation for SQLContext
      13fbf12 [cafreeman] Regenerated .Rd files
      51ecf41 [cafreeman] Updated Scala object
      30d7337 [cafreeman] Added SparkSQL test
      74b3ed6 [cafreeman] Incorporate code feedback
      554bda0 [Zongheng Yang] Merge pull request #147 from shivaram/sparkr-ec2-fixes
      a5f4f8f [cafreeman] Squashed commit of the following:
      f34bb88 [Shivaram Venkataraman] Remove profiling information from this PR
      c662f29 [Zongheng Yang] Merge pull request #146 from shivaram/spark-1.2-build
      21e9b74 [Zongheng Yang] Merge pull request #145 from lythesia/master
      76f6b9e [Shivaram Venkataraman] Merge pull request #149 from hqzizania/master
      1c2dbec [lythesia] minor fix for refactoring join code
      5b380d3 [hqzizania] 	modified:   pkg/man/combineByKey.Rd 	modified:   pkg/man/groupByKey.Rd 	modified:   pkg/man/partitionBy.Rd 	modified:   pkg/man/reduceByKey.Rd
      98794fe [hqzizania] 	modified:   pkg/R/RDD.R
      b66534d [Zongheng Yang] Merge pull request #144 from shivaram/fix-rd-files
      60da1df [Shivaram Venkataraman] Initialize timing variables
      179aa75 [Shivaram Venkataraman] Bunch of fixes for longer running jobs 1. Increase the timeout for socket connection to wait for long jobs 2. Add some profiling information in worker.R 3. Put temp file writes before stdin writes in RRDD.scala
      06d99f0 [Shivaram Venkataraman] Fix URI to have right number of slashes
      add97f5 [Shivaram Venkataraman] Use URL encode to create valid URIs for jars
      4eec962 [lythesia] refactor join functions
      73430c6 [Shivaram Venkataraman] Make SparkR work on paths with spaces on Windows
      aaf8f47 [Shivaram Venkataraman] Exclude hadoop client from Spark dependency
      227ee42 [Zongheng Yang] Merge pull request #141 from shivaram/SPARKR-140
      ac5ceb1 [Shivaram Venkataraman] Fix code review comments
      32394de [Shivaram Venkataraman] Regenerate Rd files for SparkR This fixes a number of issues in SparkR man pages. The main changes are 1. Don't export or generate docs for PipelineRDD 2. Fix variable names for Filter, count to match base methods 3. Document missing arguments for sparkR.init, print.jobj etc.
      e157bf6 [Shivaram Venkataraman] Use prev_serialized to track if JRDD is serialized This changes introduces a new variable in PipelineRDD environment to track if the prev_jrdd is serialized or not.
      7428a7e [Zongheng Yang] Merge pull request #143 from shivaram/SPARKR-181
      7dd1797 [Shivaram Venkataraman] Address code review comments
      8f81c45 [Shivaram Venkataraman] Remove roxygen export for PipelinedRDD
      0cb90f1 [Zongheng Yang] Merge pull request #142 from shivaram/SPARKR-169
      d1c6e6c [Shivaram Venkataraman] Buffer stderr from R and return it on Exception This change buffers the last 100 lines from R process and passes these lines back to the driver if we have an exception. This will help users debug why their tasks failed on the cluster
      d6c1393 [Shivaram Venkataraman] Suppress warnings from normalizePath
      a382835 [Shivaram Venkataraman] Fix serialization tracking in pipelined RDDs When creating a pipeline RDD, we need to check if the JavaRDD belonging to the parent is serialized.
      da39529 [Zongheng Yang] Merge pull request #140 from sun-rui/SPARKR-183
      2814caa [Sun Rui] Merge with upstream master.
      cd2a5b3 [Sun Rui] Add reference to Nagle's algorithm and clean code.
      52356b6 [Shivaram Venkataraman] Merge pull request #139 from shivaram/fix-backend-exit
      97e5a1f [Sun Rui] [SPARKR-183] Fix the issue that parallelize collect tests are slow.
      a9f8e8e [Shivaram Venkataraman] Merge pull request #138 from concretevitamin/fix-collect-test
      125ae43 [Shivaram Venkataraman] Fix SparkR backend to exit in more cases This change has two fixes 1. When the workspace is saved (from R or RStudio) the backend connection seems to be closed before the finalizer is run. In such cases we reopen the connection and stop the backend 2. With RStudio when R is restarted, there are port-conflicts which appear due to a race condition between the JVM and rsession restart. This change adds a 1 sec sleep to avoid this race.
      12c102a [Zongheng Yang] Simplify a unit test.
      9c0637a [Zongheng Yang] Merge pull request #137 from shivaram/fix-docs
      0df0e18 [Shivaram Venkataraman] Fix documentation for includePackage
      7549f88 [Zongheng Yang] Merge pull request #136 from shivaram/man-updates
      7edbe46 [Shivaram Venkataraman] Add missing man pages
      9cb9567 [Shivaram Venkataraman] Merge pull request #131 from shivaram/rJavaExpt
      1fa722e [Shivaram Venkataraman] Rename to SerDe now
      2fcb051 [Shivaram Venkataraman] Rename to SerDeJVMR
      d112cf0 [Shivaram Venkataraman] Style fixes
      9fd01cc [Shivaram Venkataraman] Remove unnecessary braces
      0881931 [Shivaram Venkataraman] Some more style fixes
      f00b531 [Shivaram Venkataraman] Address code review comments. Big changes include style fixes throughout for named arguments
      c09ba05 [Shivaram Venkataraman] Change jobj id to be just an integer Add a new print.jobj that gets the class name and prints it Also add a utility function isInstanceOf
      be05b16 [Shivaram Venkataraman] Check if context, connection exist before stopping
      d596a23 [Shivaram Venkataraman] Address code review comments
      396e7ac [Shivaram Venkataraman] Changes to make new backend work on Windows This change uses file.path to construct the Java binary path in an OS-agnostic way and uses system2 to handle quoting binary paths correctly. Tests pass on Mac OSX and a Windows EC2 instance.
      e7a4e03 [Shivaram Venkataraman] Remove unused file BACKEND.md
      62f380b [Shivaram Venkataraman] Update worker.R to use new deserialization call
      8b9c4e6 [Shivaram Venkataraman] Change RDD name, setName to use new backend
      6dcd5c5 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/amplab-extras/SparkR-pkg into rJavaExpt
      0873397 [Shivaram Venkataraman] Refactor java object tracking into a new singleton. Also add comments describing each class
      95db964 [Shivaram Venkataraman] Add comments, cleanup new R code
      bcd4258 [Zongheng Yang] Merge pull request #130 from lythesia/master
      74dbc5e [Sun Rui] Match method using parameter types.
      7ad4a4d [Sun Rui] Use 1 char to represent types on the backend->client direction.
      bace887 [Sun Rui] Use an integer count for the backend java object ID because uniqueness isn't guaranteed by System.identityHashCode().
      b38d04f [Sun Rui] Use 1 char to represent types on the client -> backend direction.
      f88bc68 [lythesia] Merge branch 'master' of github.com:lythesia/SparkR-pkg
      71d41f5 [lythesia] add test case for fullOuterJoin
      eb4f423 [lythesia] --amend
      cffecc5 [lythesia] add test case for fullOuterJoin
      a547dd2 [Shivaram Venkataraman] Move classTag, rddRef into newJObject call This avoids them getting eagerly garbage collected
      1255391 [Shivaram Venkataraman] Add a finalizer for jobj objects This enables Java objects to be garbage collected on the backend when they are no longer referenced in R. Also rename newJava to newJObject to be more consistent with callJMethod
      70fa409 [Sun Rui] Add YARN Conf Dir to the class path when launching the backend.
      a1108ca [lythesia] add fullOuterJoin in RDD.R
      2152727 [Shivaram Venkataraman] Remove empty file
      cd08bee [Shivaram Venkataraman] Update all functions to use new backend All unit tests pass.
      9de49b7 [Shivaram Venkataraman] Add high level calls for methods, constructors Also update BACKEND.md
      5a97ea4 [Shivaram Venkataraman] Add jobj S3 class that holds backend refs
      e071d3e [Shivaram Venkataraman] Change SparkRBackend to use general method calls This change uses a custom protocol + JNI to invoke any method on a given object type. Also update serializers, deserializers to make code more concise
      49f0404 [Shivaram Venkataraman] Merge pull request #129 from lythesia/master
      7f8cd82 [lythesia] update man
      4715ed2 [Yi Lu] Update RDD.R
      5a53801 [lythesia] fix name,setName
      4f3870b [lythesia] add name,setName in RDD.R
      1c25700 [Shivaram Venkataraman] Merge pull request #128 from sun-rui/SPARKR-165
      c8507d8 [Sun Rui] [SPARKR-165] IS_SCALAR is not present in R before 3.1
      2cff2bd [Sun Rui] Add function to invoke Java method.
      7a31da1 [Shivaram Venkataraman] Merge branch 'dputler-master'. Closes #119
      0ceba82 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/dputler/SparkR-pkg into dputler-master
      735f70c [Shivaram Venkataraman] Merge pull request #125 from 7c00/rawcon
      fccfe6c [Shivaram Venkataraman] Merge pull request #127 from sun-rui/SPARKR-164
      387bd57 [Sun Rui] [SPARKR-164] Temporary files used by SparkR accumulate as time goes on.
      5f2268f [Shivaram Venkataraman] Add support to stop backend
      5f745c0 [Shivaram Venkataraman] Update notes in backend
      22015c1 [Shivaram Venkataraman] Add first cut of SparkR Backend
      52821da [Todd Gao] switch the order of packages and function deps
      d7b0007 [Todd Gao] remove memCompress
      cb6873e [Shivaram Venkataraman] Merge pull request #126 from sun-rui/SPARKR-147
      c5962eb [Todd Gao] further optimize using rawConnection
      f04c6e0 [Sun Rui] [SPARKR-147] Support multiple directories as input to textFile.
      b7de604 [Todd Gao] optimize execFunctionDeps loading in worker.R
      4d4fc30 [Shivaram Venkataraman] Merge pull request #122 from cafreeman/master
      b508877 [cafreeman] Update SparkR_IDE_Setup.sh
      21ed9d7 [cafreeman] Update build.sbt
      f73ec16 [cafreeman] Delete SparkR_IDE_Setup_Guide.md
      d63b026 [cafreeman] Delete SparkR_Quick_Start_Guide.md
      6e6cb62 [cafreeman] Update SparkR_IDE_Setup.sh
      bc6042b [cafreeman] Update build.sbt
      a8197d5 [cafreeman] Merge remote-tracking branch 'upstream/master'
      d671564 [Zongheng Yang] Merge pull request #123 from shivaram/jcheck-void
      76b8d00 [Zongheng Yang] Merge pull request #124 from shivaram/master
      b690d58 [Shivaram Venkataraman] Specify how to change Spark versions in README
      0fb003d [Shivaram Venkataraman] Merge branch 'master' of https://github.com/amplab-extras/SparkR-pkg into jcheck-void
      1c227b4 [Shivaram Venkataraman] Also add a check in context.R
      96812b6 [Shivaram Venkataraman] Check for exceptions after void method calls
      f5c216d [cafreeman] Merge remote-tracking branch 'upstream/master'
      90c8933 [Zongheng Yang] Merge pull request #121 from shivaram/fix-sort-order
      bd0e3b4 [Shivaram Venkataraman] Fix saveAsTextFile test case
      2e55f67 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/amplab-extras/SparkR-pkg into fix-sort-order
      f10c607 [Shivaram Venkataraman] Merge pull request #118 from sun-rui/saveAsTextFile
      6c9bfc0 [Sun Rui] Merge remote-tracking branch 'SparkR_upstream/master' into saveAsTextFile
      6faedbe [cafreeman] Update SparkR_IDE_Setup_Guide.md
      57008bc [cafreeman] Update SparkR_IDE_Setup.sh
      bb1c17d [cafreeman] Update SparkR_IDE_Setup.sh
      538bfdb [cafreeman] Update SparkR_Quick_Start_Guide.md
      31322c6 [cafreeman] Update SparkR_IDE_Setup.sh
      ca3f593 [Sun Rui] Refactor RRDD code.
      df58d95 [cafreeman] Update SparkR_Quick_Start_Guide.md
      b488c88 [cafreeman] Rename Spark_IDE_Setup.sh to SparkR_IDE_Setup.sh
      b2545a4 [cafreeman] Added IDE Setup Guide
      0ffb5de [cafreeman] Merge branch 'master' of https://github.com/cafreeman/SparkR-pkg
      bd8fbfb [cafreeman] Merge remote-tracking branch 'upstream/master'
      98efa5b [cafreeman] Added Quick Start Guide
      3cf88f2 [Shivaram Venkataraman] Sort lists before comparing in unit tests Since Spark doesn't guarantee that shuffle results will always be in the same order, we need to sort the results before comparing for deterministic behavior
      d621dbc [Shivaram Venkataraman] Merge pull request #120 from sun-rui/objectFile
      c4a44d7 [Sun Rui] Add @seealso in comments and extract some common code into a function.
      724e3a4 [cafreeman] Update Spark_IDE_Setup.sh
      8153e5a [Sun Rui] [SPARKR-146] Support read/save object files in SparkR.
      17f9909 [cafreeman] Update Spark_IDE_Setup.sh
      a9eb080 [cafreeman] IDE Shell Script
      64d800c [dputler] Merge remote branch 'upstream/master'
      1fbdb2e [dputler] Added the ability for the user to specify a text file location through the use of tilde expansion or just the file name if it is in the working directory.
      d83c017 [Shivaram Venkataraman] Merge pull request #113 from sun-rui/stringHashCodeInC
      a7d9cdb [Sun Rui] Fix build on Windows.
      7d81b05 [Shivaram Venkataraman] Merge pull request #114 from hlin09/hlin09
      47c4bb7 [hlin09] fix reviews
      a457f7f [Shivaram Venkataraman] Merge pull request #116 from dputler/master
      0fa48d1 [Shivaram Venkataraman] Merge pull request #117 from sun-rui/keyBy
      85cfeb4 [Sun Rui] [SPARKR-144] Implement saveAsTextFile() in the RDD class.
      09083d9 [Sun Rui] Add keyBy() to the RDD class.
      caad5d7 [dputler] Adding the script to install software on the Cloudera Quick Start VM.
      dca3d05 [hlin09] Minor fix.
      ece5f7d [hlin09] Merge remote-tracking branch 'upstream/master' into hlin09
      a40874b [hlin09] Use extendible accumulators aggregate the cogroup values.
      d0347ce [Zongheng Yang] Merge pull request #112 from sun-rui/outer_join
      492f76e [Sun Rui] Refine code and add description.
      ba01358 [Shivaram Venkataraman] Merge pull request #115 from sun-rui/SPARKR-130
      5c8e46e [Sun Rui] Fix per the review comments.
      7190a2c [Sun Rui] Update comment to add a reference to storage levels.
      1da705e [hlin09] Fix the review comments.
      c4b77be [Sun Rui] [SPARKR-130] Add persist(storageLevel) API to RDD.
      b424a1a [hlin09] Add function cogroup().
      9770312 [Shivaram Venkataraman] Merge pull request #111 from hlin09/hlin09
      cead7df [hlin09] fix review comments.
      54f712e [Sun Rui] Implement string hash code in C.
      425f0c6 [Sun Rui] Add leftOuterJoin() and rightOuterJoin() to the RDD class.
      39509c7 [hlin09] add Rd file for foreach and foreachPartition.
      63d6ac7 [hlin09] Adds function foreach() and foreachPartition().
      9c954df [Zongheng Yang] Merge pull request #105 from sun-rui/join
      c71228d [Sun Rui] Pre-allocate list with fixed length. Add test case for join() using string key.
      bc3e9f6 [Shivaram Venkataraman] Merge pull request #108 from concretevitamin/take-optimize
      c06fc90 [Zongheng Yang] Fix: only optimize for unserialized dataset case.
      d399aeb [Zongheng Yang] Apply size-capping on logical representation instead of physical.
      e4217dd [Zongheng Yang] Merge pull request #107 from shivaram/master
      7952180 [Shivaram Venkataraman] Copy, use getLocalDirs from Spark Utils.scala
      08e24c3 [Zongheng Yang] Merge pull request #109 from hlin09/hlin09
      97d4e02 [Zongheng Yang] Min() upper-bound size with actual size.
      bb779bf [hlin09] Rename the filter function to filterRDD to follow the API consistency. Filter() is also kept.
      ce1661f [Zongheng Yang] Fix slow take(): deserialize only up to necessary # of elements.
      4dca9b1 [Shivaram Venkataraman] Merge pull request #106 from hlin09/hlin09
      1220d92 [hlin09] Adds function numPartitions().
      2326a65 [Shivaram Venkataraman] Use SPARK_LOCAL_DIRS to create tmp files
      e119757 [hlin09] Minor fix.
      9c24c8b [hlin09] Adds function countByKey().
      48fce67 [hlin09] Adds countByValue().
      6679eef [Sun Rui] Update documentation for join().
      70586b4 [Sun Rui] Add join() to the RDD class.
      e6fb999 [Zongheng Yang] Merge pull request #103 from shivaram/rlibdir-fix
      a21f146 [Shivaram Venkataraman] Merge pull request #102 from hlin09/hlin09
      32eb619 [Shivaram Venkataraman] Merge pull request #104 from sun-rui/add_keys_values
      d8692e9 [Sun Rui] Add keys() and values() for the RDD class.
      18b9be1 [Shivaram Venkataraman] Allow users to set where SparkR is installed This also adds a warning if somebody tries to call sparkR.init multiple times.
      a17f135 [hlin09] Adds tests for flatMap and flatMapValues.
      4bcf59b [hlin09] Adds function flatMapValues.
      4a193ef [Zongheng Yang] Merge pull request #101 from ashutoshraina/master
      60d22f2 [Ashutosh Raina] changed sbt version
      5400793 [Zongheng Yang] Merge pull request #98 from shivaram/windows-fixes-build
      36d61a7 [Shivaram Venkataraman] Merge pull request #97 from hlin09/hlin09
      f7d7d89 [hlin09] Remove redundant code in test.
      6bbe823 [hlin09] minor style fix.
      9b47f3a [Shivaram Venkataraman] Merge pull request #100 from hnahak87/patch-1
      7f6e4ea [Harihar Nahak] Update logistic_regression.R
      a605047 [Shivaram Venkataraman] Merge pull request #99 from hlin09/makefile
      323151d [hlin09] Fix yarn flag in Makefile to remove build error in Maven.
      8911897 [hlin09] Make reserialize() private function in package.
      79aee73 [Shivaram Venkataraman] Add notes on how to build SparkR on windows
      49a99e7 [Shivaram Venkataraman] Clean up some commented code
      ddc271b [Shivaram Venkataraman] Only append file:/// to non empty jar paths
      a53952e [Shivaram Venkataraman] Add windows build scripts
      325b179 [hlin09] Merge remote-tracking branch 'upstream/master' into hlin09
      daf5040 [hlin09] Add reserialize() before union if two RDDs are not both serialized.
      536afb1 [hlin09] Add new function of union().
      7044677 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/amplab-extras/SparkR-pkg into windows-fixes
      d22a02d [Zongheng Yang] Merge pull request #94 from shivaram/windows-fixes-stdin
      51924f7 [Shivaram Venkataraman] Merge pull request #90 from oscaroboto/master
      eb97d85 [Shivaram Venkataraman] Merge pull request #96 from sun-rui/add_clarification_readme
      5a128f4 [Sun Rui] Add clarification on setting Spark master when launching the SparkR shell.
      187526a [oscaroboto] Update sparkR.R
      32c567b [Shivaram Venkataraman] Merge pull request #95 from concretevitamin/master
      4cd2d5e [Zongheng Yang] Notes about spark-ec2.
      1c28e3b [Shivaram Venkataraman] Merge branch 'master' of https://github.com/amplab-extras/SparkR-pkg into windows-fixes
      8e8a029 [Zongheng Yang] Merge pull request #92 from shivaram/sparkr-yarn
      721043b [Zongheng Yang] Update README.md with YARN instructions.
      1681f58 [Shivaram Venkataraman] Use temporary files for input instead of stdin This fixes a bug for Windows where stdin would get truncated
      b084314 [oscaroboto] removed ... from example
      44c93d4 [oscaroboto] Added example to SparkR.R
      be82dcc [Shivaram Venkataraman] Merge pull request #93 from hlin09/hlin09
      868554d [oscaroboto] Update sparkR.R
      488ac47 [hlin09] Add generated Rd file of previous added functions, distinct() and mapValues().
      b2740ad [hlin09] Add test for filter all elements. Add filter() as alias.
      08d3631 [hlin09] Minor style fixes.
      2c0e34f [hlin09] Adds function Filter(), which extracts the elements that satisfy a predicate.
      5951d3b [Shivaram Venkataraman] Remove SBT plugin
      4e70ced [oscaroboto] changed ExecutorEnv to sparkExecutorEnvMap, to make it consistent with sparkEnvirMap
      903d18a [oscaroboto] changed executorEnv to sparkExecutorEnvMap,  will do the same in R
      f97346e [oscaroboto] executorEnv to lower-case e
      88a524e [oscaroboto] Added LD_LIBRARY_PATH to the ExecutorEnv. This is needed so that the nodes can find libjvm.so, or if the master has a different LD_LIBRARY_PATH than the nodes. Make sure to export LD_LIBRARY_PATH that includes the path to libjvm.so in the nodes.
      1d208ae [oscaroboto] added the YARN_CONF_DIR to the classpath
      8a9b75c [oscaroboto] forgot to change hm and ee inside the for loops
      579db58 [Shivaram Venkataraman] Merge pull request #91 from sun-rui/add_max_min
      4381efa [Sun Rui] use reduce() to implement max() and min().
      a5459c5 [Shivaram Venkataraman] Consolidate yarn flags
      86b04eb [Shivaram Venkataraman] Don't use quotes around yarn
      bf0797f [Shivaram Venkataraman] Add dependency on spark yarn module
      af5fe77 [Shivaram Venkataraman] Fix SBT build, add dependency tree plugin
      4917607 [Sun Rui] Add maximum() and minimum() API to RDD.
      51bbbe4 [Shivaram Venkataraman] Changes to make SparkR work with YARN
      9d5e3ab [oscaroboto] a few stylistic changes. Also change vars to sparkEnvirMap and eevars to ExecutorEnv, to match sparkR.R
      578f545 [oscaroboto] a few stylistic changes
      39eea2f [oscaroboto] Modification to dynamically create a sparkContext with YARN. Added .setExecutorEnv to the sparkConf in createSparkContext within the RRDD object. This modification was made together with sparkR.R
      17ec42e [oscaroboto] A modification to dynamically create a sparkContext with YARN. sparkR.R modified to pass custom Jar file names and EnvironmentEnv to the sparkConf. RRDD.scala was also modified to accept the new inputs to createSparkContext.
      624ac9d [Shivaram Venkataraman] Merge pull request #87 from sun-rui/SPARKR-125
      4f213db [Shivaram Venkataraman] Merge pull request #89 from sun-rui/SPARKR-108
      eb833c5 [Shivaram Venkataraman] Merge pull request #88 from hlin09/hlin09
      07bf971 [Sun Rui] [SPARKR-108] Implement map-side reduction for reduceByKey().
      4accba1 [hlin09] Fixes style and adds an optional param 'numPartition' in distinct().
      80d303a [hlin09] typo fixed.
      e37a9b5 [hlin09] Adds function distinct() and mapValues().
      08dac06 [Sun Rui] [SPARKR-125] Get the iterator of the parent RDD before launching a R worker process in compute() of RRDD/PairwiseRRDD
      c4ba53c [Shivaram Venkataraman] Merge pull request #85 from edwardt/master
      72a9d27 [root] reorder to keep relative ordering the same
      f3fcb10 [root] fix up build.sbt also to match pom.xml
      5ecbe3e [root] Make spark version configurable in build script per ISSUE122
      a44e63d [Shivaram Venkataraman] Merge pull request #84 from sun-rui/SPARKR-94
      fbb5663 [Sun Rui] Add {} to one-line functions and add a test case for lookup where no match is found.
      95beb4e [Shivaram Venkataraman] Merge pull request #82 from edwardt/master
      36776c5 [edwardt] missed one 0.9.0 revert
      b26deec [Sun Rui] [SPARKR-94] Add a  method to get an element of a pair RDD object by key.
      1ba256e [edwardt] Keep 0.9.0 and says uses 1.1.0 by default
      5380c43 [root] missed one version
      21f74da [root] upgrade to spark version 1.1.0 to match latest merge list
      ddfcde9 [root] merge
      67d067a [Shivaram Venkataraman] Merge pull request #81 from sun-rui/SparkR-117
      993868f [Sun Rui] [SPARKR-117] Update Spark dependency to 1.1.0
      d20661a [Zongheng Yang] Merge pull request #80 from sun-rui/master
      0b2da9f [Sun Rui] Update Rd file and add a test case for mapPartitions.
      5879648 [Sun Rui] Add mapPartitions() method to RDD for API consistency.
      c033461 [Shivaram Venkataraman] Merge pull request #79 from sun-rui/fix-kmeans
      f62b77e [Sun Rui] Adjust coding style.
      b40911d [Sun Rui] Fix syntax error in examples/kmeans.R.
      5304451 [Shivaram Venkataraman] Merge pull request #78 from sun-rui/master
      70ffbfb [Sun Rui] Fix a bug that modifications to build.sbt won't trigger rebuilding.
      a25696c [Shivaram Venkataraman] Merge pull request #76 from edwardt/addjira
      b8bbd93 [edwardt] Update README.md
      615d930 [edwardt] Update README.md
      e522e69 [edwardt] Update README.md
      03e6ced [edwardt] Update README.md
      3007015 [root] don't check in gedit buffer file
      c35c9a6 [root] Add where to enter bugs and feedback
      469eae3 [edwardt] Update README.md
      61b4a43 [edwardt] Update Makefile (style uniformity)
      ce3337d [edwardt] Update README.md
      7ff68fc [root] Merge branch 'master' of https://github.com/edwardt/SparkR-pkg
      16353f5 [root] add links to devtools and install_github
      513b9e5 [Shivaram Venkataraman] Merge pull request #72 from edwardt/master
      31608a4 [edwardt] Update Makefile (style uniformity)
      4ffe146 [root] Makefile: factor out SPARKR_VERSION to reduce potential copy&paste error; cp & rm called with -f in build/clean phase; .gitignore includes checkpoints and unit test log generated by run-tests.sh
      715275f [Zongheng Yang] Merge pull request #68 from shivaram/master
      90e2083 [Shivaram Venkataraman] Add return type to hasNext
      8eb983d [Shivaram Venkataraman] Fix up comment
      2206164 [Shivaram Venkataraman] Delete temporary files after they are read This change deletes temporary files used for communication between Rscript and the JVM once they have been completely read.
      5881da7 [Zongheng Yang] Merge pull request #67 from shivaram/improve-shuffle
      81251e2 [Shivaram Venkataraman] Address code review comments
      a5f573f [Shivaram Venkataraman] Use a better list append in shuffles This is helpful in scenarios where we have a large number of values in a bucket
      388e64d [Shivaram Venkataraman] Merge pull request #55 from RevolutionAnalytics/master
      e1f95b6 [Zongheng Yang] Merge pull request #65 from concretevitamin/parallelize-fix
      fc1a71a [Zongheng Yang] Fix that collect(parallelize(sc,1:72,15)) drops elements.
      b8204c5 [Zongheng Yang] Minor: update a URL in README.
      86f30c3 [Antonio Piccolboni] better fix for amplab-extras/SparkR-pkg#53
      b3c318d [Antonio Piccolboni] delayed loading to have all namespaces available.
      f323e97 [Antonio Piccolboni] tentative fix for amplab-extras/SparkR-pkg#53
      6f82269 [Zongheng Yang] Merge pull request #48 from shivaram/master
      8f433e5 [Shivaram Venkataraman] Move up Hadoop in pom.xml and add back protobufs As Hadoop 1.0.4 doesn't use protobufs, we can't exclude protobufs from Spark always. This change tries to order the dependencies so that the shader first picks up Hadoop's protobufs over Mesos.
      bfe7e26 [Shivaram Venkataraman] Merge pull request #36 from RevolutionAnalytics/vectorize-examples
      059ae41 [Antonio Piccolboni] and more formatting
      9dbd531 [Antonio Piccolboni] more formatting per committer request
      948738a [Antonio Piccolboni] converted tabs to spaces per project request
      49f5f5a [Shivaram Venkataraman] Merge pull request #35 from shivaram/master
      3eb5ad3 [Shivaram Venkataraman] on_failure -> after_failure in travis.yml
      139bdee [Shivaram Venkataraman] Cache sbt, maven, ivy dependencies
      4ebced2 [Shivaram Venkataraman] Merge pull request #34 from shivaram/master
      8437061 [Shivaram Venkataraman] Exclude protobuf from Spark dependency in Maven This avoids pulling in multiple versions of protobuf from Mesos and Hadoop.
      91aa527 [Antonio Piccolboni] vectorized version, 36s 10 slices 10^6 per slice. The older version takes 30 sec on 1/10th of data.
      f137a57 [Antonio Piccolboni] for rstudio users
      1f7ffb0 [Antonio Piccolboni] implemented using matrices and vectorized calls wherever possible
      46b23df [Antonio Piccolboni] replace require with library
      b15d7db [Antonio Piccolboni] faster parsing
      8b7aeb3 [Antonio Piccolboni] 22x speed improvement, 3X mem improvement
      c5bce07 [Zongheng Yang] Merge pull request #30 from shivaram/string-tests
      21fa2d8 [Shivaram Venkataraman] Fix bug where serialized was not changed for RRDD Reason: When an RRDD is created in getJRDD we have converted any possibly unserialized RDD to a serialized RDD.
      9d1ea20 [Shivaram Venkataraman] Merge branch 'master' of github.com:amplab/SparkR-pkg into string-tests
      7b9348c [Shivaram Venkataraman] Add tests for partition with string keys Add two tests one with a string array and one from a textFile to test both codepaths
      aacd726 [Shivaram Venkataraman] Update README with maven proxy instructions
      803e62c [Shivaram Venkataraman] Merge pull request #28 from concretevitamin/master
      7c093e6 [Zongheng Yang] Use inherits() to test an object's class.
      061c591 [Shivaram Venkataraman] Merge pull request #26 from hafen/master
      90f9fda [Ryan Hafen] Fix isRdd() to properly check for class
      5b10cc7 [Zongheng Yang] Merge pull request #24 from shivaram/master
      7014f83 [Shivaram Venkataraman] Remove unused transformers in maven's pom.xml
      b00cea5 [Shivaram Venkataraman] Add support for a Maven build
      11ec9b2 [Shivaram Venkataraman] Merge pull request #12 from concretevitamin/pipelined
      6b18a90 [Zongheng Yang] Merge branch 'master' into pipelined
      57127b8 [Zongheng Yang] Merge pull request #23 from shivaram/master
      1ac3940 [Zongheng Yang] Review feedback.
      a06fb34 [Zongheng Yang] Remove outdated comment.
      0a1fc13 [Shivaram Venkataraman] Fixes for using SparkR with Hadoop2. 1. Exclude ASM, Netty from Hadoop similar to Spark. 2. Concat services files to ensure HDFS filesystems work. 3. Update README with an example
      9a1db44 [Zongheng Yang] Merge pull request #22 from shivaram/master
      e462448 [Shivaram Venkataraman] Use `$` for calling `put` instead of .jrcall
      ed4559a [Shivaram Venkataraman] Add support for passing Spark environment vars This change creates a new `createSparkContext` method in RRDD as we can't pass Map<String, String> through rJava. Also use SPARK_MEM in local mode to increase heap size and update the README with some examples.
      10228fb [Shivaram Venkataraman] Merge pull request #20 from concretevitamin/digit-ex
      1398d9f [Zongheng Yang] Add linear_solver_mnist to examples/.
      d484c2a [Zongheng Yang] Add tests for actions on PipelinedRDD.
      d9cb95c [Zongheng Yang] Add setCheckpointDir() to context.R; comment fix.
      f8bc8a9 [Zongheng Yang] Minor edits per Shivaram's comments.
      8cd67f7 [Shivaram Venkataraman] Merge pull request #15 from shivaram/master
      d4468a9 [Shivaram Venkataraman] Remove trailing comma
      e2714b8 [Shivaram Venkataraman] Remove Apache Staging repo and update README
      334eace [Zongheng Yang] Add a multi-transformation test to benchmark on pipelining.
      5650ad7 [Zongheng Yang] Put serialized field inside env for both RDD and PipelinedRDD.
      0b9e8bb [Zongheng Yang] First cut at PipelinedRDD.
      a4c431e [Zongheng Yang] Add `isCheckpointed` field and checkpoint().
      dac0795 [Zongheng Yang] Minor inline comment style fix.
      bfb8e26 [Zongheng Yang] Add isCached field (inside an env) and unpersist().
      295bff6 [Zongheng Yang] Merge pull request #11 from shivaram/master
      4cb209c [Shivaram Venkataraman] Search rLibDir in worker before libPaths This ensures we pick up the intended SparkR version and not an older version installed on the same machine
      ef198ff [Zongheng Yang] Merge pull request #10 from shivaram/unit-tests
      e0557a8 [Shivaram Venkataraman] Update travis to install plyr
      8b18bc1 [Shivaram Venkataraman] Merge branch 'master' of github.com:amplab/SparkR-pkg into unit-tests
      4a9ca31 [Shivaram Venkataraman] Use smaller broadcast and plyr instead of Matrix Matrix package takes around 2s to load and slows down unit tests.
      21c6a61 [Zongheng Yang] Merge pull request #8 from shivaram/master
      08c2947 [Shivaram Venkataraman] Move dev install directory to front of libPaths
      bda42ee [Shivaram Venkataraman] Merge pull request #7 from JoshRosen/travis
      cc5f5c0 [Josh Rosen] Add Travis CI integration (using craigcitro/r-travis)
      b6c864b [Shivaram Venkataraman] Merge pull request #6 from concretevitamin/env-style-fix
      4fcef22 [Zongheng Yang] Use one style ($) for accessing names in environments.
      8a948c6 [Shivaram Venkataraman] Merge pull request #4 from shivaram/master
      24978eb [Shivaram Venkataraman] Update README to use install_github
      8899db4 [Shivaram Venkataraman] Update TODO.md
      91792de [Shivaram Venkataraman] Update Spark requirements
      f34f4bf [Shivaram Venkataraman] Check tests for failures and output error msg
      cd750d3 [Shivaram Venkataraman] Update run-tests to use new path
      1877b7c [Shivaram Venkataraman] Unset R_TESTS to make tests work with R CMD check Also silence Akka remoting logs and update Makefile to build on log4j changes
      e60e18a [Shivaram Venkataraman] Update README to remove Spark installation notes
      4450189 [Shivaram Venkataraman] Add Spark 0.9 dependency from Apache Staging Also clean up assembly jar from inst on make clean
      5eb2131 [Shivaram Venkataraman] Update repo path in README
      ec8210e [Shivaram Venkataraman] Remove broadcastId hack as it is public in Spark
      9f0e080 [Shivaram Venkataraman] Merge branch 'install-github'
      5c88fbd [Shivaram Venkataraman] Add helper script to run tests
      77450a1 [Shivaram Venkataraman] Remove dependency on Spark Logging
      6cb00d1 [Shivaram Venkataraman] Update README and add helper script install-dev.sh
      28346ca [Shivaram Venkataraman] Only normalize if SPARK_HOME is not empty
      0fd6571 [Shivaram Venkataraman] Normalize SPARK_HOME before passing it
      ff96d5c [Shivaram Venkataraman] Pass in SPARK_HOME and jar file path
      34c4dce [Shivaram Venkataraman] Move src into pkg and update Makefile This enables the package to be installed using install_github using devtools and automates the build procedure.
      b25afed [Shivaram Venkataraman] Change package name to edu.berkeley.cs.amplab
      c691464 [Shivaram Venkataraman] Add Apache 2.0 License file
      27a4a4b [Shivaram Venkataraman] Add notes on how to compile roxygen2 docs
      ca63844 [Shivaram Venkataraman] Add broadcast documentation Also generate documentation for sample, takeSample etc.
      e4dd976 [Shivaram Venkataraman] Update TODO.md
      e42d435 [Shivaram Venkataraman] Add support for broadcast variables
      6b638e7 [Shivaram Venkataraman] Add the assembly jar to SparkContext
      bf24e32 [Shivaram Venkataraman] Merge branch 'master' of github.com:amplab/SparkR-pkg
      43c05ce [Zongheng Yang] Fix a flaky/incorrect test for sampleRDD().
      c6a9dfc [Zongheng Yang] Initial port of the kmeans example.
      6885581 [Zongheng Yang] Implement element-level sampleRDD() and takeSample() with tests.
      d3a4987 [Zongheng Yang] Add a test for lapplyPartitionsWithIndex on pairwise RDD.
      c7899c1 [Zongheng Yang] Add lapplyPartitionsWithIndex, with a test and an alias function.
      a9a7436 [Shivaram Venkataraman] Add DFC example from Tselil, Benjamin and Jonah
      fbc5a95 [Zongheng Yang] Implement take() and takeSample().
      c4a3409 [Shivaram Venkataraman] Use RDD instead of RRDD
      dfad3f5 [Zongheng Yang] Add test_utils.R: a unit test for convertJListToRList().
      a45227d [Zongheng Yang] Update .gitignore.
      238fe6e [Zongheng Yang] Add a unit test for textFile().
      a88898b [Zongheng Yang] Rename test_rrd to test_rrdd
      10c8baa [Shivaram Venkataraman] Make SparkR work as a standalone package. Changes include: 1. Adding a new `sbt` project that builds RRDD.scala 2. Change the onLoad functions to load the assembly jar for SparkR 3. Set rLibDir in RRDD.scala and worker.R to load things correctly
      78adcd8 [Shivaram Venkataraman] Add a gitignore
      ca6108f [Shivaram Venkataraman] Merge branch 'SparkR-scalacode' of ../SparkR
      999bd61 [Shivaram Venkataraman] Update collectPartition in R and use ClassTag
      c58f63e [Shivaram Venkataraman] Update collectPartition in R and use ClassTag
      48265fd [Shivaram Venkataraman] Use new version of collectPartitions in take
      d4fe086 [Shivaram Venkataraman] Move collectPartitions to JavaRDDLike Also remove numPartitions in JavaRDD and update R code
      bfecd7b [Shivaram Venkataraman] Scala 2.10 changes 1. Update sparkR script 2. Use classTag instead of classManifest
      092a4b3 [Shivaram Venkataraman] Add combineByKey, update TODO
      ac0d81d [Shivaram Venkataraman] Add more documentation
      d1dc3fa [Shivaram Venkataraman] Add more documentation
      c515e3a [Shivaram Venkataraman] Update TODO
      db56a34 [Shivaram Venkataraman] Add a test case for include package
      41cea51 [Shivaram Venkataraman] Ensure all parent environments are serialized. Also add a test case with an inline function
      a978e84 [Shivaram Venkataraman] Add support to include packages in the worker
      12bf8ce [Shivaram Venkataraman] Add support to include packages in the worker
      fb7e72c [Shivaram Venkataraman] Cleanup TODO
      16ac314 [Shivaram Venkataraman] Add documentation for functions in context, sparkR
      85b1d25 [Shivaram Venkataraman] Set license to Apache
      88f1101 [Shivaram Venkataraman] Add unit test running instructions
      c40768e [Shivaram Venkataraman] Update TODO
      0c7efbf [Shivaram Venkataraman] Refactor RRDD.scala and add comments to functions
      5880d42 [Shivaram Venkataraman] Refactor RRDD.scala and add comments to functions
      2dee36c [Shivaram Venkataraman] Remove empty test file
      a82219b [Shivaram Venkataraman] Update TODOs
      5db00dc [Shivaram Venkataraman] Add reduceByKey, groupByKey and refactor shuffle Other changes include 1. Adding unit tests for basic RDD functions and shuffle 2. Add a word count example 3. Change the dependency serialization to handle double loading of SparkR package 4. Allow partitionBy to operate on any RDDs to create pair-wise RDD.
      f196479 [Shivaram Venkataraman] Add reduceByKey, groupByKey and refactor shuffle Other changes include 1. Adding unit tests for basic RDD functions and shuffle 2. Add a word count example 3. Change the dependency serialization to handle double loading of SparkR package 4. Allow partitionBy to operate on any RDDs to create pair-wise RDD.
      987e36f [Shivaram Venkataraman] Add perf todo
      0b03265 [Shivaram Venkataraman] Update TODO with testing, docs todo
      685aaad [Zongheng Yang] First cut at refactoring worker.R. Remove pairwiseWorker.R.
      95b9ddc [Zongheng Yang] First cut at refactoring worker.R. Remove pairwiseWorker.R.
      4f00895 [Zongheng Yang] Remove the unnecessary `pairwise' flag in RRDD class. Reasons:
      75d36d9 [Zongheng Yang] Working versions: partitionBy() and collectPartition() for RRDD.
      e3fbd9d [Zongheng Yang] Working versions: partitionBy() and collectPartition() for RRDD.
      67a4335 [Zongheng Yang] Add unit test for parallelize() and collect() pairwise data.
      100ae65 [Zongheng Yang] Properly parallelize() and collect() pairwise data.
      cd0a5e2 [Zongheng Yang] Properly parallelize() and collect() pairwise data.
      aea16c3 [Zongheng Yang] WIP: second cut at partitionBy. Running into R/Scala communication issues.
      45eb943 [Zongheng Yang] WIP: second cut at partitionBy. Running into R/Scala communication issues.
      11c893b [Zongheng Yang] WIP: need to figure out the logic of (whether or not) shipping a hash func
      82c201a [Zongheng Yang] WIP: need to figure out the logic of (whether or not) shipping a hash func
      b3bfad2 [Zongheng Yang] Update TODO: take() done.
      0e45293 [Zongheng Yang] Add ability to parallelize key-val collections in R.
      f60406a [Zongheng Yang] Add ability to parallelize key-val collections in R.
      7d7fe3b [Zongheng Yang] Re-implement take(): take a partition at a time and append.
      a054e55 [Zongheng Yang] Fix take() tests(): mode difference.
      9de0935 [Zongheng Yang] Implement take() for RRDD.
      1e4427e [Zongheng Yang] Implement take() for RRDD.
      ec3cd67 [Shivaram Venkataraman] Use temp file in Spark to pipe output
      417aaed [Shivaram Venkataraman] Use temp file in Spark to pipe output
      bb0a3c3 [Shivaram Venkataraman] Add conf directory to classpath
      9594d8a [Shivaram Venkataraman] Clean up LR example
      3b26b58 [Shivaram Venkataraman] Add a list of things to do.
      cabce68 [Shivaram Venkataraman] Fix warnings from package check
      fde3f9c [Shivaram Venkataraman] Flatten by default and disable recursive unlist
      ab2e061 [Shivaram Venkataraman] Create LIB_DIR before installing SparkR package
      555220a [Shivaram Venkataraman] Add readme and update Makefile
      1319cda [Shivaram Venkataraman] Make standalone programs run with sparkR
      ae19fa8 [Shivaram Venkataraman] Add support for cache and use `tempfile`
      4e89ca4 [Shivaram Venkataraman] Add support for apply, reduce, count Also serialize closures using `save` and add two examples
      25a0bea [Shivaram Venkataraman] Add support for apply, reduce, count Also serialize closures using `save` and add two examples
      f50223f [Zongheng Yang] Make parallelize() and collect() use lists. Add a few more tests for them.
      fc7693f [Zongheng Yang] Refactor and enhance the previously added unit test a little bit.
      6de9b81 [Zongheng Yang] Add a simple unit test for parallelize().
      8b95155 [Zongheng Yang] Add testthat skeleton infrastructure
      ef305bf [Zongheng Yang] parallelize() followed by collect() now work for vectors/lists of strings and numerics (should work for other primitives as well).
      dc16af4 [Zongheng Yang] Comment: toArray() allocates memory for a copy
      f50121e [Zongheng Yang] Make parallelize() return JavaRDD[Array[Byte]]. Add RRDD.scala with a helper function in the singleton object.
      46eb063 [Zongheng Yang] Make parallelize() return JavaRDD[Array[Byte]]. Add RRDD.scala with a helper function in the singleton object.
      6b4938a [Zongheng Yang] parallelize(): a raw can be parallelized by JavaSparkContext and get back JavaRDD
      978aa0f [Zongheng Yang] Add parallelize() skeleton: only return serialized slices now
      84c1fd2 [Zongheng Yang] Use .jsimplify() to get around generic List's get() type erasure problem
      f16b891 [Zongheng Yang] Convert a few reflectionc alls to .jcall
      1284c13 [Zongheng Yang] WIP on collect(): JavaListToRList() failed with errors.
      4c2e516 [Zongheng Yang] Add simple prototype of S4 class RRDD. Make TextFile() returns an RRDD.
      82aa17a [Zongheng Yang] Add textFile()
      83ce63f [Zongheng Yang] Create a JavaSparkContext and save it in .sparkEnv using sparkR.init()
      01cdf0e [Zongheng Yang] Add Makefile for SparkR
      fc9cae2 [Shivaram Venkataraman] Add skeleton R package
      2fe0a1aa
  14. Mar 24, 2015
    • Brennon York's avatar
      [SPARK-6477][Build]: Run MIMA tests before the Spark test suite · 37fac1dc
      Brennon York authored
      This moves the MIMA checks to before the full Spark test suite so that, if a new PR fails the MIMA check, it returns much faster, having not run the entire test suite. This is preferable to the current scenario, where a user has to wait until the entire test suite completes before realizing it failed on a MIMA check; once the MIMA issues are fixed, the user then has to resubmit and rerun the full test suite again.
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #5145 from brennonyork/SPARK-6477 and squashes the following commits:
      
      12b0aee [Brennon York] updated to put the mima checks before the spark test suite
      37fac1dc
  15. Mar 04, 2015
    • Brennon York's avatar
      [SPARK-3355][Core]: Allow running maven tests in run-tests · 418f38d9
      Brennon York authored
      Added an AMPLAB_JENKINS_BUILD_TOOL environment variable to allow differentiation between the Maven and sbt build / test suites. The only issue I found with this is that, when running Maven builds, I wasn't able to get individual package tests running without running a `mvn install` first. Not sure what Jenkins is doing with respect to its environment, but I figured it's much better to just test everything than to install packages in the "~/.m2/" directory and only test individual items, especially if this is predominantly for the Jenkins build. Thoughts / comments would be great!
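      A build-tool switch like this boils down to reading one environment variable. A minimal Python sketch (the variable name is from the patch; the default value and the error handling are assumptions of this sketch, not the script's actual behavior):

```python
import os

def get_build_tool():
    """Select sbt or Maven based on the environment.

    AMPLAB_JENKINS_BUILD_TOOL is the variable added by this patch;
    defaulting to sbt and rejecting unknown values are assumptions
    made for this sketch.
    """
    tool = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL", "sbt")
    if tool not in ("sbt", "maven"):
        raise ValueError("unsupported build tool: %s" % tool)
    return tool
```

      Downstream code can then branch once on the returned value instead of re-reading the environment at every call site.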
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #4734 from brennonyork/SPARK-3355 and squashes the following commits:
      
      c813d32 [Brennon York] changed mvn call from 'clean compile
      616ce30 [Brennon York] fixed merge conflicts
      3540de9 [Brennon York] added an AMPLAB_JENKINS_BUILD_TOOL env. variable to allow differentiation between maven and sbt build / test suites
      418f38d9
  16. Feb 09, 2015
  17. Jan 10, 2015
    • Joseph K. Bradley's avatar
      [SPARK-5032] [graphx] Remove GraphX MIMA exclude for 1.3 · 33132609
      Joseph K. Bradley authored
      Since GraphX is no longer alpha as of 1.2, MimaExcludes should not exclude GraphX for 1.3
      
      Here are the individual excludes I had to add + the associated commits:
      
      ```
                  // SPARK-4444
                  ProblemFilters.exclude[IncompatibleResultTypeProblem](
                    "org.apache.spark.graphx.EdgeRDD.fromEdges"),
                  ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.graphx.EdgeRDD.filter"),
                  ProblemFilters.exclude[IncompatibleResultTypeProblem](
                    "org.apache.spark.graphx.impl.EdgeRDDImpl.filter"),
      ```
      [https://github.com/apache/spark/commit/9ac2bb18ede2e9f73c255fa33445af89aaf8a000]
      
      ```
                  // SPARK-3623
                  ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.graphx.Graph.checkpoint")
      ```
      [https://github.com/apache/spark/commit/e895e0cbecbbec1b412ff21321e57826d2d0a982]
      
      ```
                  // SPARK-4620
                  ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.graphx.Graph.unpersist"),
      ```
      [https://github.com/apache/spark/commit/8817fc7fe8785d7b11138ca744f22f7e70f1f0a0]
      
      CC: rxin
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #3856 from jkbradley/graphx-mima and squashes the following commits:
      
      1eea2f6 [Joseph K. Bradley] moved cleanup to run-tests
      527ccd9 [Joseph K. Bradley] fixed jenkins script to remove ivy2 cache
      802e252 [Joseph K. Bradley] Removed GraphX MIMA excludes and added line to clear spark from .m2 dir before Jenkins tests.  This may not work yet...
      30f8bb4 [Joseph K. Bradley] added individual mima excludes for graphx
      a3fea42 [Joseph K. Bradley] removed graphx mima exclude for 1.3
      33132609
  18. Dec 27, 2014
    • Brennon York's avatar
      [SPARK-4501][Core] - Create build/mvn to automatically download maven/zinc/scalac · a3e51cc9
      Brennon York authored
      Creates a top-level directory script (as `build/mvn`) to automatically download Zinc and the specific version of Scala used to build Spark. This will also download and install Maven if the user doesn't already have it, and all packages are hosted under the `build/` directory. Tested on both Linux and OS X, and both work. All commands pass through to the Maven binary, so it acts exactly as a traditional Maven call would.
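      The generic `install_app` idea mentioned in the commit log — reuse a tool already cached under `build/`, otherwise fetch it — can be sketched in Python. The function name mirrors the commit log, but the signature and return convention are invented for illustration:

```python
import os

def install_app(build_dir, app_name, archive_url):
    """Return (install_path, needs_download).

    If the tool is already unpacked under build/, reuse it; otherwise
    the caller should download `archive_url` into build_dir and unpack
    it. This is a sketch of the idea, not the script's actual logic.
    """
    target = os.path.join(build_dir, app_name)
    if os.path.isdir(target):
        return target, False  # cached locally, skip the download
    return target, True       # fresh install required
```

      Keeping every tool under one `build/` directory is what makes the script self-contained: deleting that directory resets the whole toolchain.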
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #3707 from brennonyork/SPARK-4501 and squashes the following commits:
      
      0e5a0e4 [Brennon York] minor incorrect doc verbage (with -> this)
      9b79e38 [Brennon York] fixed merge conflicts with dev/run-tests, properly quoted args in sbt/sbt, fixed bug where relative paths would fail if passed in from build/mvn
      d2d41b6 [Brennon York] added blurb about leverging zinc with build/mvn
      b979c58 [Brennon York] updated the merge conflict
      c5634de [Brennon York] updated documentation to overview build/mvn, updated all points where sbt/sbt was referenced with build/sbt
      b8437ba [Brennon York] set progress bars for curl and wget when not run on jenkins, no progress bar when run on jenkins, moved sbt script to build/sbt, wrote stub and warning under sbt/sbt which calls build/sbt, modified build/sbt to use the correct directory, fixed bug in build/sbt-launch-lib.bash to correctly pull the sbt version
      be11317 [Brennon York] added switch to silence download progress only if AMPLAB_JENKINS is set
      28d0a99 [Brennon York] updated to remove the python dependency, uses grep instead
      7e785a6 [Brennon York] added silent and quiet flags to curl and wget respectively, added single echo output to denote start of a download if download is needed
      14a5da0 [Brennon York] removed unnecessary zinc output on startup
      1af4a94 [Brennon York] fixed bug with uppercase vs lowercase variable
      3e8b9b3 [Brennon York] updated to properly only restart zinc if it was freshly installed
      a680d12 [Brennon York] Added comments to functions and tested various mvn calls
      bb8cc9d [Brennon York] removed package files
      ef017e6 [Brennon York] removed OS complexities, setup generic install_app call, removed extra file complexities, removed help, removed forced install (defaults now), removed double-dash from cli
      07bf018 [Brennon York] Updated to specifically handle pulling down the correct scala version
      f914dea [Brennon York] Beginning final portions of localized scala home
      69c4e44 [Brennon York] working linux and osx installers for purely local mvn build
      4a1609c [Brennon York] finalizing working linux install for maven to local ./build/apache-maven folder
      cbfcc68 [Brennon York] Changed the default sbt/sbt to build/sbt and added a build/mvn which will automatically download, install, and execute maven with zinc for easier build capability
      a3e51cc9
  19. Dec 23, 2014
    • Cheng Lian's avatar
      [SPARK-4914][Build] Cleans lib_managed before compiling with Hive 0.13.1 · 395b771f
      Cheng Lian authored
      This PR tries to fix the Hive tests failure encountered in PR #3157 by cleaning `lib_managed` before building assembly jar against Hive 0.13.1 in `dev/run-tests`. Otherwise two sets of datanucleus jars would be left in `lib_managed` and may mess up class paths while executing Hive test suites. Please refer to [this thread] [1] for details. A clean build would be even safer, but we only clean `lib_managed` here to save build time.
      
      This PR also takes the chance to clean up some minor typos and formatting issues in the comments.
      
      [1]: https://github.com/apache/spark/pull/3157#issuecomment-67656488
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #3756 from liancheng/clean-lib-managed and squashes the following commits:
      
      e2bd21d [Cheng Lian] Adds lib_managed to clean set
      c9f2f3e [Cheng Lian] Cleans lib_managed before compiling with Hive 0.13.1
      395b771f
  20. Nov 11, 2014
    • Prashant Sharma's avatar
      Support cross building for Scala 2.11 · daaca14c
      Prashant Sharma authored
      Let's give this another go using a version of Hive that shades its JLine dependency.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #3159 from pwendell/scala-2.11-prashant and squashes the following commits:
      
      e93aa3e [Patrick Wendell] Restoring -Phive-thriftserver profile and cleaning up build script.
      f65d17d [Patrick Wendell] Fixing build issue due to merge conflict
      a8c41eb [Patrick Wendell] Reverting dev/run-tests back to master state.
      7a6eb18 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into scala-2.11-prashant
      583aa07 [Prashant Sharma] REVERT ME: removed hive thirftserver
      3680e58 [Prashant Sharma] Revert "REVERT ME: Temporarily removing some Cli tests."
      935fb47 [Prashant Sharma] Revert "Fixed by disabling a few tests temporarily."
      925e90f [Prashant Sharma] Fixed by disabling a few tests temporarily.
      2fffed3 [Prashant Sharma] Exclude groovy from sbt build, and also provide a way for such instances in future.
      8bd4e40 [Prashant Sharma] Switched to gmaven plus, it fixes random failures observer with its predecessor gmaven.
      5272ce5 [Prashant Sharma] SPARK_SCALA_VERSION related bugs.
      2121071 [Patrick Wendell] Migrating version detection to PySpark
      b1ed44d [Patrick Wendell] REVERT ME: Temporarily removing some Cli tests.
      1743a73 [Patrick Wendell] Removing decimal test that doesn't work with Scala 2.11
      f5cad4e [Patrick Wendell] Add Scala 2.11 docs
      210d7e1 [Patrick Wendell] Revert "Testing new Hive version with shaded jline"
      48518ce [Patrick Wendell] Remove association of Hive and Thriftserver profiles.
      e9d0a06 [Patrick Wendell] Revert "Enable thritfserver for Scala 2.10 only"
      67ec364 [Patrick Wendell] Guard building of thriftserver around Scala 2.10 check
      8502c23 [Patrick Wendell] Enable thritfserver for Scala 2.10 only
      e22b104 [Patrick Wendell] Small fix in pom file
      ec402ab [Patrick Wendell] Various fixes
      0be5a9d [Patrick Wendell] Testing new Hive version with shaded jline
      4eaec65 [Prashant Sharma] Changed scripts to ignore target.
      5167bea [Prashant Sharma] small correction
      a4fcac6 [Prashant Sharma] Run against scala 2.11 on jenkins.
      80285f4 [Prashant Sharma] MAven equivalent of setting spark.executor.extraClasspath during tests.
      034b369 [Prashant Sharma] Setting test jars on executor classpath during tests from sbt.
      d4874cb [Prashant Sharma] Fixed Python Runner suite. null check should be first case in scala 2.11.
      6f50f13 [Prashant Sharma] Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10
      e56ca9d [Prashant Sharma] Print an error if build for 2.10 and 2.11 is spotted.
      937c0b8 [Prashant Sharma] SCALA_VERSION -> SPARK_SCALA_VERSION
      cb059b0 [Prashant Sharma] Code review
      0476e5e [Prashant Sharma] Scala 2.11 support with repl and all build changes.
      daaca14c
  21. Nov 04, 2014
    • Xiangrui Meng's avatar
      [SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD · 1a9c6cdd
      Xiangrui Meng authored
      Register MLlib's Vector as a SQL user-defined type (UDT) in both Scala and Python. With this PR, we can easily map an RDD[LabeledPoint] to a SchemaRDD, and then select columns or save to a Parquet file. Examples in Scala/Python are attached. The Scala code was copied from jkbradley.
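      At its core, the UDT mechanism converts a vector to and from a row of primitive fields that any generic schema can store. A toy Python round-trip, with a layout invented for this sketch (not MLlib's actual wire format):

```python
def serialize_vector(values):
    """Encode a dense vector as (type_tag, values); tag 1 means
    "dense" in this sketch's convention."""
    return (1, list(values))

def deserialize_vector(row):
    """Invert serialize_vector; only dense vectors exist in this toy."""
    tag, values = row
    if tag != 1:
        raise ValueError("unknown vector type tag: %r" % tag)
    return values
```

      The real UDT additionally registers the (de)serializers with the SQL type system so the conversion happens transparently on column access.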
      
      ~~This PR contains the changes from #3068 . I will rebase after #3068 is merged.~~
      
      marmbrus jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #3070 from mengxr/SPARK-3573 and squashes the following commits:
      
      3a0b6e5 [Xiangrui Meng] organize imports
      236f0a0 [Xiangrui Meng] register vector as UDT and provide dataset examples
      1a9c6cdd
  22. Oct 31, 2014
    • wangfei's avatar
      [SPARK-3826][SQL]enable hive-thriftserver to support hive-0.13.1 · 7c41d135
      wangfei authored
       In #2241 hive-thriftserver is not enabled. This patch enables hive-thriftserver to support hive-0.13.1 by using a shim layer, referring to #2241.
      
       1 A light shim layer (code in sql/hive-thriftserver/hive-version) for each different hive version to handle API compatibility
      
       2 New pom profiles "hive-default" and "hive-versions" (copied from #2241) to activate different hive versions
      
       3 SBT commands for the different versions are as follows:
         hive-0.12.0 --- sbt/sbt -Phive,hadoop-2.3 -Phive-0.12.0 assembly
         hive-0.13.1 --- sbt/sbt -Phive,hadoop-2.3 -Phive-0.13.1 assembly
      
       4 Since hive-thriftserver depends on the hive subproject, this patch should be merged with #2241 to enable hive-0.13.1 for hive-thriftserver
      
      Author: wangfei <wangfei1@huawei.com>
      Author: scwf <wangfei1@huawei.com>
      
      Closes #2685 from scwf/shim-thriftserver1 and squashes the following commits:
      
      f26f3be [wangfei] remove clean to save time
      f5cac74 [wangfei] remove local hivecontext test
      578234d [wangfei] use new shaded hive
      18fb1ff [wangfei] exclude kryo in hive pom
      fa21d09 [wangfei] clean package assembly/assembly
      8a4daf2 [wangfei] minor fix
      0d7f6cf [wangfei] address comments
      f7c93ae [wangfei] adding build with hive 0.13 before running tests
      bcf943f [wangfei] Merge branch 'master' of https://github.com/apache/spark into shim-thriftserver1
      c359822 [wangfei] reuse getCommandProcessor in hiveshim
      52674a4 [scwf] sql/hive included since examples depend on it
      3529e98 [scwf] move hive module to hive profile
      f51ff4e [wangfei] update and fix conflicts
      f48d3a5 [scwf] Merge branch 'master' of https://github.com/apache/spark into shim-thriftserver1
      41f727b [scwf] revert pom changes
      13afde0 [scwf] fix small bug
      4b681f4 [scwf] enable thriftserver in profile hive-0.13.1
      0bc53aa [scwf] fixed when result filed is null
      dfd1c63 [scwf] update run-tests to run hive-0.12.0 default now
      c6da3ce [scwf] Merge branch 'master' of https://github.com/apache/spark into shim-thriftserver
      7c66b8e [scwf] update pom according spark-2706
      ae47489 [scwf] update and fix conflicts
      7c41d135
  23. Oct 26, 2014
    • Michael Armbrust's avatar
      [HOTFIX][SQL] Temporarily turn off hive-server tests. · 879a1658
      Michael Armbrust authored
      The thrift server is not available in the default (hive13) profile yet, which is breaking all SQL-only PRs. This turns off these tests until #2685 is merged.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2950 from marmbrus/fixTests and squashes the following commits:
      
      1a6dfee [Michael Armbrust] [HOTFIX][SQL] Temporarily turn of hive-server tests.
      879a1658
  24. Oct 24, 2014
    • Michael Armbrust's avatar
      [SQL] Update Hive test harness for Hive 12 and 13 · 3a845d3c
      Michael Armbrust authored
      As part of the upgrade I also copy the newest version of the query tests, and whitelist a bunch of new ones that are now passing.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2936 from marmbrus/fix13tests and squashes the following commits:
      
      d9cbdab [Michael Armbrust] Remove user specific tests
      65801cd [Michael Armbrust] style and rat
      8f6b09a [Michael Armbrust] Update test harness to work with both Hive 12 and 13.
      f044843 [Michael Armbrust] Update Hive query tests and golden files to 0.13
      3a845d3c
    • Zhan Zhang's avatar
      [SPARK-2706][SQL] Enable Spark to support Hive 0.13 · 7c89a8f0
      Zhan Zhang authored
      Given that a lot of users are trying to use Hive 0.13 in Spark, and the API-level incompatibility between hive-0.12 and hive-0.13, I want to propose the following approach, which has no or minimal impact on existing hive-0.12 support, but is able to jumpstart the development of hive-0.13 and future version support.
      
      Approach: Introduce a “hive-version” property, and manipulate pom.xml files to support different hive versions at compile time through a shim layer, e.g., hive-0.12.0 and hive-0.13.1. More specifically,
      
      1. For each different hive version, there is a very light layer of shim code to handle API differences, sitting in sql/hive/hive-version, e.g., sql/hive/v0.12.0 or sql/hive/v0.13.1
      
      2. Add a new profile hive-default active by default, which picks up all existing configuration and hive-0.12.0 shim (v0.12.0)  if no hive.version is specified.
      
      3. If the user specifies a different version (currently only 0.13.1, by -Dhive.version=0.13.1), the hive-versions profile will be activated, which picks up the hive-version-specific shim layer and configuration, mainly the hive jars and the hive-version shim, e.g., v0.13.1.
      
      4. With this approach, nothing is changed with current hive-0.12 support.
      
      No change by default: sbt/sbt -Phive
      For example: sbt/sbt -Phive -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly
      
      To enable hive-0.13: sbt/sbt -Dhive.version=0.13.1
      For example: sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly
      
      Note that in hive-0.13, hive-thriftserver is not enabled; that should be fixed in another JIRA. Also, we don’t need -Phive together with -Dhive.version when building (probably we should use -Phive -Dhive.version=xxx instead, once the thrift server is also supported in hive-0.13.1).
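      The shim selection described in points 2 and 3 amounts to a version-keyed dispatch. A hedged Python sketch (class names and the fail-fast behavior are invented here; the real shims are Scala code selected by Maven/SBT profiles):

```python
class HiveShim_v0_12_0:
    version = "0.12.0"

class HiveShim_v0_13_1:
    version = "0.13.1"

_SHIMS = {s.version: s for s in (HiveShim_v0_12_0, HiveShim_v0_13_1)}

def load_shim(hive_version="0.12.0"):
    """Pick the version-specific shim, defaulting to 0.12.0 as the
    hive-default profile does; unsupported versions fail fast."""
    try:
        return _SHIMS[hive_version]()
    except KeyError:
        raise ValueError("unsupported hive.version: %s" % hive_version)
```

      The point of the indirection is that only the shim classes ever touch version-specific Hive APIs; everything else programs against the common surface.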
      
      Author: Zhan Zhang <zhazhan@gmail.com>
      Author: zhzhan <zhazhan@gmail.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #2241 from zhzhan/spark-2706 and squashes the following commits:
      
      3ece905 [Zhan Zhang] minor fix
      410b668 [Zhan Zhang] solve review comments
      cbb4691 [Zhan Zhang] change run-test for new options
      0d4d2ed [Zhan Zhang] rebase
      497b0f4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      8fad1cf [Zhan Zhang] change the pom file and make hive-0.13.1 as the default
      ab028d1 [Zhan Zhang] rebase
      4a2e36d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      4cb1b93 [zhzhan] Merge pull request #1 from pwendell/pr-2241
      b0478c0 [Patrick Wendell] Changes to simplify the build of SPARK-2706
      2b50502 [Zhan Zhang] rebase
      a72c0d4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      cb22863 [Zhan Zhang] correct the typo
      20f6cf7 [Zhan Zhang] solve compatability issue
      f7912a9 [Zhan Zhang] rebase and solve review feedback
      301eb4a [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      10c3565 [Zhan Zhang] address review comments
      6bc9204 [Zhan Zhang] rebase and remove temparory repo
      d3aa3f2 [Zhan Zhang] Merge branch 'master' into spark-2706
      cedcc6f [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      3ced0d7 [Zhan Zhang] rebase
      d9b981d [Zhan Zhang] rebase and fix error due to rollback
      adf4924 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      3dd50e8 [Zhan Zhang] solve conflicts and remove unnecessary implicts
      d10bf00 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      dc7bdb3 [Zhan Zhang] solve conflicts
      7e0cc36 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      d7c3e1e [Zhan Zhang] Merge branch 'master' into spark-2706
      68deb11 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      d48bd18 [Zhan Zhang] address review comments
      3ee3b2b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      57ea52e [Zhan Zhang] Merge branch 'master' into spark-2706
      2b0d513 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      9412d24 [Zhan Zhang] address review comments
      f4af934 [Zhan Zhang] rebase
      1ccd7cc [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      128b60b [Zhan Zhang] ignore 0.12.0 test cases for the time being
      af9feb9 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      5f5619f [Zhan Zhang] restructure the directory and different hive version support
      05d3683 [Zhan Zhang] solve conflicts
      e4c1982 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      94b4fdc [Zhan Zhang] Spark-2706: hive-0.13.1 support on spark
      87ebf3b [Zhan Zhang] Merge branch 'master' into spark-2706
      921e914 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      f896b2a [Zhan Zhang] Merge branch 'master' into spark-2706
      789ea21 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      cb53a2c [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      f6a8a40 [Zhan Zhang] revert
      ba14f28 [Zhan Zhang] test
      dbedff3 [Zhan Zhang] Merge remote-tracking branch 'upstream/master'
      70964fe [Zhan Zhang] revert
      fe0f379 [Zhan Zhang] Merge branch 'master' of https://github.com/zhzhan/spark
      70ffd93 [Zhan Zhang] revert
      42585ec [Zhan Zhang] test
      7d5fce2 [Zhan Zhang] test
      7c89a8f0
  25. Oct 08, 2014
  26. Oct 06, 2014
    • Nicholas Chammas's avatar
      [SPARK-3479] [Build] Report failed test category · 69c3f441
      Nicholas Chammas authored
      This PR allows SparkQA (i.e. Jenkins) to report in its posts to GitHub what category of test failed, if one can be determined.
      
      The failure categories are:
      * general failure
      * RAT checks failed
      * Scala style checks failed
      * Python style checks failed
      * Build failed
      * Spark unit tests failed
      * PySpark unit tests failed
      * MiMa checks failed
      
      This PR also fixes the diffing logic used to determine if a patch introduces new classes.
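      The reporting side can be sketched as mapping the identifier of the step that failed to one of the categories above, falling back to a general failure. Step identifiers here are illustrative, not the script's actual names:

```python
FAILURE_CATEGORIES = {
    "RAT": "RAT checks",
    "SCALA_STYLE": "Scala style checks",
    "PYTHON_STYLE": "Python style checks",
    "BUILD": "Build",
    "SPARK_UNIT_TESTS": "Spark unit tests",
    "PYSPARK_UNIT_TESTS": "PySpark unit tests",
    "MIMA": "MiMa checks",
}

def describe_failure(failed_step=None):
    """Build the message fragment posted to GitHub for a failed step;
    anything unrecognized is reported as a general failure."""
    if failed_step in FAILURE_CATEGORIES:
        return "%s failed" % FAILURE_CATEGORIES[failed_step]
    return "general failure"
```

      Keeping the mapping in one table means adding a new test category only touches one place.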
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2606 from nchammas/report-failed-test-category and squashes the following commits:
      
      d67df03 [Nicholas Chammas] report what test category failed
      69c3f441
  27. Sep 19, 2014
  28. Sep 18, 2014
  29. Sep 17, 2014
    • Nicholas Chammas's avatar
      [SPARK-3534] Fix expansion of testing arguments to sbt · 7fc3bb7c
      Nicholas Chammas authored
      Testing arguments to `sbt` need to be passed as an array, not a single, long string.
      
      Fixes a bug introduced in #2420.
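      The distinction is the usual argv one: a single long string reaches the child program as one argument, while a list is split into separate arguments. A Python illustration using `echo` as a stand-in for `sbt`:

```python
import subprocess

# Each list element becomes its own argv entry, so the child sees two
# separate arguments -- exactly what sbt needs for "mllib/test" "sql/test".
out = subprocess.check_output(["echo", "mllib/test", "sql/test"])
args_seen = out.decode().split()
assert args_seen == ["mllib/test", "sql/test"]

# Joined into one string ("mllib/test sql/test"), the child would
# instead receive a single argument containing a space.
```

      This is why the fix passes the testing arguments to `sbt` as an array rather than concatenating them.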
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2437 from nchammas/selective-testing and squashes the following commits:
      
      a9f9c1c [Nicholas Chammas] fix printing of sbt test arguments
      cf57cbf [Nicholas Chammas] fix sbt test arguments
      e33b978 [Nicholas Chammas] Merge pull request #2 from apache/master
      0b47ca4 [Nicholas Chammas] Merge branch 'master' of github.com:nchammas/spark
      8051486 [Nicholas Chammas] Merge pull request #1 from apache/master
      03180a4 [Nicholas Chammas] Merge branch 'master' of github.com:nchammas/spark
      d4c5f43 [Nicholas Chammas] Merge pull request #6 from apache/master
      7fc3bb7c
    • Nicholas Chammas's avatar
      [SPARK-1455] [SPARK-3534] [Build] When possible, run SQL tests only. · 5044e495
      Nicholas Chammas authored
      If the only files changed are related to SQL, then only run the SQL tests.
      
      This patch includes some cosmetic/maintainability refactoring. I would be more than happy to undo some of these changes if they are inappropriate.
      
      We can accept this patch mostly as-is and address the immediate need documented in [SPARK-3534](https://issues.apache.org/jira/browse/SPARK-3534), or we can keep it open until a satisfactory solution along the lines [discussed here](https://issues.apache.org/jira/browse/SPARK-1455?focusedCommentId=14136424&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14136424) is reached.
      
      Note: I had to hack this patch up to test it locally, so what I'm submitting here and what I tested are technically different.
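      The selection logic can be sketched as: if every changed file is under `sql/`, run only the SQL tests; otherwise run everything. Module names and the single path prefix are simplifications for this sketch:

```python
ALL_MODULES = ["core", "sql", "mllib", "streaming"]

def modules_to_test(changed_files):
    """Return the test modules to run for a set of changed files."""
    if changed_files and all(f.startswith("sql/") for f in changed_files):
        return ["sql"]
    return ALL_MODULES  # anything else: be safe and test everything
```

      Note the conservative default: an empty or mixed change set runs the full suite, so selective testing can only skip work, never miss it.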
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2420 from nchammas/selective-testing and squashes the following commits:
      
      db3fa2d [Nicholas Chammas] diff against master!
      f9e23f6 [Nicholas Chammas] when possible, run SQL tests only
      5044e495
  30. Sep 09, 2014
  31. Sep 08, 2014
    • Prashant Sharma's avatar
      SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within. · e16a8e7d
      Prashant Sharma authored
      ...
      
      Tested! TBH, it isn't a great idea to have a directory with spaces in it, because Emacs doesn't like it, then Hadoop doesn't like it, and so on...
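      The shell fix is to double-quote every expansion (`"$FWDIR"` instead of `$FWDIR`). The same idea in Python is `shlex.quote`, shown here on a hypothetical install path:

```python
import shlex

install_dir = "/opt/my spark"  # hypothetical path with a space
# Unquoted, the shell would split this into two words; quoted, it
# stays one argument.
cmd = "ls -la %s" % shlex.quote(install_dir)
assert cmd == "ls -la '/opt/my spark'"
```

      `shlex.quote` leaves safe strings untouched and single-quotes anything the shell would otherwise split or expand.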
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2229 from ScrapCodes/SPARK-3337/quoting-shell-scripts and squashes the following commits:
      
      d4ad660 [Prashant Sharma] SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within.
      e16a8e7d
  32. Sep 04, 2014
  33. Aug 20, 2014
    • Cheng Lian's avatar
      [SPARK-3126][SPARK-3127][SQL] Fixed HiveThriftServer2Suite · cf46e725
      Cheng Lian authored
      This PR fixes two issues:
      
      1. Fixes wrongly quoted command line option in `HiveThriftServer2Suite` that makes test cases hang until timeout.
      1. Asks `dev/run-tests` to run Spark SQL tests when `bin/spark-sql` and/or `sbin/start-thriftserver.sh` are modified.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2036 from liancheng/fix-thriftserver-test and squashes the following commits:
      
      f38c4eb [Cheng Lian] Fixed the same quotation issue in CliSuite
      26b82a0 [Cheng Lian] Run SQL tests when dff contains bin/spark-sql and/or sbin/start-thriftserver.sh
      a87f83d [Cheng Lian] Extended timeout
      e5aa31a [Cheng Lian] Fixed metastore JDBC URI quotation
      cf46e725
    • Patrick Wendell's avatar
      SPARK-3092 [SQL]: Always include the thriftserver when -Phive is enabled. · f2f26c2a
      Patrick Wendell authored
      Currently we have a separate profile called hive-thriftserver. I originally suggested this in case users did not want to bundle the thriftserver, but it has ultimately led to a lot of confusion. Since the thriftserver is only a few classes, I don't see a really good reason to isolate it from the rest of Hive. So let's go ahead and just include it in the same profile to simplify things.
      
      This has been suggested in the past by liancheng.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #2006 from pwendell/hiveserver and squashes the following commits:
      
      742ea40 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into hiveserver
      034ad47 [Patrick Wendell] SPARK-3092: Always include the thriftserver when -Phive is enabled.
      f2f26c2a
  34. Aug 18, 2014
    • Josh Rosen's avatar
      [SPARK-3114] [PySpark] Fix Python UDFs in Spark SQL. · 1f1819b2
      Josh Rosen authored
      This fixes SPARK-3114, an issue where we inadvertently broke Python UDFs in Spark SQL.
      
      This PR modifies the test runner script to always run the PySpark SQL tests, irrespective of whether SparkSQL itself has been modified.  It also includes Davies' fix for the bug.
      
      Closes #2026.
      
      Author: Josh Rosen <joshrosen@apache.org>
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2027 from JoshRosen/pyspark-sql-fix and squashes the following commits:
      
      9af2708 [Davies Liu] bugfix: disable compression of command
      0d8d3a4 [Josh Rosen] Always run Python Spark SQL tests.
      1f1819b2
  35. Aug 06, 2014
    • Nicholas Chammas's avatar
      [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically · d614967b
      Nicholas Chammas authored
      As described in [SPARK-2627](https://issues.apache.org/jira/browse/SPARK-2627), we'd like Python code to automatically be checked for PEP 8 compliance by Jenkins. This pull request aims to do that.
      
      Notes:
      * We may need to install [`pep8`](https://pypi.python.org/pypi/pep8) on the build server.
      * I'm expecting tests to fail now that PEP 8 compliance is being checked as part of the build. I'm fine with cleaning up any remaining PEP 8 violations as part of this pull request.
      * I did not understand why the RAT and scalastyle reports are saved to text files. I did the same for the PEP 8 check, but only so that the console output style can match those for the RAT and scalastyle checks. The PEP 8 report is removed right after the check is complete.
      * Updates to the ["Contributing to Spark"](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) guide will be submitted elsewhere, as I don't believe that text is part of the Spark repo.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      Author: nchammas <nicholas.chammas@gmail.com>
      
      Closes #1744 from nchammas/master and squashes the following commits:
      
      274b238 [Nicholas Chammas] [SPARK-2627] [PySpark] minor indentation changes
      983d963 [nchammas] Merge pull request #5 from apache/master
      1db5314 [nchammas] Merge pull request #4 from apache/master
      0e0245f [Nicholas Chammas] [SPARK-2627] undo erroneous whitespace fixes
      bf30942 [Nicholas Chammas] [SPARK-2627] PEP8: comment spacing
      6db9a44 [nchammas] Merge pull request #3 from apache/master
      7b4750e [Nicholas Chammas] merge upstream changes
      91b7584 [Nicholas Chammas] [SPARK-2627] undo unnecessary line breaks
      44e3e56 [Nicholas Chammas] [SPARK-2627] use tox.ini to exclude files
      b09fae2 [Nicholas Chammas] don't wrap comments unnecessarily
      bfb9f9f [Nicholas Chammas] [SPARK-2627] keep up with the PEP 8 fixes
      9da347f [nchammas] Merge pull request #2 from apache/master
      aa5b4b5 [Nicholas Chammas] [SPARK-2627] follow Spark bash style for if blocks
      d0a83b9 [Nicholas Chammas] [SPARK-2627] check that pep8 downloaded fine
      dffb5dd [Nicholas Chammas] [SPARK-2627] download pep8 at runtime
      a1ce7ae [Nicholas Chammas] [SPARK-2627] space out test report sections
      21da538 [Nicholas Chammas] [SPARK-2627] it's PEP 8, not PEP8
      6f4900b [Nicholas Chammas] [SPARK-2627] more misc PEP 8 fixes
      fe57ed0 [Nicholas Chammas] removing merge conflict backups
      9c01d4c [nchammas] Merge pull request #1 from apache/master
      9a66cb0 [Nicholas Chammas] resolving merge conflicts
      a31ccc4 [Nicholas Chammas] [SPARK-2627] miscellaneous PEP 8 fixes
      beaa9ac [Nicholas Chammas] [SPARK-2627] fail check on non-zero status
      723ed39 [Nicholas Chammas] always delete the report file
      0541ebb [Nicholas Chammas] [SPARK-2627] call Python linter from run-tests
      12440fa [Nicholas Chammas] [SPARK-2627] add Scala linter
      61c07b9 [Nicholas Chammas] [SPARK-2627] add Python linter
      75ad552 [Nicholas Chammas] make check output style consistent
      d614967b
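      The squashed commits above describe a recurring lint-check pattern: capture the checker's output in a report file, surface it only on failure, fail the build on a non-zero status, and always delete the report. A minimal POSIX-shell sketch of that pattern, assuming a generic `run_check` wrapper and report filename (both hypothetical stand-ins for the real pep8 invocation):

      ```shell
      # Run a checker, keep its output in a report file, print the report
      # only on failure, and delete the report either way.
      run_check() {
        report="pep8_report.txt"
        "$@" > "$report" 2>&1
        status=$?
        if [ "$status" -ne 0 ]; then
          cat "$report"            # only surface checker output on failure
        fi
        rm -f "$report"            # always delete the report file
        return "$status"
      }

      # 'true'/'false' stand in for passing and failing checker runs.
      run_check true  && pass1=ok
      run_check false || pass2=failed
      echo "$pass1 $pass2"
      ```

      Propagating `status` (rather than the exit code of `rm`) is what makes the build "fail check on non-zero status" while still cleaning up the report unconditionally.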
  36. Aug 02, 2014
    • Chris Fregly's avatar
      [SPARK-1981] Add AWS Kinesis streaming support · 91f9504e
      Chris Fregly authored
      Author: Chris Fregly <chris@fregly.com>
      
      Closes #1434 from cfregly/master and squashes the following commits:
      
      4774581 [Chris Fregly] updated docs, renamed retry to retryRandom to be more clear, removed retries around store() method
      0393795 [Chris Fregly] moved Kinesis examples out of examples/ and back into extras/kinesis-asl
      691a6be [Chris Fregly] fixed tests and formatting, fixed a bug with JavaKinesisWordCount during union of streams
      0e1c67b [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      74e5c7c [Chris Fregly] updated per TD's feedback.  simplified examples, updated docs
      e33cbeb [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      bf614e9 [Chris Fregly] per matei's feedback:  moved the kinesis examples into the examples/ dir
      d17ca6d [Chris Fregly] per TD's feedback:  updated docs, simplified the KinesisUtils api
      912640c [Chris Fregly] changed the foundKinesis class to be a publicly-available class
      db3eefd [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      21de67f [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      6c39561 [Chris Fregly] parameterized the versions of the aws java sdk and kinesis client
      338997e [Chris Fregly] improve build docs for kinesis
      828f8ae [Chris Fregly] more cleanup
      e7c8978 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      cd68c0d [Chris Fregly] fixed typos and backward compatibility
      d18e680 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      b3b0ff1 [Chris Fregly] [SPARK-1981] Add AWS Kinesis streaming support
      91f9504e
  37. Jul 30, 2014