  6. Feb 14, 2015
    • SPARK-5822 [BUILD] cannot import src/main/scala & src/test/scala into eclipse as source folder · ed5f4bb7
      gli authored
         When importing the whole project into Eclipse as a Maven project, I found that
         src/main/scala & src/test/scala could not be set as source folders by default,
         so this adds an "add-source" goal in scala-maven-plugin to make that work.
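      The fix amounts to wiring the scala-maven-plugin's `add-source` goal into the build. A minimal sketch (the execution id is illustrative, not necessarily what the patch used):

      ```xml
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <executions>
          <execution>
            <id>eclipse-add-source</id>
            <goals>
              <!-- register src/main/scala and src/test/scala as source folders -->
              <goal>add-source</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      ```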
      
      Author: gli <gli@redhat.com>
      
      Closes #4531 from ligangty/addsource and squashes the following commits:
      
      4e4db4c [gli] [IDE] cannot import src/main/scala & src/test/scala into eclipse as source folder
  7. Feb 13, 2015
    • [SPARK-5735] Replace uses of EasyMock with Mockito · 077eec2d
      Josh Rosen authored
      This patch replaces all uses of EasyMock with Mockito.  There are two motivations for this:
      
      1. We should use a single mocking framework in our tests in order to keep things consistent.
      2. EasyMock may be responsible for non-deterministic unit test failures due to its Objenesis dependency (see SPARK-5626 for more details).
      
      Most of these changes are fairly mechanical translations of EasyMock code to Mockito, although I made a small change that strengthens the assertions in one test in KinesisReceiverSuite.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #4578 from JoshRosen/SPARK-5735-remove-easymock and squashes the following commits:
      
      0ab192b [Josh Rosen] Import sorting plus two minor changes to more closely match old semantics.
      977565b [Josh Rosen] Remove EasyMock from build.
      fae1d8f [Josh Rosen] Remove EasyMock usage in KinesisReceiverSuite.
      7cca486 [Josh Rosen] Remove EasyMock usage in MesosSchedulerBackendSuite
      fc5e94d [Josh Rosen] Remove EasyMock in CacheManagerSuite
  9. Feb 09, 2015
    • [SPARK-2996] Implement userClassPathFirst for driver, yarn. · 20a60131
      Marcelo Vanzin authored
      Yarn's config option `spark.yarn.user.classpath.first` does not work the same way as
      `spark.files.userClassPathFirst`; Yarn's version is a lot more dangerous, in that it
      modifies the system classpath, instead of restricting the changes to the user's class
      loader. So this change implements the behavior of the latter for Yarn, and deprecates
      the more dangerous choice.
      
      To be able to achieve feature-parity, I also implemented the option for drivers (the existing
      option only applies to executors). So now there are two options, each controlling whether
      to apply userClassPathFirst to the driver or executors. The old option was deprecated, and
      aliased to the new one (`spark.executor.userClassPathFirst`).
      
      The existing "child-first" class loader also had to be fixed. It didn't handle resources, and it
      was also doing some things that ended up causing JVM errors depending on how things
      were being called.
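      The "child-first" behavior described above can be sketched in plain Java as below. This is an illustrative minimal loader, not Spark's actual implementation (which also handles resources and special-cases JDK classes):

      ```java
      import java.net.URL;
      import java.net.URLClassLoader;

      class ChildFirstClassLoader extends URLClassLoader {
          private final ClassLoader realParent;

          ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
              // Pass null as the parent so findClass() searches only our URLs.
              super(urls, null);
              this.realParent = parent;
          }

          @Override
          protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
              // Per-name locking avoids deadlocks when several threads load classes
              // concurrently (one of the fixes mentioned in the commit log above).
              synchronized (getClassLoadingLock(name)) {
                  Class<?> c = findLoadedClass(name);
                  if (c == null) {
                      try {
                          c = findClass(name);            // user classpath first
                      } catch (ClassNotFoundException e) {
                          c = realParent.loadClass(name); // fall back to the parent
                      }
                  }
                  if (resolve) {
                      resolveClass(c);
                  }
                  return c;
              }
          }
      }
      ```

      Passing `null` to the `URLClassLoader` constructor keeps `findClass()` from delegating upward, so the user's jars are consulted first and the real parent is used only as a fallback.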
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3233 from vanzin/SPARK-2996 and squashes the following commits:
      
      9cf9cf1 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      a1499e2 [Marcelo Vanzin] Remove SPARK_HOME propagation.
      fa7df88 [Marcelo Vanzin] Remove 'test.resource' file, create it dynamically.
      a8c69f1 [Marcelo Vanzin] Review feedback.
      cabf962 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      a1b8d7e [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      3f768e3 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      2ce3c7a [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      0e6d6be [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      70d4044 [Marcelo Vanzin] Fix pyspark/yarn-cluster test.
      0fe7777 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      0e6ef19 [Marcelo Vanzin] Move class loaders around and make names more meaninful.
      fe970a7 [Marcelo Vanzin] Review feedback.
      25d4fed [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      3cb6498 [Marcelo Vanzin] Call the right loadClass() method on the parent.
      fbb8ab5 [Marcelo Vanzin] Add locking in loadClass() to avoid deadlocks.
      2e6c4b7 [Marcelo Vanzin] Mention new setting in documentation.
      b6497f9 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      a10f379 [Marcelo Vanzin] Some feedback.
      3730151 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      f513871 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      44010b6 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      7b57cba [Marcelo Vanzin] Remove now outdated message.
      5304d64 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      35949c8 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      54e1a98 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
      d1273b2 [Marcelo Vanzin] Add test file to rat exclude.
      fa1aafa [Marcelo Vanzin] Remove write check on user jars.
      89d8072 [Marcelo Vanzin] Cleanups.
      a963ea3 [Marcelo Vanzin] Implement spark.driver.userClassPathFirst for standalone cluster mode.
      50afa5f [Marcelo Vanzin] Fix Yarn executor command line.
      7d14397 [Marcelo Vanzin] Register user jars in executor up front.
      7f8603c [Marcelo Vanzin] Fix yarn-cluster mode without userClassPathFirst.
      20373f5 [Marcelo Vanzin] Fix ClientBaseSuite.
      55c88fa [Marcelo Vanzin] Run all Yarn integration tests via spark-submit.
      0b64d92 [Marcelo Vanzin] Add deprecation warning to yarn option.
      4a84d87 [Marcelo Vanzin] Fix the child-first class loader.
      d0394b8 [Marcelo Vanzin] Add "deprecated configs" to SparkConf.
      46d8cf2 [Marcelo Vanzin] Update doc with new option, change name to "userClassPathFirst".
      a314f2d [Marcelo Vanzin] Enable driver class path isolation in SparkSubmit.
      91f7e54 [Marcelo Vanzin] [yarn] Enable executor class path isolation.
      a853e74 [Marcelo Vanzin] Re-work CoarseGrainedExecutorBackend command line arguments.
      89522ef [Marcelo Vanzin] Add class path isolation support for Yarn cluster mode.
  10. Feb 08, 2015
    • [SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 · 75fdccca
      medale authored
      The issue "Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for
      hadoop 1 API" had been marked as resolved, but did not work for at least some
      builds due to version conflicts between avro-mapred-1.7.5.jar and
      avro-mapred-1.7.6-hadoop2.jar (the correct version) when building for hadoop2.
      
      In sql/hive/pom.xml, org.spark-project.hive:hive-exec depends on 1.7.5:
      
      Building Spark Project Hive 1.2.0
      [INFO] ------------------------------------------------------------------------
      [INFO]
      [INFO] --- maven-dependency-plugin:2.4:tree (default-cli) @ spark-hive_2.10 ---
      [INFO] org.apache.spark:spark-hive_2.10:jar:1.2.0
      [INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
      [INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.5:compile
      [INFO] \- org.apache.avro:avro-mapred:jar:hadoop2:1.7.6:compile
      [INFO]
      
      Excluding this dependency allows the explicitly listed avro-mapred dependency
      to be picked up.
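      The exclusion described above would look roughly like this in sql/hive/pom.xml (a sketch using the coordinates from the dependency tree; version element omitted for brevity):

      ```xml
      <dependency>
        <groupId>org.spark-project.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <exclusions>
          <!-- let the explicitly listed avro-mapred (hadoop2 flavor) win -->
          <exclusion>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-mapred</artifactId>
          </exclusion>
        </exclusions>
      </dependency>
      ```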
      
      Author: medale <medale94@yahoo.com>
      
      Closes #4315 from medale/avro-hadoop2 and squashes the following commits:
      
      1ab4fa3 [medale] Merge branch 'master' into avro-hadoop2
      9d85e2a [medale] Merge remote-tracking branch 'upstream/master' into avro-hadoop2
      51b9c2a [medale] [SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API had been marked as resolved but did not work for at least some builds due to version conflicts using avro-mapred-1.7.5.jar and avro-mapred-1.7.6-hadoop2.jar (the correct version) when building for hadoop2.
  11. Feb 07, 2015
    • [SPARK-5671] Upgrade jets3t to 0.9.2 in hadoop-2.3 and 2.4 profiles · 5de14cc2
      Josh Rosen authored
      Upgrading from jets3t 0.9.0 to 0.9.2 fixes a dependency issue that was
      causing UISeleniumSuite to fail with ClassNotFoundExceptions when run with
      the hadoop-2.3 or hadoop-2.4 profiles.
      
      The jets3t release notes can be found at http://www.jets3t.org/RELEASE_NOTES.html
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #4454 from JoshRosen/SPARK-5671 and squashes the following commits:
      
      fa6cb3e [Josh Rosen] [SPARK-5671] Upgrade jets3t to 0.9.2 in hadoop-2.3 and 2.4 profiles
    • [SPARK-5108][BUILD] Jackson dependency management for Hadoop-2.6.0 support · ecbbed2e
      Zhan Zhang authored
      There is a dependency compatibility issue: hadoop-2.6.0 currently uses Jackson 1.9.13. Upgrade Spark's Jackson dependency to the same version to keep it consistent.
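      One way to pin the version only for hadoop-2.x builds is a profile-scoped property override; a sketch (the property name is illustrative, not necessarily Spark's):

      ```xml
      <profile>
        <id>hadoop-2.6</id>
        <properties>
          <!-- match the Jackson version shipped with hadoop-2.6.0 -->
          <jackson.version>1.9.13</jackson.version>
        </properties>
      </profile>
      ```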
      
      Author: Zhan Zhang <zhazhan@gmail.com>
      
      Closes #3938 from zhzhan/spark5108 and squashes the following commits:
      
      0080a84 [Zhan Zhang] change to upgrade jackson version only in hadoop-2.x
      0b9bad6 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark into spark5108
      917600a [Zhan Zhang] solve conflicts
      f7064d0 [Zhan Zhang] hadoop2.6 dependency management fix
      fc56b25 [Zhan Zhang] squash all commits
      3bf966c [Zhan Zhang] test
  12. Feb 06, 2015
    • [SPARK-5388] Provide a stable application submission gateway for standalone cluster mode · 1390e56f
      Andrew Or authored
      The goal is to provide a stable, REST-based application submission gateway that is not inherently based on Akka, which is unstable across versions. This PR targets standalone cluster mode, but is implemented in a general enough manner that it can potentially be extended to other modes in the future. Client mode is not included in the changes here because many more Akka messages are exchanged there.
      
      As of the changes here, the Master will advertise two ports, 7077 and 6066. We need to keep around the old one (7077) for client mode and older versions of Spark submit. However, all new versions of Spark submit will use the REST gateway (6066).
      
      By the way this includes ~700 lines of tests and ~200 lines of license.
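      A submission to the REST gateway is a JSON message posted to the Master's REST port (6066). The sketch below only illustrates the general shape of such a request; the field names and values are assumptions, not the exact protocol:

      ```json
      {
        "action": "CreateSubmissionRequest",
        "clientSparkVersion": "1.3.0",
        "appResource": "hdfs://path/to/app.jar",
        "mainClass": "com.example.Main",
        "sparkProperties": {
          "spark.app.name": "example",
          "spark.master": "spark://master:6066"
        }
      }
      ```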
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #4216 from andrewor14/rest and squashes the following commits:
      
      8d7ce07 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
      6f0c597 [Andrew Or] Use nullable fields for integer and boolean values
      dfe4bd7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
      b9e2a08 [Andrew Or] Minor comments
      02b5cea [Andrew Or] Fix tests
      d2b1ef8 [Andrew Or] Comment changes + minor code refactoring across the board
      9c82a36 [Andrew Or] Minor comment and wording updates
      b4695e7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
      c9a8ad7 [Andrew Or] Do not include appResource and mainClass as properties
      6fc7670 [Andrew Or] Report REST server response back to the user
      40e6095 [Andrew Or] Pass submit parameters through system properties
      cbd670b [Andrew Or] Include unknown fields, if any, in server response
      9fee16f [Andrew Or] Include server protocol version on mismatch
      09f873a [Andrew Or] Fix style
      8188e61 [Andrew Or] Upgrade Jackson from 2.3.0 to 2.4.4
      37538e0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
      9165ae8 [Andrew Or] Fall back to Akka if endpoint was not REST
      252d53c [Andrew Or] Clean up server error handling behavior further
      c643f64 [Andrew Or] Fix style
      bbbd329 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
      792e112 [Andrew Or] Use specific HTTP response codes on error
      f98660b [Andrew Or] Version the protocol and include it in REST URL
      721819f [Andrew Or] Provide more REST-like interface for submit/kill/status
      581f7bf [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
      9e0d1af [Andrew Or] Move some classes around to reduce number of files (minor)
      42e5de4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
      1f1c03f [Andrew Or] Use Jackson's DefaultScalaModule to simplify messages
      9229433 [Andrew Or] Reduce duplicate naming in REST field
      ade28fd [Andrew Or] Clean up REST response output in Spark submit
      b2fef8b [Andrew Or] Abstract the success field to the general response
      6c57b4b [Andrew Or] Increase timeout in end-to-end tests
      bf696ff [Andrew Or] Add checks for enabling REST when using kill/status
      7ee6737 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
      e2f7f5f [Andrew Or] Provide more safeguard against missing fields
      9581df7 [Andrew Or] Clean up uses of exceptions
      914fdff [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
      e2104e6 [Andrew Or] stable -> rest
      3db7379 [Andrew Or] Fix comments and name fields for better error messages
      8d43486 [Andrew Or] Replace SubmitRestProtocolAction with class name
      df90e8b [Andrew Or] Use Jackson for JSON de/serialization
      d7a1f9f [Andrew Or] Fix local cluster tests
      efa5e18 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
      e42c131 [Andrew Or] Add end-to-end tests for standalone REST protocol
      837475b [Andrew Or] Show the REST port on the Master UI
      d8d3717 [Andrew Or] Use a daemon thread pool for REST server
      6568ca5 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest
      77774ba [Andrew Or] Minor fixes
      206cae4 [Andrew Or] Refactor and add tests for the REST protocol
      63c05b3 [Andrew Or] Remove MASTER as a field (minor)
      9e21b72 [Andrew Or] Action -> SparkSubmitAction (minor)
      51c5ca6 [Andrew Or] Distinguish client and server side Spark versions
      b44e103 [Andrew Or] Implement status requests + fix validation behavior
      120ab9d [Andrew Or] Support kill and request driver status through SparkSubmit
      544de1d [Andrew Or] Major clean ups in code and comments
      e958cae [Andrew Or] Supported nested values in messages
      484bd21 [Andrew Or] Specify an ordering for fields in SubmitDriverRequestMessage
      6ff088d [Andrew Or] Rename classes to generalize REST protocol
      af9d9cb [Andrew Or] Integrate REST protocol in standalone mode
      53e7c0e [Andrew Or] Initial client, server, and all the messages
  13. Feb 05, 2015
    • Revert "SPARK-5607: Update to Kryo 2.24.0 to avoid including objenesis 1.2." · 6d3b7cbe
      Patrick Wendell authored
      This reverts commit c3b8d272cf0574e72422d8d7f4f0683dcbdce41b.
    • SPARK-5557: Explicitly include servlet API in dependencies. · 793dbaef
      Patrick Wendell authored
      Because of the way we shade Jetty, we lose its dependency "orbit"
      in the assembly jar, which includes the javax servlet APIs. This
      adds orbit back explicitly, using the version that matches
      our Jetty version.
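      Explicitly adding orbit back would look something like the following (coordinates follow Jetty's orbit packaging; the version shown is illustrative and must match the Jetty version in use):

      ```xml
      <dependency>
        <groupId>org.eclipse.jetty.orbit</groupId>
        <artifactId>javax.servlet</artifactId>
        <version>3.0.0.v201112011016</version>
      </dependency>
      ```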
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4411 from pwendell/servlet-api and squashes the following commits:
      
      445f868 [Patrick Wendell] SPARK-5557: Explicitly include servlet API in dependencies.
    • SPARK-5607: Update to Kryo 2.24.0 to avoid including objenesis 1.2. · c23ac03c
      Patrick Wendell authored
      Our existing Kryo version actually embeds objenesis 1.2 classes in
      its jar, causing dependency conflicts during tests. This updates us to
      Kryo 2.24.0 (which was changed to not embed objenesis) to avoid this
      behavior. See the JIRA for more detail.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4383 from pwendell/SPARK-5607 and squashes the following commits:
      
      c3b8d27 [Patrick Wendell] SPARK-5607: Update to Kryo 2.24.0 to avoid including objenesis 1.2.
  14. Feb 04, 2015
    • [SPARK-5341] Use maven coordinates as dependencies in spark-shell and spark-submit · 6aed719e
      Burak Yavuz authored
      This PR adds support for using maven coordinates as dependencies to spark-shell.
      Coordinates can be provided as a comma-delimited string after the flag `--packages`.
      Additional remote repositories (like SonaType) can be supplied as a comma-delimited string after the flag
      `--repositories`.
      
      Uses the Ivy library to resolve dependencies. Unfortunately the library has no decent documentation, so solving more complex dependency issues can be difficult.
      
      pwendell, mateiz, mengxr
      
      **Note: This is still a WIP. The following need to be handled:**
      - [x] add docs for the methods
      - [x] take local ivy cache path as an argument
      - [x] add tests
      - [x] add Windows compatibility
      - [x] exclude unused Ivy dependencies
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #4215 from brkyvz/SPARK-5341ivy and squashes the following commits:
      
      9215851 [Burak Yavuz] ready to merge
      db2a5cc [Burak Yavuz] changed logging to printStream
      9dae87f [Burak Yavuz] file separators changed
      71c374d [Burak Yavuz] merge conflicts fixed
      c08dc9f [Burak Yavuz] fixed merge conflicts
      3ada19a [Burak Yavuz] fixed Jenkins error (hopefully) and added comment on oro
      43c2290 [Burak Yavuz] fixed that ONE line
      231f72f [Burak Yavuz] addressed code review
      2cd6562 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into SPARK-5341ivy
      85ec5a3 [Burak Yavuz] added oro as a dependency explicitly
      ea44ca4 [Burak Yavuz] add oro back to dependencies
      cef0e24 [Burak Yavuz] IntelliJ is just messing things up
      97c4a92 [Burak Yavuz] fix more weird IntelliJ formatting
      9cf077d [Burak Yavuz] fix weird IntelliJ formatting
      dcf5e13 [Burak Yavuz] fix windows command line flags
      3a23f21 [Burak Yavuz] excluded ivy dependencies
      53423e0 [Burak Yavuz] tests added
      3705907 [Burak Yavuz] remove ivy-repo as a command line argument. Use global ivy cache as default
      c04d885 [Burak Yavuz] take path to ivy cache as a conf
      2edc9b5 [Burak Yavuz] managed to exclude Spark and it's dependencies
      a0870af [Burak Yavuz] add docs. remove unnecesary new lines
      6645af4 [Burak Yavuz] [SPARK-5341] added base implementation
      882c4c8 [Burak Yavuz] added maven dependency download
  15. Feb 03, 2015
    • [SPARK-4987] [SQL] parquet timestamp type support · 0c20ce69
      Daoyuan Wang authored
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #3820 from adrian-wang/parquettimestamp and squashes the following commits:
      
      b1e2a0d [Daoyuan Wang] fix for nanos
      4dadef1 [Daoyuan Wang] fix wrong read
      93f438d [Daoyuan Wang] parquet timestamp support
  16. Feb 02, 2015
    • SPARK-3996: Add jetty servlet and continuations. · 7930d2be
      Patrick Wendell authored
      These are needed transitively by the other Jetty libraries
      we include. They were not picked up by unit tests because we
      disable the UI.
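      Adding the two missing Jetty modules is a matter of declaring them explicitly; a sketch (versions assumed to be inherited from dependency management):

      ```xml
      <dependency>
        <groupId>org.eclipse.jetty</groupId>
        <artifactId>jetty-servlet</artifactId>
      </dependency>
      <dependency>
        <groupId>org.eclipse.jetty</groupId>
        <artifactId>jetty-continuation</artifactId>
      </dependency>
      ```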
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4323 from pwendell/jetty and squashes the following commits:
      
      d8669da [Patrick Wendell] SPARK-3996: Add jetty servlet and continuations.
    • [SPARK-5154] [PySpark] [Streaming] Kafka streaming support in Python · 0561c454
      Davies Liu authored
      This PR brings the Python API for Spark Streaming Kafka data source.
      
      ```
          class KafkaUtils(__builtin__.object)
           |  Static methods defined here:
           |
        |  createStream(ssc, zkQuorum, groupId, topics, storageLevel=StorageLevel(True, True, False, False, 2), keyDecoder=<function utf8_decoder>, valueDecoder=<function utf8_decoder>)
           |      Create an input stream that pulls messages from a Kafka Broker.
           |
           |      :param ssc:  StreamingContext object
           |      :param zkQuorum:  Zookeeper quorum (hostname:port,hostname:port,..).
           |      :param groupId:  The group id for this consumer.
           |      :param topics:  Dict of (topic_name -> numPartitions) to consume.
           |                      Each partition is consumed in its own thread.
           |      :param storageLevel:  RDD storage level.
           |      :param keyDecoder:  A function used to decode key
           |      :param valueDecoder:  A function used to decode value
           |      :return: A DStream object
      ```
      run the example:
      
      ```
      bin/spark-submit --driver-class-path external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar examples/src/main/python/streaming/kafka_wordcount.py localhost:2181 test
      ```
      
      Author: Davies Liu <davies@databricks.com>
      Author: Tathagata Das <tdas@databricks.com>
      
      Closes #3715 from davies/kafka and squashes the following commits:
      
      d93bfe0 [Davies Liu] Update make-distribution.sh
      4280d04 [Davies Liu] address comments
      e6d0427 [Davies Liu] Merge branch 'master' of github.com:apache/spark into kafka
      f257071 [Davies Liu] add tests for null in RDD
      23b039a [Davies Liu] address comments
      9af51c4 [Davies Liu] Merge branch 'kafka' of github.com:davies/spark into kafka
      a74da87 [Davies Liu] address comments
      dc1eed0 [Davies Liu] Update kafka_wordcount.py
      31e2317 [Davies Liu] Update kafka_wordcount.py
      370ba61 [Davies Liu] Update kafka.py
      97386b3 [Davies Liu] address comment
      2c567a5 [Davies Liu] update logging and comment
      33730d1 [Davies Liu] Merge branch 'master' of github.com:apache/spark into kafka
      adeeb38 [Davies Liu] Merge pull request #3 from tdas/kafka-python-api
      aea8953 [Tathagata Das] Kafka-assembly for Python API
      eea16a7 [Davies Liu] refactor
      f6ce899 [Davies Liu] add example and fix bugs
      98c8d17 [Davies Liu] fix python style
      5697a01 [Davies Liu] bypass decoder in scala
      048dbe6 [Davies Liu] fix python style
      75d485e [Davies Liu] add mqtt
      07923c4 [Davies Liu] support kafka in Python
  17. Feb 01, 2015
    • [SPARK-3996]: Shade Jetty in Spark deliverables · a15f6e31
      Patrick Wendell authored
      (v2 of this patch with a fix that was only relevant for the maven build).
      
      This patch piggybacks on vanzin's work to simplify the Guava shading,
      and adds Jetty as a shaded library in Spark. Other than adding Jetty,
      it consolidates the <artifactSet>s into the root pom. I found it
      a bit easier to follow that way, since you don't need to look into
      child poms to find the specific artifact sets included in shading.
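      Consolidating the shading in the root pom means a single maven-shade-plugin configuration listing the artifacts to bundle and where to relocate them; a sketch (the shaded package name is illustrative):

      ```xml
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <configuration>
          <artifactSet>
            <includes>
              <!-- bundle Jetty classes into the Spark artifact -->
              <include>org.eclipse.jetty:*</include>
            </includes>
          </artifactSet>
          <relocations>
            <relocation>
              <pattern>org.eclipse.jetty</pattern>
              <shadedPattern>org.spark-project.jetty</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
      </plugin>
      ```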
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4285 from pwendell/jetty and squashes the following commits:
      
      d3e7f4e [Patrick Wendell] Fix for shaded deps causing compile errors
      19f0710 [Patrick Wendell] More code review feedback
      961452d [Patrick Wendell] Responding to feedback from Marcello
      6df25ca [Patrick Wendell] [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables
    • [SPARK-5422] Add support for sending Graphite metrics via UDP · 80bd715a
      Ryan Williams authored
      Depends on [SPARK-5413](https://issues.apache.org/jira/browse/SPARK-5413) / #4209, included here, will rebase once the latter's merged.
      
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #4218 from ryan-williams/udp and squashes the following commits:
      
      ebae393 [Ryan Williams] Add support for sending Graphite metrics via UDP
      cb58262 [Ryan Williams] bump metrics dependency to v3.1.0
  18. Jan 29, 2015
    • Revert "[WIP] [SPARK-3996]: Shade Jetty in Spark deliverables" · d2071e8f
      Patrick Wendell authored
      This reverts commit f240fe39.
    • [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables · f240fe39
      Patrick Wendell authored
      This patch piggybacks on vanzin's work to simplify the Guava shading,
      and adds Jetty as a shaded library in Spark. Other than adding Jetty,
      it consolidates the \<artifactSet\>s into the root pom. I found it
      a bit easier to follow that way, since you don't need to look into
      child poms to find the specific artifact sets included in shading.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4252 from pwendell/jetty and squashes the following commits:
      
      19f0710 [Patrick Wendell] More code review feedback
      961452d [Patrick Wendell] Responding to feedback from Marcello
      6df25ca [Patrick Wendell] [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables
  19. Jan 28, 2015
    • [SPARK-4809] Rework Guava library shading. · 37a5e272
      Marcelo Vanzin authored
      The current way of shading Guava is a little problematic. Code that
      depends on "spark-core" does not see the transitive dependency, yet
      classes in "spark-core" actually depend on Guava. So it's tricky
      to run unit tests that use spark-core classes, since you need
      a compatible version of Guava in your dependencies when running the
      tests. This is cumbersome and makes for a bad user experience.
      
      This change modifies the way Guava is shaded so that it's applied
      uniformly across the Spark build. This means Guava is shaded inside
      spark-core itself, so that the dependency issues above are solved.
      Aside from that, all Spark sub-modules have their Guava references
      relocated, so that they refer to the relocated classes now packaged
      inside spark-core. Before, this was only done by the time the assembly
      was built, so projects that did not end up inside the assembly (such
      as streaming backends) could still reference the original location
      of Guava classes.
      
      The Guava classes are added to the "first" artifact Spark generates
      (network-common), so that all downstream modules have the needed
      classes available. Since "network-common" is a dependency of spark-core,
      all Spark apps should get the relocated classes automatically.
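      Applying the relocation uniformly across modules boils down to a shade-plugin relocation rule in every sub-module, so that all compiled references to Guava point at the bundled copy; a sketch (the shaded package name is illustrative):

      ```xml
      <relocation>
        <!-- rewrite references to Guava classes at build time -->
        <pattern>com.google.common</pattern>
        <shadedPattern>org.spark-project.guava</shadedPattern>
      </relocation>
      ```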
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3658 from vanzin/SPARK-4809 and squashes the following commits:
      
      3c93e42 [Marcelo Vanzin] Shade Guava in the network-common artifact.
      5d69ec9 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      b3104fc [Marcelo Vanzin] Add comment.
      941848f [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      f78c48a [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      8053dd4 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      107d7da [Marcelo Vanzin] Add fix for SPARK-5052 (PR #3874).
      40b8723 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      4a4ed42 [Marcelo Vanzin] [SPARK-4809] Rework Guava library shading.
  22. Jan 12, 2015
    • SPARK-5172 [BUILD] spark-examples-***.jar shades a wrong Hadoop distribution · aff49a3e
      Sean Owen authored
      In addition to the `hadoop-2.x` profiles in the parent POM, there is actually another set of profiles in `examples` that has to be activated differently to get the right Hadoop 1 vs 2 flavor of HBase. This wasn't actually used in making Hadoop 2 distributions, hence the problem.
      
      To reduce complexity, I suggest merging them with the parent POM profiles, which is possible now.
      
      You'll see this change appears to update the HBase version, but actually the default 0.94 version was not being used. HBase is only used in examples, and the examples POM always chose one profile or the other, which updated the version to 0.98.x anyway.
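      Controlling the HBase flavor from the parent POM's existing hadoop profiles could look like this (a sketch; the exact 0.98.x version string is illustrative):

      ```xml
      <profile>
        <id>hadoop-2.4</id>
        <properties>
          <!-- hadoop2 flavor of HBase; HBase is only used by examples -->
          <hbase.version>0.98.7-hadoop2</hbase.version>
        </properties>
      </profile>
      ```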
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3992 from srowen/SPARK-5172 and squashes the following commits:
      
      17830d9 [Sean Owen] Control hbase hadoop1/2 flavor in the parent POM with existing hadoop-2.x profiles
    • SPARK-4159 [BUILD] Addendum: improve running of single test after enabling Java tests · 13e610b8
      Sean Owen authored
      https://issues.apache.org/jira/browse/SPARK-4159 was resolved but as Sandy points out, the guidance in https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools under "Running Individual Tests" no longer quite works, or at least not optimally.
      
      This minor change is not really the important change, which is an update to the wiki text. The correct way to run one Scala test suite in Maven is now:
      
      ```
      mvn test -DwildcardSuites=org.apache.spark.io.CompressionCodecSuite -Dtests=none
      ```
      
      The correct way to run one Java test is
      
      ```
      mvn test -DwildcardSuites=none -Dtests=org.apache.spark.streaming.JavaAPISuite
      ```
      
      Basically, you have to set two properties in order to suppress all of one type of test (with a non-existent test name like 'none') and all but one test of the other type.
      
      The change in the PR just prevents Surefire from barfing when it finds no "none" test.
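      The Surefire change is essentially this configuration flag; a sketch of the relevant pom fragment:

      ```xml
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <!-- don't fail the build when -Dtests=none matches no tests -->
          <failIfNoTests>false</failIfNoTests>
        </configuration>
      </plugin>
      ```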
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3993 from srowen/SPARK-4159 and squashes the following commits:
      
      83106d7 [Sean Owen] Default failIfNoTests to false to enable the -DwildcardSuites=... -Dtests=... syntax for running one test to work
  23. Jan 09, 2015
    • [SPARK-3619] Upgrade to Mesos 0.21 to work around MESOS-1688 · 454fe129
      Jongyoul Lee authored
      - Update the version from 0.18.1 to 0.21.0.
      - I'm running some tests to verify that Spark jobs work fine in a Mesos 0.21.0 environment.
      
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #3934 from jongyoul/SPARK-3619 and squashes the following commits:
      
      ab994fa [Jongyoul Lee] [SPARK-3619] Upgrade to Mesos 0.21 to work around MESOS-1688 - update version from 0.18.1 to 0.21.0
  24. Jan 08, 2015
    • [SPARK-4048] Enhance and extend hadoop-provided profile. · 48cecf67
      Marcelo Vanzin authored
      This change does a few things to make the hadoop-provided profile more useful:
      
      - Create new profiles for other libraries / services that might be provided by the infrastructure
      - Simplify and fix the poms so that the profiles are only activated while building assemblies.
      - Fix tests so that they're able to run when the profiles are activated
      - Add a new env variable to be used by distributions that use these profiles to provide the runtime
        classpath for Spark jobs and daemons.
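      The profile technique here is to flip the scope of the relevant dependencies to `provided` when building a distribution, so they are compiled against but not bundled; a sketch (the property name is illustrative):

      ```xml
      <profile>
        <id>hadoop-provided</id>
        <properties>
          <hadoop.deps.scope>provided</hadoop.deps.scope>
        </properties>
      </profile>
      <!-- referenced by each hadoop dependency: -->
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <scope>${hadoop.deps.scope}</scope>
      </dependency>
      ```

      The new env variable (`SPARK_DIST_CLASSPATH`, per the commit list below) then supplies those provided jars on the classpath at runtime.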
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2982 from vanzin/SPARK-4048 and squashes the following commits:
      
      82eb688 [Marcelo Vanzin] Add a comment.
      eb228c0 [Marcelo Vanzin] Fix borked merge.
      4e38f4e [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      9ef79a3 [Marcelo Vanzin] Alternative way to propagate test classpath to child processes.
      371ebee [Marcelo Vanzin] Review feedback.
      52f366d [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      83099fc [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      7377e7b [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      322f882 [Marcelo Vanzin] Fix merge fail.
      f24e9e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      8b00b6a [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      9640503 [Marcelo Vanzin] Cleanup child process log message.
      115fde5 [Marcelo Vanzin] Simplify a comment (and make it consistent with another pom).
      e3ab2da [Marcelo Vanzin] Fix hive-thriftserver profile.
      7820d58 [Marcelo Vanzin] Fix CliSuite with provided profiles.
      1be73d4 [Marcelo Vanzin] Restore flume-provided profile.
      d1399ed [Marcelo Vanzin] Restore jetty dependency.
      82a54b9 [Marcelo Vanzin] Remove unused profile.
      5c54a25 [Marcelo Vanzin] Fix HiveThriftServer2Suite with *-provided profiles.
      1fc4d0b [Marcelo Vanzin] Update dependencies for hive-thriftserver.
      f7b3bbe [Marcelo Vanzin] Add snappy to hadoop-provided list.
      9e4e001 [Marcelo Vanzin] Remove duplicate hive profile.
      d928d62 [Marcelo Vanzin] Redirect child stderr to parent's log.
      4d67469 [Marcelo Vanzin] Propagate SPARK_DIST_CLASSPATH on Yarn.
      417d90e [Marcelo Vanzin] Introduce "SPARK_DIST_CLASSPATH".
      2f95f0d [Marcelo Vanzin] Propagate classpath to child processes during testing.
      1adf91c [Marcelo Vanzin] Re-enable maven-install-plugin for a few projects.
      284dda6 [Marcelo Vanzin] Rework the "hadoop-provided" profile, add new ones.
      48cecf67
  25. Jan 06, 2015
    • Sean Owen's avatar
      SPARK-4159 [CORE] Maven build doesn't run JUnit test suites · 4cba6eb4
      Sean Owen authored
      This PR:
      
      - Reenables `surefire`, and copies config from `scalatest` (which is itself an old fork of `surefire`, so similar)
      - Tells `surefire` to test only Java tests
      - Enables `surefire` and `scalatest` for all children, and in turn eliminates some duplication.
      
For me this causes the Scala and Java tests each to be run once, as desired. It doesn't affect the SBT build, but it works for Maven. I still need to verify that all of the Scala and Java tests are being run.
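A sketch of the kind of configuration described (the include patterns are assumptions, not the exact diff from the PR):

```xml
<!-- Illustrative: have Surefire pick up only Java tests, leaving the Scala
     suites to the scalatest plugin so nothing runs twice. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <includes>
      <include>**/*Suite.java</include>
    </includes>
  </configuration>
</plugin>
```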
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3651 from srowen/SPARK-4159 and squashes the following commits:
      
      2e8a0af [Sean Owen] Remove specialized SPARK_HOME setting for REPL, YARN tests as it appears to be obsolete
      12e4558 [Sean Owen] Append to unit-test.log instead of overwriting, so that both surefire and scalatest output is preserved. Also standardize/correct comments a bit.
      e6f8601 [Sean Owen] Reenable Java tests by reenabling surefire with config cloned from scalatest; centralize test config in the parent
      4cba6eb4
  26. Dec 30, 2014
  27. Dec 27, 2014
    • Jongyoul Lee's avatar
[SPARK-3955] Different versions between jackson-mapper-asl and jackson-core-asl · 2483c1ef
Jongyoul Lee authored
- Set jackson-mapper-asl and jackson-core-asl to the same version
- Related to #2818
- Applied the same patch against the latest master
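The alignment can be sketched with a shared version property (the property name and version number are illustrative, not from the PR):

```xml
<!-- Illustrative fragment: pin both ASL artifacts to one property so the
     two versions cannot drift apart again. -->
<properties>
  <codehaus.jackson.version>1.8.8</codehaus.jackson.version>
</properties>

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.codehaus.jackson</groupId>
      <artifactId>jackson-mapper-asl</artifactId>
      <version>${codehaus.jackson.version}</version>
    </dependency>
    <dependency>
      <groupId>org.codehaus.jackson</groupId>
      <artifactId>jackson-core-asl</artifactId>
      <version>${codehaus.jackson.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```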
      
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #3716 from jongyoul/SPARK-3955 and squashes the following commits:
      
      efa29aa [Jongyoul Lee] [SPARK-3955] Different versions between jackson-mapper-asl and jackson-core-asl - set the same version to jackson-mapper-asl and jackson-core-asl
      2483c1ef
  28. Dec 23, 2014
    • Cheng Lian's avatar
      [SPARK-4914][Build] Cleans lib_managed before compiling with Hive 0.13.1 · 395b771f
      Cheng Lian authored
      This PR tries to fix the Hive tests failure encountered in PR #3157 by cleaning `lib_managed` before building assembly jar against Hive 0.13.1 in `dev/run-tests`. Otherwise two sets of datanucleus jars would be left in `lib_managed` and may mess up class paths while executing Hive test suites. Please refer to [this thread] [1] for details. A clean build would be even safer, but we only clean `lib_managed` here to save build time.
      
      This PR also takes the chance to clean up some minor typos and formatting issues in the comments.
      
      [1]: https://github.com/apache/spark/pull/3157#issuecomment-67656488
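A rough shell sketch of the cleanup step (the jar name is illustrative; only the idea of removing `lib_managed` before rebuilding comes from the PR):

```shell
# Simulate a stale lib_managed left over from a previous Hive build,
# then clean it so only one set of datanucleus jars ends up on the classpath.
mkdir -p lib_managed/jars
touch lib_managed/jars/datanucleus-core-3.2.10.jar   # stale jar (example name)
rm -rf lib_managed    # clean before reassembling against Hive 0.13.1
[ -d lib_managed ] || echo "lib_managed cleaned"
```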
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #3756 from liancheng/clean-lib-managed and squashes the following commits:
      
      e2bd21d [Cheng Lian] Adds lib_managed to clean set
      c9f2f3e [Cheng Lian] Cleans lib_managed before compiling with Hive 0.13.1
      395b771f
  29. Dec 19, 2014
    • scwf's avatar
      [Build] Remove spark-staging-1038 · 8e253ebb
      scwf authored
      Author: scwf <wangfei1@huawei.com>
      
      Closes #3743 from scwf/abc and squashes the following commits:
      
      7d98bc8 [scwf] removing spark-staging-1038
      8e253ebb
  30. Dec 15, 2014
    • Sean Owen's avatar
SPARK-4814 [CORE] Enable assertions in SBT, Maven tests / AssertionError from Hive's LazyBinaryInteger · 81112e4b
Sean Owen authored
      
This enables assertions for the Maven and SBT builds, but disables them for the Hive module.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3692 from srowen/SPARK-4814 and squashes the following commits:
      
      caca704 [Sean Owen] Disable assertions just for Hive
      f71e783 [Sean Owen] Enable assertions for SBT and Maven build
      81112e4b
    • Ryan Williams's avatar
      [SPARK-4668] Fix some documentation typos. · 8176b7a0
      Ryan Williams authored
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #3523 from ryan-williams/tweaks and squashes the following commits:
      
      d2eddaa [Ryan Williams] code review feedback
      ce27fc1 [Ryan Williams] CoGroupedRDD comment nit
      c6cfad9 [Ryan Williams] remove unnecessary if statement
      b74ea35 [Ryan Williams] comment fix
      b0221f0 [Ryan Williams] fix a gendered pronoun
      c71ffed [Ryan Williams] use names on a few boolean parameters
      89954aa [Ryan Williams] clarify some comments in {Security,Shuffle}Manager
      e465dac [Ryan Williams] Saved building-spark.md with Dillinger.io
      83e8358 [Ryan Williams] fix pom.xml typo
      dc4662b [Ryan Williams] typo fixes in tuning.md, configuration.md
      8176b7a0
  31. Dec 09, 2014
    • Sandy Ryza's avatar
      SPARK-4338. [YARN] Ditch yarn-alpha. · 912563aa
      Sandy Ryza authored
      Sorry if this is a little premature with 1.2 still not out the door, but it will make other work like SPARK-4136 and SPARK-2089 a lot easier.
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #3215 from sryza/sandy-spark-4338 and squashes the following commits:
      
      1c5ac08 [Sandy Ryza] Update building Spark docs and remove unnecessary newline
      9c1421c [Sandy Ryza] SPARK-4338. Ditch yarn-alpha.
      912563aa
  32. Nov 28, 2014
    • Takuya UESHIN's avatar
      [SPARK-4193][BUILD] Disable doclint in Java 8 to prevent from build error. · e464f0ac
      Takuya UESHIN authored
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #3058 from ueshin/issues/SPARK-4193 and squashes the following commits:
      
      e096bb1 [Takuya UESHIN] Add a plugin declaration to pluginManagement.
      6762ec2 [Takuya UESHIN] Fix usage of -Xdoclint javadoc option.
      fdb280a [Takuya UESHIN] Fix Javadoc errors.
      4745f3c [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4193
      923e2f0 [Takuya UESHIN] Use doclint option `-missing` instead of `none`.
      30d6718 [Takuya UESHIN] Fix Javadoc errors.
      b548017 [Takuya UESHIN] Disable doclint in Java 8 to prevent from build error.
      e464f0ac
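The workaround above can be sketched roughly as follows (the commits mention the `-missing` doclint option; the profile shape and activation condition here are assumptions, not the exact diff):

```xml
<!-- Illustrative: on Java 8, relax doclint so missing Javadoc tags do not
     fail the javadoc build. -->
<profile>
  <id>doclint-java8-disable</id>
  <activation>
    <jdk>[1.8,)</jdk>
  </activation>
  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-javadoc-plugin</artifactId>
          <configuration>
            <additionalparam>-Xdoclint:all -Xdoclint:-missing</additionalparam>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</profile>
```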