  1. Dec 14, 2014
    • fixed spelling errors in documentation · 2a2983f7
      Peter Klipfel authored
      changed "form" to "from" in 3 documentation entries for Kafka integration
      
      Author: Peter Klipfel <peter@klipfel.me>
      
      Closes #3691 from peterklipfel/master and squashes the following commits:
      
      0fe7fc5 [Peter Klipfel] fixed spelling errors in documentation
  2. Dec 09, 2014
    • [SPARK-3154][STREAMING] Replace ConcurrentHashMap with mutable.HashMap and remove @volatile from 'stopped' · bcb5cdad
      zsxwing authored
      
      Since `sequenceNumberToProcessor` and `stopped` are both protected by the lock `sequenceNumberToProcessor`, `ConcurrentHashMap` and `volatile` are unnecessary. This PR updates them accordingly.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #3634 from zsxwing/SPARK-3154 and squashes the following commits:
      
      0d087ac [zsxwing] Replace ConcurrentHashMap with mutable.HashMap and remove @volatile from 'stopped'
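The locking pattern the PR relies on — a plain map and a non-volatile flag, both guarded by the same monitor — can be sketched in Java (class and field names here are illustrative, not the actual Spark code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a plain HashMap and a non-volatile boolean are thread-safe as
// long as every read and write happens inside a synchronized block on the same
// monitor, so ConcurrentHashMap and @volatile add nothing here.
class ProcessorRegistry {
    private final Map<Long, String> sequenceNumberToProcessor = new HashMap<>();
    private boolean stopped = false;

    boolean register(long seq, String processor) {
        synchronized (sequenceNumberToProcessor) {
            if (stopped) return false;
            sequenceNumberToProcessor.put(seq, processor);
            return true;
        }
    }

    void stop() {
        synchronized (sequenceNumberToProcessor) {
            stopped = true;
            sequenceNumberToProcessor.clear();
        }
    }

    int size() {
        synchronized (sequenceNumberToProcessor) {
            return sequenceNumberToProcessor.size();
        }
    }
}
```

Registrations after `stop()` are rejected under the same lock, which is exactly the check a separate `@volatile` flag could not make atomic.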
  3. Nov 30, 2014
    • [SPARK-4632] version update · 5e7a6dcb
      Prabeesh K authored
      Author: Prabeesh K <prabsmails@gmail.com>
      
      Closes #3495 from prabeesh/master and squashes the following commits:
      
      ab03d50 [Prabeesh K] Update pom.xml
      8c6437e [Prabeesh K] Revert
      e10b40a [Prabeesh K] version update
      dbac9eb [Prabeesh K] Revert
      ec0b1c3 [Prabeesh K] [SPARK-4632] version update
      a835505 [Prabeesh K] [SPARK-4632] version update
      831391b [Prabeesh K]  [SPARK-4632] version update
  4. Nov 19, 2014
    • SPARK-3962 Marked scope as provided for external projects. · 1c938413
      Prashant Sharma authored
      Somehow the Maven shade plugin got stuck in an infinite loop creating the effective pom.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Prashant Sharma <scrapcodes@gmail.com>
      
      Closes #2959 from ScrapCodes/SPARK-3962/scope-provided and squashes the following commits:
      
      994d1d3 [Prashant Sharma] Fixed failing flume tests
      270b4fb [Prashant Sharma] Removed most of the unused code.
      bb3bbfd [Prashant Sharma] SPARK-3962 Marked scope as provided for external.
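Marking a dependency as `provided` in a module's `pom.xml` looks roughly like this (an illustrative fragment with assumed coordinates, not the exact change in this PR): the artifact stays on the compile classpath but is kept out of the packaged external project.

```xml
<!-- Illustrative fragment: "provided" scope compiles against spark-core
     but excludes it from the module's packaged artifact. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${project.version}</version>
  <scope>provided</scope>
</dependency>
```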
  5. Nov 18, 2014
    • Bumping version to 1.3.0-SNAPSHOT. · 397d3aae
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3277 from vanzin/version-1.3 and squashes the following commits:
      
      7c3c396 [Marcelo Vanzin] Added temp repo to sbt build.
      5f404ff [Marcelo Vanzin] Add another exclusion.
      19457e7 [Marcelo Vanzin] Update old version to 1.2, add temporary 1.2 repo.
      3c8d705 [Marcelo Vanzin] Workaround for MIMA checks.
      e940810 [Marcelo Vanzin] Bumping version to 1.3.0-SNAPSHOT.
  6. Nov 14, 2014
    • [SPARK-4062][Streaming] Add ReliableKafkaReceiver in Spark Streaming Kafka connector · 5930f64b
      jerryshao authored
      Add ReliableKafkaReceiver in Kafka connector to prevent data loss if WAL in Spark Streaming is enabled. Details and design doc can be seen in [SPARK-4062](https://issues.apache.org/jira/browse/SPARK-4062).
      
      Author: jerryshao <saisai.shao@intel.com>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Author: Saisai Shao <saisai.shao@intel.com>
      
      Closes #2991 from jerryshao/kafka-refactor and squashes the following commits:
      
      5461f1c [Saisai Shao] Merge pull request #8 from tdas/kafka-refactor3
      eae4ad6 [Tathagata Das] Refectored KafkaStreamSuiteBased to eliminate KafkaTestUtils and made Java more robust.
      fab14c7 [Tathagata Das] minor update.
      149948b [Tathagata Das] Fixed mistake
      14630aa [Tathagata Das] Minor updates.
      d9a452c [Tathagata Das] Minor updates.
      ec2e95e [Tathagata Das] Removed the receiver's locks and essentially reverted to Saisai's original design.
      2a20a01 [jerryshao] Address some comments
      9f636b3 [Saisai Shao] Merge pull request #5 from tdas/kafka-refactor
      b2b2f84 [Tathagata Das] Refactored Kafka receiver logic and Kafka testsuites
      e501b3c [jerryshao] Add Mima excludes
      b798535 [jerryshao] Fix the missed issue
      e5e21c1 [jerryshao] Change to while loop
      ea873e4 [jerryshao] Further address the comments
      98f3d07 [jerryshao] Fix comment style
      4854ee9 [jerryshao] Address all the comments
      96c7a1d [jerryshao] Update the ReliableKafkaReceiver unit test
      8135d31 [jerryshao] Fix flaky test
      a949741 [jerryshao] Address the comments
      16bfe78 [jerryshao] Change the ordering of imports
      0894aef [jerryshao] Add some comments
      77c3e50 [jerryshao] Code refactor and add some unit tests
      dd9aeeb [jerryshao] Initial commit for reliable Kafka receiver
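The core reliability idea — commit consumed Kafka offsets only after the received block has been durably stored (e.g. written to the write-ahead log) — can be sketched with a small stand-alone class. This is illustrative only; the real ReliableKafkaReceiver is considerably more involved.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch: track the highest offset seen per partition, and only
// publish ("commit") those offsets once the corresponding block has been
// durably stored. If the receiver crashes before onBlockStored(), the
// uncommitted offsets are re-consumed, so no data is lost.
class OffsetTracker {
    private final Map<Integer, Long> pending = new HashMap<>();
    private final Map<Integer, Long> committed = new HashMap<>();

    void recordMessage(int partition, long offset) {
        pending.merge(partition, offset, Math::max);
    }

    // Called only after the block store / WAL write succeeds.
    void onBlockStored() {
        committed.putAll(pending);
        pending.clear();
    }

    Long committedOffset(int partition) {
        return committed.get(partition);
    }
}
```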
  7. Nov 11, 2014
    • Support cross building for Scala 2.11 · daaca14c
      Prashant Sharma authored
      Let's give this another go using a version of Hive that shades its JLine dependency.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #3159 from pwendell/scala-2.11-prashant and squashes the following commits:
      
      e93aa3e [Patrick Wendell] Restoring -Phive-thriftserver profile and cleaning up build script.
      f65d17d [Patrick Wendell] Fixing build issue due to merge conflict
      a8c41eb [Patrick Wendell] Reverting dev/run-tests back to master state.
      7a6eb18 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into scala-2.11-prashant
      583aa07 [Prashant Sharma] REVERT ME: removed hive thirftserver
      3680e58 [Prashant Sharma] Revert "REVERT ME: Temporarily removing some Cli tests."
      935fb47 [Prashant Sharma] Revert "Fixed by disabling a few tests temporarily."
      925e90f [Prashant Sharma] Fixed by disabling a few tests temporarily.
      2fffed3 [Prashant Sharma] Exclude groovy from sbt build, and also provide a way for such instances in future.
      8bd4e40 [Prashant Sharma] Switched to gmaven plus, it fixes random failures observer with its predecessor gmaven.
      5272ce5 [Prashant Sharma] SPARK_SCALA_VERSION related bugs.
      2121071 [Patrick Wendell] Migrating version detection to PySpark
      b1ed44d [Patrick Wendell] REVERT ME: Temporarily removing some Cli tests.
      1743a73 [Patrick Wendell] Removing decimal test that doesn't work with Scala 2.11
      f5cad4e [Patrick Wendell] Add Scala 2.11 docs
      210d7e1 [Patrick Wendell] Revert "Testing new Hive version with shaded jline"
      48518ce [Patrick Wendell] Remove association of Hive and Thriftserver profiles.
      e9d0a06 [Patrick Wendell] Revert "Enable thritfserver for Scala 2.10 only"
      67ec364 [Patrick Wendell] Guard building of thriftserver around Scala 2.10 check
      8502c23 [Patrick Wendell] Enable thritfserver for Scala 2.10 only
      e22b104 [Patrick Wendell] Small fix in pom file
      ec402ab [Patrick Wendell] Various fixes
      0be5a9d [Patrick Wendell] Testing new Hive version with shaded jline
      4eaec65 [Prashant Sharma] Changed scripts to ignore target.
      5167bea [Prashant Sharma] small correction
      a4fcac6 [Prashant Sharma] Run against scala 2.11 on jenkins.
      80285f4 [Prashant Sharma] MAven equivalent of setting spark.executor.extraClasspath during tests.
      034b369 [Prashant Sharma] Setting test jars on executor classpath during tests from sbt.
      d4874cb [Prashant Sharma] Fixed Python Runner suite. null check should be first case in scala 2.11.
      6f50f13 [Prashant Sharma] Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10
      e56ca9d [Prashant Sharma] Print an error if build for 2.10 and 2.11 is spotted.
      937c0b8 [Prashant Sharma] SCALA_VERSION -> SPARK_SCALA_VERSION
      cb059b0 [Prashant Sharma] Code review
      0476e5e [Prashant Sharma] Scala 2.11 support with repl and all build changes.
    • [SPARK-2492][Streaming] kafkaReceiver minor changes to align with Kafka 0.8 · c8850a3d
      jerryshao authored
      Update the KafkaReceiver's behavior when auto.offset.reset is set.
      
      In Kafka 0.8, `auto.offset.reset` is a hint to seek to the beginning or end of the partition only when an offset is out of range. In the previous code, however, `auto.offset.reset` was an enforcement to seek to the beginning or end immediately, which differs from Kafka 0.8's defined behavior.
      
      Also, deleting existing ZK metadata in the Receiver when multiple consumers are launched introduces the issue mentioned in [SPARK-2383](https://issues.apache.org/jira/browse/SPARK-2383).
      
      So here we change to offer the user an API to explicitly reset the offset before creating the Kafka stream, while keeping the same behavior as Kafka 0.8 for the parameter `auto.offset.reset`.
      
      @tdas, would you please review this PR? Thanks a lot.
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #1420 from jerryshao/kafka-fix and squashes the following commits:
      
      d6ae94d [jerryshao] Address the comment to remove the resetOffset() function
      de3a4c8 [jerryshao] Fix compile error
      4a1c3f9 [jerryshao] Doc changes
      b2c1430 [jerryshao] Move offset reset to a helper function to let user explicitly delete ZK metadata by calling this API
      fac8fd6 [jerryshao] Changes to align with Kafka 0.8
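The semantic difference can be illustrated with a tiny stand-alone function (hypothetical, neither Spark nor Kafka code): under Kafka 0.8 semantics, `auto.offset.reset` only kicks in when the requested offset is out of range, rather than forcing an immediate seek.

```java
// Illustrative only: Kafka 0.8 treats auto.offset.reset ("smallest"/"largest")
// as a fallback for out-of-range offsets, not as a command to always seek.
final class OffsetPolicy {
    static long resolve(long requested, long earliest, long latest, String reset) {
        if (requested >= earliest && requested <= latest) {
            return requested; // in range: the hint is ignored
        }
        return "smallest".equals(reset) ? earliest : latest;
    }
}
```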
    • [SPARK-4295][External] Fix exception in SparkSinkSuite · f8811a56
      maji2014 authored
      Handle the exception in SparkSinkSuite; please refer to [SPARK-4295]
      
      Author: maji2014 <maji3@asiainfo.com>
      
      Closes #3177 from maji2014/spark-4295 and squashes the following commits:
      
      312620a [maji2014] change a new statement for spark-4295
      24c3d21 [maji2014] add log4j.properties for SparkSinkSuite and spark-4295
      c807bf6 [maji2014] Fix exception in SparkSinkSuite
  8. Nov 02, 2014
    • [SPARK-4183] Close transport-related resources between SparkContexts · 2ebd1df3
      Aaron Davidson authored
      A leak of event loops may be causing test failures.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #3053 from aarondav/leak and squashes the following commits:
      
      e676d18 [Aaron Davidson] Typo!
      8f96475 [Aaron Davidson] Keep original ssc semantics
      7e49f10 [Aaron Davidson] A leak of event loops may be causing test failures.
  9. Oct 24, 2014
    • [SPARK-4080] Only throw IOException from [write|read][Object|External] · 6c98c29a
      Josh Rosen authored
      If classes implementing Serializable or Externalizable interfaces throw
      exceptions other than IOException or ClassNotFoundException from their
      (de)serialization methods, then this results in an unhelpful
      "IOException: unexpected exception type" rather than the actual exception that
      produced the (de)serialization error.
      
      This patch fixes this by adding a utility method that re-wraps any uncaught
      exceptions in IOException (unless they are already instances of IOException).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #2932 from JoshRosen/SPARK-4080 and squashes the following commits:
      
      cd3a9be [Josh Rosen] [SPARK-4080] Only throw IOException from [write|read][Object|External].
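A minimal sketch of such a utility (assuming a simplified shape; the actual helper in Spark's Utils differs):

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class SerializationUtil {
    // Re-wraps any uncaught exception in IOException (unless it already is
    // one), so readObject/writeObject surface the real failure instead of
    // "IOException: unexpected exception type".
    static <T> T tryOrIOException(Callable<T> block) throws IOException {
        try {
            return block.call();
        } catch (IOException e) {
            throw e; // already an IOException: rethrow unchanged
        } catch (Exception e) {
            throw new IOException(e);
        }
    }
}
```

A (de)serialization method then wraps its body in this helper, so the original exception travels as the IOException's cause rather than being swallowed by ObjectOutputStream.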
  10. Oct 14, 2014
    • [SPARK-3912][Streaming] Fixed flaky FlumeStreamSuite · 4d26aca7
      Tathagata Das authored
      @harishreedharan @pwendell
      See JIRA for diagnosis of the problem
      https://issues.apache.org/jira/browse/SPARK-3912
      
      The solution was to reimplement it.
      1. Find a free port (by binding and releasing a server socket), and then use that port.
      2. Remove Thread.sleep()s; instead, repeatedly try to create a sender, send data, and check whether the data was sent. Use eventually() to minimize waiting time.
      3. Check whether all the data was received, without caring about batches.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #2773 from tdas/flume-test-fix and squashes the following commits:
      
      93cd7f6 [Tathagata Das] Reimplimented FlumeStreamSuite to be more robust.
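Step 1 of the fix — find a free port by binding and releasing a server socket — looks roughly like this in Java (illustrative; the actual suite is Scala):

```java
import java.io.IOException;
import java.net.ServerSocket;

final class PortFinder {
    // Bind to port 0 so the OS assigns a free ephemeral port, record it, then
    // release the socket so the test can bind the same port. Note the inherent
    // race: another process may grab the port before the test does, which is
    // why a later fix lets Netty choose the port and then queries it.
    static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }
}
```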
  11. Oct 01, 2014
    • [SPARK-3748] Log thread name in unit test logs · 3888ee2f
      Reynold Xin authored
      Thread names are useful for correlating failures.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #2600 from rxin/log4j and squashes the following commits:
      
      83ffe88 [Reynold Xin] [SPARK-3748] Log thread name in unit test logs
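In log4j 1.x this amounts to adding `%t` (thread name) to the layout's conversion pattern; an illustrative `log4j.properties` fragment (not necessarily Spark's exact test pattern):

```properties
# %t inserts the name of the thread that produced each log line
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n
```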
  12. Sep 30, 2014
    • SPARK-3744 [STREAMING] FlumeStreamSuite will fail during port contention · 8764fe36
      Sean Owen authored
      Since it looked quite easy, I took the liberty of making a quick PR that just uses `Utils.startServiceOnPort` to fix this. It works locally for me.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2601 from srowen/SPARK-3744 and squashes the following commits:
      
      ddc9319 [Sean Owen] Avoid port contention in tests by retrying several ports for Flume stream
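The retry approach can be sketched as follows (a hypothetical stand-alone version; Spark's actual `Utils.startServiceOnPort` also handles port 0 and other details):

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

final class RetryBind {
    // Try successive ports starting at startPort; on BindException
    // (port already in use) move on to the next candidate.
    static ServerSocket startOnPort(int startPort, int maxRetries) throws IOException {
        for (int i = 0; i < maxRetries; i++) {
            try {
                return new ServerSocket(startPort + i);
            } catch (BindException e) {
                // contended port: retry with the next one
            }
        }
        throw new BindException("no free port found starting at " + startPort);
    }
}
```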
  13. Sep 26, 2014
    • [SPARK-3686][STREAMING] Wait for sink to commit the channel before checking for the channel size. · b235e013
      Hari Shreedharan authored
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #2531 from harishreedharan/sparksinksuite-fix and squashes the following commits:
      
      30393c1 [Hari Shreedharan] Use more deterministic method to figure out when batches come in.
      6ce9d8b [Hari Shreedharan] [SPARK-3686][STREAMING] Wait for sink to commit the channel before checking for the channel size.
  14. Sep 24, 2014
  15. Sep 06, 2014
  16. Sep 04, 2014
  17. Aug 27, 2014
    • [SPARK-3154][STREAMING] Make FlumePollingInputDStream shutdown cleaner. · 6f671d04
      Hari Shreedharan authored
      Currently, a lot of errors are thrown from the Avro IPC layer when the dstream
      or sink is shut down. This PR cleans that up. Some refactoring is done in the
      receiver code to put all of the RPC code into a single Try and just recover
      from that. The sink code has also been cleaned up.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #2065 from harishreedharan/clean-flume-shutdown and squashes the following commits:
      
      f93a07c [Hari Shreedharan] Formatting fixes.
      d7427cc [Hari Shreedharan] More fixes!
      a0a8852 [Hari Shreedharan] Fix race condition, hopefully! Minor other changes.
      4c9ed02 [Hari Shreedharan] Remove unneeded list in Callback handler. Other misc changes.
      8fee36f [Hari Shreedharan] Scala-library is required, else maven build fails. Also catch InterruptedException in TxnProcessor.
      445e700 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into clean-flume-shutdown
      87232e0 [Hari Shreedharan] Refactor Flume Input Stream. Clean up code, better error handling.
      9001d26 [Hari Shreedharan] Change log level to debug in TransactionProcessor#shutdown method
      e7b8d82 [Hari Shreedharan] Incorporate review feedback
      598efa7 [Hari Shreedharan] Clean up some exception handling code
      e1027c6 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into clean-flume-shutdown
      ed608c8 [Hari Shreedharan] [SPARK-3154][STREAMING] Make FlumePollingInputDStream shutdown cleaner.
  18. Aug 25, 2014
    • SPARK-2798 [BUILD] Correct several small errors in Flume module pom.xml files · cd30db56
      Sean Owen authored
      (EDIT) Since the scalatest issue was resolved, this is now about a few small problems in the Flume Sink `pom.xml`
      
      - `scalatest` is not declared as a test-scope dependency
      - Its Avro version doesn't match the rest of the build
      - Its Flume version is not synced with the other Flume module
      - The other Flume module declares its dependency on Flume Sink slightly incorrectly, hard-coding the Scala 2.10 version
      - It depends on Scala Lang directly, which it shouldn't
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #1726 from srowen/SPARK-2798 and squashes the following commits:
      
      a46e2c6 [Sean Owen] scalatest to test scope, harmonize Avro and Flume versions, remove direct Scala dependency, fix '2.10' in Flume dependency
  19. Aug 22, 2014
    • [SPARK-3169] Removed dependency on spark streaming test from spark flume sink · 30040741
      Tathagata Das authored
      Due to the Maven bug https://jira.codehaus.org/browse/MNG-1378, Maven could not resolve the Spark Streaming classes required by the spark-streaming test-jar dependency of external/flume-sink. There is no particular reason that external/flume-sink has to depend on Spark Streaming at all, so I am eliminating this dependency. Also, I have removed the exclusions present in the Flume dependencies, as there is no reason to exclude them (they were excluded in the external/flume module to prevent dependency collisions with Spark).
      
      Since Jenkins will test the sbt build and the unit test, I only tested maven compilation locally.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #2101 from tdas/spark-sink-pom-fix and squashes the following commits:
      
      8f42621 [Tathagata Das] Added Flume sink exclusions back, and added netty to test dependencies
      93b559f [Tathagata Das] Removed dependency on spark streaming test from spark flume sink
  20. Aug 20, 2014
    • [SPARK-3054][STREAMING] Add unit tests for Spark Sink. · 8c5a2226
      Hari Shreedharan authored
      This patch adds unit tests for Spark Sink.
      
      It also removes the private[flume] modifier from Spark Sink,
      since the sink is instantiated from the Flume configuration (it looks like the
      modifier is ignored by the reflection Flume uses, but we should still remove it anyway).
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      Author: Hari Shreedharan <hshreedharan@cloudera.com>
      
      Closes #1958 from harishreedharan/spark-sink-test and squashes the following commits:
      
      e3110b9 [Hari Shreedharan] Add a sleep to allow sink to commit the transactions
      120b81e [Hari Shreedharan] Fix complexity in threading model in test
      4df5be6 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into spark-sink-test
      c9190d1 [Hari Shreedharan] Indentation and spaces changes
      7fedc5a [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into spark-sink-test
      abc20cb [Hari Shreedharan] Minor test changes
      7b9b649 [Hari Shreedharan] Merge branch 'master' into spark-sink-test
      f2c56c9 [Hari Shreedharan] Update SparkSinkSuite.scala
      a24aac8 [Hari Shreedharan] Remove unused var
      c86d615 [Hari Shreedharan] [SPARK-3054][STREAMING] Add unit tests for Spark Sink.
  21. Aug 17, 2014
    • [HOTFIX][STREAMING] Allow the JVM/Netty to decide which port to bind to in Flume Polling Tests. · 95470a03
      Hari Shreedharan authored
      Author: Hari Shreedharan <harishreedharan@gmail.com>
      
      Closes #1820 from harishreedharan/use-free-ports and squashes the following commits:
      
      b939067 [Hari Shreedharan] Remove unused import.
      67856a8 [Hari Shreedharan] Remove findFreePort.
      0ea51d1 [Hari Shreedharan] Make some changes to getPort to use map on the serverOpt.
      1fb0283 [Hari Shreedharan] Merge branch 'master' of https://github.com/apache/spark into use-free-ports
      b351651 [Hari Shreedharan] Allow Netty to choose port, and query it to decide the port to bind to. Leaving findFreePort as is, if other tests want to use it at some point.
      e6c9620 [Hari Shreedharan] Making sure the second sink uses the correct port.
      11c340d [Hari Shreedharan] Add info about race condition to scaladoc.
      e89d135 [Hari Shreedharan] Adding Scaladoc.
      6013bb0 [Hari Shreedharan] [STREAMING] Find free ports to use before attempting to create Flume Sink in Flume Polling Suite
  22. Aug 06, 2014
    • [HOTFIX][Streaming] Handle port collisions in flume polling test · c6889d2c
      Andrew Or authored
      This is failing my tests in #1777. @tdas
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1803 from andrewor14/fix-flaky-streaming-test and squashes the following commits:
      
      ea11a03 [Andrew Or] Catch all exceptions caused by BindExceptions
      54a0ca0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-flaky-streaming-test
      664095c [Andrew Or] Tone down bind exception message
      af3ddc9 [Andrew Or] Handle port collisions in flume polling test
    • [SPARK-1022][Streaming][HOTFIX] Fixed zookeeper dependency of Kafka · ee7f3085
      Tathagata Das authored
      https://github.com/apache/spark/pull/1751 caused maven builds to fail.
      
      ```
      ~/Apache/spark(branch-1.1|:heavy_check_mark:) ➤ mvn -U -DskipTests clean install
      .
      .
      .
      [error] Apache/spark/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala:36: object NIOServerCnxnFactory is not a member of package org.apache.zookeeper.server
      [error] import org.apache.zookeeper.server.NIOServerCnxnFactory
      [error]        ^
      [error] Apache/spark/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala:199: not found: type NIOServerCnxnFactory
      [error]     val factory = new NIOServerCnxnFactory()
      [error]                       ^
      [error] two errors found
      [error] Compile failed at Aug 5, 2014 1:42:36 PM [0.503s]
      ```
      
      The problem is how SBT and Maven resolve multiple versions of the same library, which in this case is ZooKeeper. Observing and comparing the dependency trees from Maven and SBT showed this. Spark depends on ZK 3.4.5 whereas Apache Kafka transitively depends upon ZK 3.3.4. SBT decides to evict 3.3.4 and use the higher version 3.4.5. But Maven sticks to the closest (in the tree) dependent version, 3.3.4. And 3.3.4 does not have NIOServerCnxnFactory.
      
      The solution in this patch excludes zookeeper from the apache-kafka dependency in streaming-kafka module so that it just inherits zookeeper from Spark core.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #1797 from tdas/kafka-zk-fix and squashes the following commits:
      
      94b3931 [Tathagata Das] Fixed zookeeper dependency of Kafka
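The exclusion described above looks roughly like this in the streaming-kafka `pom.xml` (an illustrative fragment; coordinates and version are assumptions):

```xml
<!-- Illustrative fragment: exclude Kafka's transitive (older) ZooKeeper
     so the module inherits ZK 3.4.5 from Spark core. -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>0.8.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.zookeeper</groupId>
      <artifactId>zookeeper</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```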
  23. Aug 05, 2014
    • [SPARK-1022][Streaming] Add Kafka real unit test · e87075df
      jerryshao authored
      This PR is an updated version of (https://github.com/apache/spark/pull/557) that actually tests sending and receiving data through Kafka and fixes the previous flaky issues.
      
      @tdas, would you mind reviewing this PR? Thanks a lot.
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #1751 from jerryshao/kafka-unit-test and squashes the following commits:
      
      b6a505f [jerryshao] code refactor according to comments
      5222330 [jerryshao] Change JavaKafkaStreamSuite to better test it
      5525f10 [jerryshao] Fix flaky issue of Kafka real unit test
      4559310 [jerryshao] Minor changes for Kafka unit test
      860f649 [jerryshao] Minor style changes, and tests ignored due to flakiness
      796d4ca [jerryshao] Add real Kafka streaming test
  24. Aug 02, 2014
  25. Aug 01, 2014
    • [SPARK-2103][Streaming] Change to ClassTag for KafkaInputDStream and fix reflection issue · a32f0fb7
      jerryshao authored
      This PR updates the previous Manifest for KafkaInputDStream's Decoder to ClassTag, and also fixes the problem addressed in [SPARK-2103](https://issues.apache.org/jira/browse/SPARK-2103).
      
      The previous Java interface cannot actually get the type of the Decoder, so reconstructing the decoder object from this Manifest hits a reflection exception.
      
      Also, for the other two Java interfaces, ClassTag[String] is useless because calling the Scala API will supply the right implicit ClassTag.
      
      The current Kafka unit test cannot actually verify the interface. I've tested these interfaces in my local and distributed settings.
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #1508 from jerryshao/SPARK-2103 and squashes the following commits:
      
      e90c37b [jerryshao] Add Mima excludes
      7529810 [jerryshao] Change Manifest to ClassTag for KafkaInputDStream's Decoder and fix Decoder construct issue when using Java API
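The root cause is type erasure: without an explicit runtime type token, the stream cannot reflectively construct the right Decoder. A Java analogue of the fix (hypothetical names; the PR itself switches Scala's Manifest to ClassTag):

```java
// With plain generics the decoder's type is erased at runtime; passing an
// explicit Class<T> token (the Java analogue of Scala's ClassTag) lets the
// factory instantiate the concrete class reflectively.
final class DecoderFactory {
    static <T> T create(Class<T> decoderClass) throws ReflectiveOperationException {
        return decoderClass.getDeclaredConstructor().newInstance();
    }
}
```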
  26. Jul 30, 2014
    • SPARK-2749 [BUILD]. Spark SQL Java tests aren't compiling in Jenkins' Maven builds; missing junit:junit dep · 6ab96a6f
      Sean Owen authored
      
      The Maven-based builds in the build matrix have been failing for a few days:
      
      https://amplab.cs.berkeley.edu/jenkins/view/Spark/
      
      On inspection, it looks like the Spark SQL Java tests don't compile:
      
      https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/244/consoleFull
      
      I confirmed it by repeating the command vs master:
      
      `mvn -Dhadoop.version=1.0.4 -Dlabel=centos -DskipTests clean package`
      
      The problem is that this module doesn't depend on JUnit. In fact, none of the modules do, but `com.novocode:junit-interface` (the SBT-JUnit bridge) pulls it in, in most places. However, this module doesn't depend on `com.novocode:junit-interface`.
      
      Adding the `junit:junit` dependency fixes the compile problem. In fact, the other modules with Java tests should probably depend on it explicitly instead of happening to get it via `com.novocode:junit-interface`, since that is a bit SBT/Scala-specific (and I am not even sure it's needed).
      
      Author: Sean Owen <srowen@gmail.com>
      
      Closes #1660 from srowen/SPARK-2749 and squashes the following commits:
      
      858ff7c [Sean Owen] Add explicit junit dep to other modules with Java tests for robustness
      9636794 [Sean Owen] Add junit dep so that Spark SQL Java tests compile
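The fix amounts to an explicit test-scope JUnit dependency in the affected `pom.xml` (illustrative fragment; in the real build the version is typically managed in the parent pom):

```xml
<!-- Illustrative fragment: declare junit:junit directly in test scope instead
     of relying on com.novocode:junit-interface to pull it in transitively. -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <scope>test</scope>
</dependency>
```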
  27. Jul 29, 2014
    • [STREAMING] SPARK-1729. Make Flume pull data from source, rather than the current push model · 800ecff4
      Hari Shreedharan authored
      
      Currently Spark uses Flume's internal Avro Protocol to ingest data from Flume. If the executor running the
      receiver fails, it currently has to be restarted on the same node to be able to receive data.
      
      This commit adds a new Sink which can be deployed to a Flume agent. This sink can be polled by a new
      DStream that is also included in this commit. This model ensures that data can be pulled into Spark from
      Flume even if the receiver is restarted on a new node. This also allows the receiver to receive data on
      multiple threads for better performance.
      
      Author: Hari Shreedharan <harishreedharan@gmail.com>
      Author: Hari Shreedharan <hshreedharan@apache.org>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Author: harishreedharan <hshreedharan@cloudera.com>
      
      Closes #807 from harishreedharan/master and squashes the following commits:
      
      e7f70a3 [Hari Shreedharan] Merge remote-tracking branch 'asf-git/master'
      96cfb6f [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
      e48d785 [Hari Shreedharan] Documenting flume-sink being ignored for Mima checks.
      5f212ce [Hari Shreedharan] Ignore Spark Sink from mima.
      981bf62 [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
      7a1bc6e [Hari Shreedharan] Fix SparkBuild.scala
      a082eb3 [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
      1f47364 [Hari Shreedharan] Minor fixes.
      73d6f6d [Hari Shreedharan] Cleaned up tests a bit. Added some docs in multiple places.
      65b76b4 [Hari Shreedharan] Fixing the unit test.
      e59cc20 [Hari Shreedharan] Use SparkFlumeEvent instead of the new type. Also, Flume Polling Receiver now uses the store(ArrayBuffer) method.
      f3c99d1 [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
      3572180 [Hari Shreedharan] Adding a license header, making Jenkins happy.
      799509f [Hari Shreedharan] Fix a compile issue.
      3c5194c [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
      d248d22 [harishreedharan] Merge pull request #1 from tdas/flume-polling
      10b6214 [Tathagata Das] Changed public API, changed sink package, and added java unit test to make sure Java API is callable from Java.
      1edc806 [Hari Shreedharan] SPARK-1729. Update logging in Spark Sink.
      8c00289 [Hari Shreedharan] More debug messages
      393bd94 [Hari Shreedharan] SPARK-1729. Use LinkedBlockingQueue instead of ArrayBuffer to keep track of connections.
      120e2a1 [Hari Shreedharan] SPARK-1729. Some test changes and changes to utils classes.
      9fd0da7 [Hari Shreedharan] SPARK-1729. Use foreach instead of map for all Options.
      8136aa6 [Hari Shreedharan] Adding TransactionProcessor to map on returning batch of data
      86aa274 [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
      205034d [Hari Shreedharan] Merging master in
      4b0c7fc [Hari Shreedharan] FLUME-1729. New Flume-Spark integration.
      bda01fc [Hari Shreedharan] FLUME-1729. Flume-Spark integration.
      0d69604 [Hari Shreedharan] FLUME-1729. Better Flume-Spark integration.
      3c23c18 [Hari Shreedharan] SPARK-1729. New Spark-Flume integration.
      70bcc2a [Hari Shreedharan] SPARK-1729. New Flume-Spark integration.
      d6fa3aa [Hari Shreedharan] SPARK-1729. New Flume-Spark integration.
      e7da512 [Hari Shreedharan] SPARK-1729. Fixing import order
      9741683 [Hari Shreedharan] SPARK-1729. Fixes based on review.
      c604a3c [Hari Shreedharan] SPARK-1729. Optimize imports.
      0f10788 [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
      87775aa [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
      8df37e4 [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
      03d6c1c [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
      08176ad [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
      d24d9d4 [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
      6d6776a [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
  28. Jul 28, 2014
    • Cheng Lian's avatar
      [SPARK-2410][SQL] Merging Hive Thrift/JDBC server (with Maven profile fix) · a7a9d144
      Cheng Lian authored
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Another try for #1399 & #1600. Those two PRs broke Jenkins builds because we made a separate profile `hive-thriftserver` in sub-project `assembly`, but the `hive-thriftserver` module was defined outside the `hive-thriftserver` profile. Thus even pull requests that don't touch SQL code would execute the test suites defined in `hive-thriftserver`, and those tests failed because the related .class files were not included in the assembly jar.
      
      In the most recent commit, module `hive-thriftserver` is moved into its own profile to fix this problem. All previous commits are squashed for clarity.
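      The fix described above can be sketched as a parent-pom fragment in which the profile owns the module declaration, so the module is only built and tested when the profile is explicitly enabled (a hypothetical sketch, not the exact Spark pom):

      ```xml
      <!-- Hypothetical sketch of the fix: the module is declared only
           inside its own profile, so builds run without
           -Phive-thriftserver never compile or test it. -->
      <profiles>
        <profile>
          <id>hive-thriftserver</id>
          <modules>
            <module>sql/hive-thriftserver</module>
          </modules>
        </profile>
      </profiles>
      ```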
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1620 from liancheng/jdbc-with-maven-fix and squashes the following commits:
      
      629988e [Cheng Lian] Moved hive-thriftserver module definition into its own profile
      ec3c7a7 [Cheng Lian] Cherry picked the Hive Thrift server
      a7a9d144
  29. Jul 27, 2014
    • Patrick Wendell's avatar
      Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · e5bbce9a
      Patrick Wendell authored
      This reverts commit f6ff2a61.
      e5bbce9a
    • Cheng Lian's avatar
      [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · f6ff2a61
      Cheng Lian authored
      (This is a replacement of #1399, trying to fix potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.)
      
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1600 from liancheng/jdbc and squashes the following commits:
      
      ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
      f6ff2a61
  30. Jul 25, 2014
    • Michael Armbrust's avatar
      Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · afd757a2
      Michael Armbrust authored
      This reverts commit 06dc0d2c.
      
      #1399 is making Jenkins fail. We should investigate and put this back once it is passing tests.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1594 from marmbrus/revertJDBC and squashes the following commits:
      
      59748da [Michael Armbrust] Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
      afd757a2
    • Cheng Lian's avatar
      [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · 06dc0d2c
      Cheng Lian authored
      JIRA issue:
      
      - Main: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      - Related: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
      
      Cherry picked the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      (Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.)
      
      TODO
      
      - [x] Use `spark-submit` to launch the server, the CLI and beeline
      - [x] Migration guideline draft for Shark users
      
      ----
      
      Hit by a bug in `SparkSubmitArguments` while working on this PR: all application options that are recognized by `SparkSubmitArguments` are stolen as `SparkSubmit` options. For example:
      
      ```bash
      $ spark-submit --class org.apache.hive.beeline.BeeLine spark-internal --help
      ```
      
      This actually shows usage information of `SparkSubmit` rather than `BeeLine`.
      
      ~~Fixed this bug here since the `spark-internal` related stuff also touches `SparkSubmitArguments` and I'd like to avoid conflict.~~
      
      **UPDATE** The bug mentioned above is now tracked by [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678). Decided to revert the changes for this bug since it involves more subtle considerations and is worth a separate PR.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1399 from liancheng/thriftserver and squashes the following commits:
      
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
      06dc0d2c
  31. Jul 24, 2014
    • Tathagata Das's avatar
      [SPARK-2464][Streaming] Fixed Twitter stream stopping bug · a45d5480
      Tathagata Das authored
      Stopping the Twitter Receiver would call twitter4j's TwitterStream.shutdown, which in turn causes an Exception to be thrown to the listener. This exception caused the Receiver to be restarted. This patch checks whether the receiver was stopped, and restarts on exception only if it was not.
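      The fix amounts to a stop flag consulted in the exception callback. A minimal sketch of that stop-guard pattern (hypothetical names, not the actual TwitterReceiver code):

      ```scala
      // Minimal sketch of the stop-guard pattern: restart only when the
      // exception was not caused by our own shutdown. Names here are
      // illustrative, not the actual TwitterReceiver implementation.
      class GuardedReceiver {
        @volatile private var stopped = false
        @volatile var restarted = false

        def stop(): Unit = {
          stopped = true
          // TwitterStream.shutdown() would throw to the listener from
          // here, triggering onException below.
        }

        // Exception callback invoked by the stream listener:
        def onException(e: Throwable): Unit = {
          // Restart only for genuine failures, not for our own shutdown:
          if (!stopped) restarted = true
        }
      }
      ```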
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #1577 from tdas/twitter-stop and squashes the following commits:
      
      011b525 [Tathagata Das] Fixed Twitter stream stopping bug.
      a45d5480
  32. Jul 17, 2014
    • Sean Owen's avatar
      SPARK-1478.2 Fix incorrect NioServerSocketChannelFactory constructor call · 1fcd5dcd
      Sean Owen authored
      The line break inadvertently meant this was interpreted as a call to the no-arg constructor, which doesn't even exist in older Netty versions. (Also fixed a val name typo.)
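      The parsing pitfall can be reproduced with a stand-in class (illustrative, not Netty's actual NioServerSocketChannelFactory): Scala's semicolon inference ends the statement at the line break, so the parenthesized arguments on the next line become a separate, discarded tuple expression.

      ```scala
      // Illustrative stand-in for NioServerSocketChannelFactory:
      class Factory(val boss: String, val worker: String) {
        def this() = this("default-boss", "default-worker")
      }

      object LineBreakDemo {
        // The newline after `new Factory` ends the statement, so this
        // calls the no-arg constructor; the tuple on the next line is a
        // standalone expression that is silently discarded.
        def brokenCall(): Factory = {
          val f = new Factory
          ("boss", "worker") // not constructor arguments!
          f
        }

        // Keeping the argument list on the same line calls the intended
        // two-argument constructor.
        def fixedCall(): Factory = new Factory("boss", "worker")
      }
      ```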
      
      Author: Sean Owen <srowen@gmail.com>
      
      Closes #1466 from srowen/SPARK-1478.2 and squashes the following commits:
      
      59c3501 [Sean Owen] Line break caused Scala to interpret NioServerSocketChannelFactory constructor as the no-arg version, which is not even present in some versions of Netty
      1fcd5dcd
  33. Jul 10, 2014
    • tmalaska's avatar
      [SPARK-1478].3: Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915 · 40a8fef4
      tmalaska authored
      This is a modified version of PR https://github.com/apache/spark/pull/1168 by @tmalaska.
      It adds MIMA binary check exclusions.
      
      Author: tmalaska <ted.malaska@cloudera.com>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #1347 from tdas/FLUME-1915 and squashes the following commits:
      
      96065df [Tathagata Das] Added Mima exclusion for FlumeReceiver.
      41d5338 [tmalaska] Address line 57 that was too long
      12617e5 [tmalaska] SPARK-1478: Upgrade FlumeInputDStream's Flume...
      40a8fef4
    • Prashant Sharma's avatar
      [SPARK-1776] Have Spark's SBT build read dependencies from Maven. · 628932b8
      Prashant Sharma authored
      This patch introduces the new way of working while retaining the existing ways of doing things.
      
      For example, the Maven build instruction for YARN is
      `mvn -Pyarn -Phadoop-2.2 clean package -DskipTests`
      in sbt it can become
      `MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly`
      Also supports
      `sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly`
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #772 from ScrapCodes/sbt-maven and squashes the following commits:
      
      a8ac951 [Prashant Sharma] Updated sbt version.
      62b09bb [Prashant Sharma] Improvements.
      fa6221d [Prashant Sharma] Excluding sql from mima
      4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default.
      72651ca [Prashant Sharma] Addresses code review comments.
      acab73d [Prashant Sharma] Revert "Small fix to run-examples script."
      ac4312c [Prashant Sharma] Revert "minor fix"
      6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit.
      65cf06c [Prashant Sharma] Servelet API jars mess up with the other servlet jars on the class path.
      446768e [Prashant Sharma] minor fix
      89b9777 [Prashant Sharma] Merge conflicts
      d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups.
      dccc8ac [Prashant Sharma] updated mima to check against 1.0
      a49c61b [Prashant Sharma] Fix for tools jar
      a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies.
      cf88758 [Prashant Sharma] cleanup
      9439ea3 [Prashant Sharma] Small fix to run-examples script.
      96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven.
      36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins.
      4973dbd [Patrick Wendell] Example build using pom reader.
      628932b8