  1. Oct 24, 2014
    • [SPARK-4080] Only throw IOException from [write|read][Object|External] · 6c98c29a
      Josh Rosen authored
      If classes implementing Serializable or Externalizable interfaces throw
      exceptions other than IOException or ClassNotFoundException from their
      (de)serialization methods, then this results in an unhelpful
      "IOException: unexpected exception type" rather than the actual exception that
      produced the (de)serialization error.
      
      This patch fixes this by adding a utility method that re-wraps any uncaught
      exceptions in IOException (unless they are already instances of IOException).
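
      A minimal sketch of the re-wrapping helper this describes (Spark's real utility lives in its Utils object; the name and shape here are illustrative):

      ```scala
      import java.io.IOException
      import scala.util.control.NonFatal

      // Run a (de)serialization block, re-wrapping any non-IOException failure
      // in an IOException so readObject/writeObject keep their contract.
      def tryOrIOException[T](block: => T): T =
        try block
        catch {
          case e: IOException => throw e                  // already allowed by the spec
          case NonFatal(e)    => throw new IOException(e) // wrap everything else
        }

      // Usage inside a Serializable class:
      // private def writeObject(out: java.io.ObjectOutputStream): Unit = tryOrIOException {
      //   out.defaultWriteObject()
      // }
      ```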
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #2932 from JoshRosen/SPARK-4080 and squashes the following commits:
      
      cd3a9be [Josh Rosen] [SPARK-4080] Only throw IOException from [write|read][Object|External].
      6c98c29a
  2. Oct 14, 2014
    • [SPARK-3912][Streaming] Fixed flaky FlumeStreamSuite · 4d26aca7
      Tathagata Das authored
      @harishreedharan @pwendell
      See JIRA for diagnosis of the problem
      https://issues.apache.org/jira/browse/SPARK-3912
      
      The solution was to reimplement it:
      1. Find a free port (by binding and releasing a server socket, as sketched below), and then use that port.
      2. Remove Thread.sleep()s; instead, repeatedly try to create a sender, send data, and check whether the data was sent. Use eventually() to minimize waiting time.
      3. Check whether all the data was received, without caring about batches.
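
      A hedged sketch of step 1 (note the inherent race: another process could grab the port between close and reuse, which is why a later hotfix lets the server bind port 0 directly):

      ```scala
      import java.net.ServerSocket

      // Bind a server socket to port 0 so the OS assigns a free port,
      // record that port, then release the socket for the test to reuse.
      def findFreePort(): Int = {
        val socket = new ServerSocket(0)
        try socket.getLocalPort
        finally socket.close()
      }
      ```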
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #2773 from tdas/flume-test-fix and squashes the following commits:
      
      93cd7f6 [Tathagata Das] Reimplemented FlumeStreamSuite to be more robust.
      4d26aca7
  3. Oct 01, 2014
    • [SPARK-3748] Log thread name in unit test logs · 3888ee2f
      Reynold Xin authored
      Thread names are useful for correlating failures.
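
      For illustration, a log4j.properties layout that adds the thread name via the %t conversion character (the exact pattern Spark's test config uses may differ):

      ```properties
      log4j.appender.file.layout=org.apache.log4j.PatternLayout
      log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n
      ```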
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #2600 from rxin/log4j and squashes the following commits:
      
      83ffe88 [Reynold Xin] [SPARK-3748] Log thread name in unit test logs
      3888ee2f
  4. Sep 30, 2014
    • SPARK-3744 [STREAMING] FlumeStreamSuite will fail during port contention · 8764fe36
      Sean Owen authored
      Since it looked quite easy, I took the liberty of making a quick PR that just uses `Utils.startServiceOnPort` to fix this. It works locally for me.
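
      A rough sketch of the retry idea behind `Utils.startServiceOnPort` (the signature here is illustrative, not Spark's actual one):

      ```scala
      import java.net.{BindException, ServerSocket}

      // Try successive ports starting at startPort until one binds.
      def startOnAnyPort(startPort: Int, maxRetries: Int = 10): ServerSocket = {
        for (offset <- 0 to maxRetries) {
          try return new ServerSocket(startPort + offset)
          catch { case _: BindException => /* port taken, try the next one */ }
        }
        throw new BindException(s"No free port in [$startPort, ${startPort + maxRetries}]")
      }
      ```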
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2601 from srowen/SPARK-3744 and squashes the following commits:
      
      ddc9319 [Sean Owen] Avoid port contention in tests by retrying several ports for Flume stream
      8764fe36
  5. Sep 26, 2014
    • [SPARK-3686][STREAMING] Wait for sink to commit the channel before checking for the channel size · b235e013
      Hari Shreedharan authored
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #2531 from harishreedharan/sparksinksuite-fix and squashes the following commits:
      
      30393c1 [Hari Shreedharan] Use more deterministic method to figure out when batches come in.
      6ce9d8b [Hari Shreedharan] [SPARK-3686][STREAMING] Wait for sink to commit the channel before checking for the channel size.
      b235e013
  6. Sep 24, 2014
  7. Sep 06, 2014
  8. Sep 04, 2014
  9. Aug 27, 2014
    • [SPARK-3154][STREAMING] Make FlumePollingInputDStream shutdown cleaner. · 6f671d04
      Hari Shreedharan authored
      Currently a lot of errors get thrown from the Avro IPC layer when the dstream
      or sink is shut down. This PR cleans that up. Some refactoring is done in the
      receiver code to put all of the RPC code into a single Try and just recover
      from that. The sink code has also been cleaned up.
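
      The "single Try" pattern described above, in miniature (doRpcRoundTrip is a hypothetical stand-in for the Avro IPC calls):

      ```scala
      import scala.util.{Failure, Success, Try}

      // Wrap the whole RPC round trip in one Try and recover in one place,
      // instead of scattering try/catch across the receiver.
      def runOnce(doRpcRoundTrip: () => Unit): Unit =
        Try(doRpcRoundTrip()) match {
          case Success(_) => // batch received and acked
          case Failure(e) => println(s"Recovering from RPC failure: $e") // log, keep receiver alive
        }
      ```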
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #2065 from harishreedharan/clean-flume-shutdown and squashes the following commits:
      
      f93a07c [Hari Shreedharan] Formatting fixes.
      d7427cc [Hari Shreedharan] More fixes!
      a0a8852 [Hari Shreedharan] Fix race condition, hopefully! Minor other changes.
      4c9ed02 [Hari Shreedharan] Remove unneeded list in Callback handler. Other misc changes.
      8fee36f [Hari Shreedharan] Scala-library is required, else maven build fails. Also catch InterruptedException in TxnProcessor.
      445e700 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into clean-flume-shutdown
      87232e0 [Hari Shreedharan] Refactor Flume Input Stream. Clean up code, better error handling.
      9001d26 [Hari Shreedharan] Change log level to debug in TransactionProcessor#shutdown method
      e7b8d82 [Hari Shreedharan] Incorporate review feedback
      598efa7 [Hari Shreedharan] Clean up some exception handling code
      e1027c6 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into clean-flume-shutdown
      ed608c8 [Hari Shreedharan] [SPARK-3154][STREAMING] Make FlumePollingInputDStream shutdown cleaner.
      6f671d04
  10. Aug 25, 2014
    • SPARK-2798 [BUILD] Correct several small errors in Flume module pom.xml files · cd30db56
      Sean Owen authored
      (EDIT) Since the scalatest issue has since been resolved, this is now about a few small problems in the Flume Sink `pom.xml`:
      
      - `scalatest` is not declared as a test-scope dependency
      - Its Avro version doesn't match the rest of the build
      - Its Flume version is not synced with the other Flume module
      - The other Flume module declares its dependency on Flume Sink slightly incorrectly, hard-coding the Scala 2.10 version
      - It depends on Scala Lang directly, which it shouldn't
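
      A hedged pom.xml sketch of two of these fixes: test-scoping scalatest, and referencing the sink artifact through the Scala version property instead of a hard-coded 2.10 (the property names follow Spark's build conventions but are assumptions here):

      ```xml
      <dependency>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest_${scala.binary.version}</artifactId>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-flume-sink_${scala.binary.version}</artifactId>
        <version>${project.version}</version>
      </dependency>
      ```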
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #1726 from srowen/SPARK-2798 and squashes the following commits:
      
      a46e2c6 [Sean Owen] scalatest to test scope, harmonize Avro and Flume versions, remove direct Scala dependency, fix '2.10' in Flume dependency
      cd30db56
  11. Aug 22, 2014
    • [SPARK-3169] Removed dependency on spark streaming test from spark flume sink · 30040741
      Tathagata Das authored
      Due to Maven bug https://jira.codehaus.org/browse/MNG-1378, Maven could not resolve the Spark Streaming classes required by the spark-streaming test-jar dependency of external/flume-sink. There is no particular reason that external/flume-sink has to depend on Spark Streaming at all, so I am eliminating this dependency. I have also removed the exclusions present in the Flume dependencies, as there is no reason to exclude them (they were excluded in the external/flume module to prevent dependency collisions with Spark).
      
      Since Jenkins will test the sbt build and the unit test, I only tested maven compilation locally.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #2101 from tdas/spark-sink-pom-fix and squashes the following commits:
      
      8f42621 [Tathagata Das] Added Flume sink exclusions back, and added netty to test dependencies
      93b559f [Tathagata Das] Removed dependency on spark streaming test from spark flume sink
      30040741
  12. Aug 20, 2014
    • [SPARK-3054][STREAMING] Add unit tests for Spark Sink. · 8c5a2226
      Hari Shreedharan authored
      This patch adds unit tests for Spark Sink.
      
      It also removes the private[flume] modifier from Spark Sink,
      since the sink is instantiated from Flume configuration (the modifier appears to be
      ignored by the reflection Flume uses, but we should still remove it anyway).
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      Author: Hari Shreedharan <hshreedharan@cloudera.com>
      
      Closes #1958 from harishreedharan/spark-sink-test and squashes the following commits:
      
      e3110b9 [Hari Shreedharan] Add a sleep to allow sink to commit the transactions
      120b81e [Hari Shreedharan] Fix complexity in threading model in test
      4df5be6 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into spark-sink-test
      c9190d1 [Hari Shreedharan] Indentation and spaces changes
      7fedc5a [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into spark-sink-test
      abc20cb [Hari Shreedharan] Minor test changes
      7b9b649 [Hari Shreedharan] Merge branch 'master' into spark-sink-test
      f2c56c9 [Hari Shreedharan] Update SparkSinkSuite.scala
      a24aac8 [Hari Shreedharan] Remove unused var
      c86d615 [Hari Shreedharan] [SPARK-3054][STREAMING] Add unit tests for Spark Sink.
      8c5a2226
  13. Aug 17, 2014
    • [HOTFIX][STREAMING] Allow the JVM/Netty to decide which port to bind to in Flume Polling Tests. · 95470a03
      Hari Shreedharan authored
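
      A minimal sketch of the ephemeral-port trick the title describes: bind to port 0 so the JVM/Netty picks a free port, then query the socket for the port actually chosen.

      ```scala
      import java.net.{InetSocketAddress, ServerSocket}

      val server = new ServerSocket()
      server.bind(new InetSocketAddress(0)) // port 0: let the OS choose
      val boundPort = server.getLocalPort   // the port that was actually assigned
      ```
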
      Author: Hari Shreedharan <harishreedharan@gmail.com>
      
      Closes #1820 from harishreedharan/use-free-ports and squashes the following commits:
      
      b939067 [Hari Shreedharan] Remove unused import.
      67856a8 [Hari Shreedharan] Remove findFreePort.
      0ea51d1 [Hari Shreedharan] Make some changes to getPort to use map on the serverOpt.
      1fb0283 [Hari Shreedharan] Merge branch 'master' of https://github.com/apache/spark into use-free-ports
      b351651 [Hari Shreedharan] Allow Netty to choose port, and query it to decide the port to bind to. Leaving findFreePort as is, if other tests want to use it at some point.
      e6c9620 [Hari Shreedharan] Making sure the second sink uses the correct port.
      11c340d [Hari Shreedharan] Add info about race condition to scaladoc.
      e89d135 [Hari Shreedharan] Adding Scaladoc.
      6013bb0 [Hari Shreedharan] [STREAMING] Find free ports to use before attempting to create Flume Sink in Flume Polling Suite
      95470a03
  14. Aug 06, 2014
    • [HOTFIX][Streaming] Handle port collisions in flume polling test · c6889d2c
      Andrew Or authored
      This is failing my tests in #1777. @tdas
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1803 from andrewor14/fix-flaky-streaming-test and squashes the following commits:
      
      ea11a03 [Andrew Or] Catch all exceptions caused by BindExceptions
      54a0ca0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-flaky-streaming-test
      664095c [Andrew Or] Tone down bind exception message
      af3ddc9 [Andrew Or] Handle port collisions in flume polling test
      c6889d2c
    • [SPARK-1022][Streaming][HOTFIX] Fixed zookeeper dependency of Kafka · ee7f3085
      Tathagata Das authored
      https://github.com/apache/spark/pull/1751 caused maven builds to fail.
      
      ```
      ~/Apache/spark(branch-1.1|✔) ➤ mvn -U -DskipTests clean install
      .
      .
      .
      [error] Apache/spark/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala:36: object NIOServerCnxnFactory is not a member of package org.apache.zookeeper.server
      [error] import org.apache.zookeeper.server.NIOServerCnxnFactory
      [error]        ^
      [error] Apache/spark/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala:199: not found: type NIOServerCnxnFactory
      [error]     val factory = new NIOServerCnxnFactory()
      [error]                       ^
      [error] two errors found
      [error] Compile failed at Aug 5, 2014 1:42:36 PM [0.503s]
      ```
      
      The problem is how SBT and Maven resolve multiple versions of the same library, which in this case is ZooKeeper. Observing and comparing the dependency trees from Maven and SBT showed this. Spark depends on ZK 3.4.5 whereas Apache Kafka transitively depends upon ZK 3.3.4. SBT evicts 3.3.4 and uses the higher version, 3.4.5, but Maven sticks with the version closest in the tree, 3.3.4. And 3.3.4 does not have NIOServerCnxnFactory.
      
      The solution in this patch excludes ZooKeeper from the apache-kafka dependency in the streaming-kafka module so that it just inherits ZooKeeper from Spark core, as sketched below.
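
      The exclusion in pom.xml terms (coordinates abbreviated; the real dependency block carries versions from dependencyManagement):

      ```xml
      <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_${scala.binary.version}</artifactId>
        <exclusions>
          <exclusion>
            <!-- inherit Spark core's ZooKeeper 3.4.5 instead -->
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
          </exclusion>
        </exclusions>
      </dependency>
      ```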
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #1797 from tdas/kafka-zk-fix and squashes the following commits:
      
      94b3931 [Tathagata Das] Fixed zookeeper dependency of Kafka
      ee7f3085
  15. Aug 05, 2014
    • [SPARK-1022][Streaming] Add Kafka real unit test · e87075df
      jerryshao authored
      This PR is an updated version of https://github.com/apache/spark/pull/557 that actually tests sending and receiving data through Kafka, and fixes the earlier flakiness.
      
      @tdas, would you mind reviewing this PR? Thanks a lot.
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #1751 from jerryshao/kafka-unit-test and squashes the following commits:
      
      b6a505f [jerryshao] code refactor according to comments
      5222330 [jerryshao] Change JavaKafkaStreamSuite to better test it
      5525f10 [jerryshao] Fix flaky issue of Kafka real unit test
      4559310 [jerryshao] Minor changes for Kafka unit test
      860f649 [jerryshao] Minor style changes, and tests ignored due to flakiness
      796d4ca [jerryshao] Add real Kafka streaming test
      e87075df
  16. Aug 02, 2014
  17. Aug 01, 2014
    • [SPARK-2103][Streaming] Change to ClassTag for KafkaInputDStream and fix reflection issue · a32f0fb7
      jerryshao authored
      This PR updates the previous Manifest for KafkaInputDStream's Decoder to ClassTag, and also fixes the problem addressed in [SPARK-2103](https://issues.apache.org/jira/browse/SPARK-2103).
      
      The previous Java interface could not actually get the type of the Decoder, so using that Manifest to reconstruct the decoder object hit a reflection exception.
      
      Also, for the other two Java interfaces, ClassTag[String] is useless because calling the Scala API will get the right implicit ClassTag.
      
      The current Kafka unit test cannot actually verify the interface. I've tested these interfaces in my local and distributed settings.
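
      An illustrative ClassTag-based factory (a simplified stand-in, not the actual KafkaUtils code): a ClassTag captured at the call site carries the runtime class, so the decoder can be constructed reflectively where a Manifest obtained through the Java interface could not.

      ```scala
      import scala.reflect.ClassTag

      // Instantiate D via its no-arg constructor using the captured ClassTag.
      def createDecoder[D: ClassTag](): D = {
        val cls = implicitly[ClassTag[D]].runtimeClass.asInstanceOf[Class[D]]
        cls.getConstructor().newInstance()
      }
      ```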
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #1508 from jerryshao/SPARK-2103 and squashes the following commits:
      
      e90c37b [jerryshao] Add Mima excludes
      7529810 [jerryshao] Change Manifest to ClassTag for KafkaInputDStream's Decoder and fix Decoder construct issue when using Java API
      a32f0fb7
  18. Jul 30, 2014
    • SPARK-2749 [BUILD]. Spark SQL Java tests aren't compiling in Jenkins' Maven builds; missing junit:junit dep · 6ab96a6f
      Sean Owen authored
      
      The Maven-based builds in the build matrix have been failing for a few days:
      
      https://amplab.cs.berkeley.edu/jenkins/view/Spark/
      
      On inspection, it looks like the Spark SQL Java tests don't compile:
      
      https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/244/consoleFull
      
      I confirmed it by repeating the command vs master:
      
      `mvn -Dhadoop.version=1.0.4 -Dlabel=centos -DskipTests clean package`
      
      The problem is that this module doesn't depend on JUnit. In fact, none of the modules do, but `com.novocode:junit-interface` (the SBT-JUnit bridge) pulls it in, in most places. However, this module doesn't depend on `com.novocode:junit-interface`.
      
      Adding the `junit:junit` dependency fixes the compile problem. In fact, the other modules with Java tests should probably depend on it explicitly instead of happening to get it via `com.novocode:junit-interface`, since that is a bit SBT/Scala-specific (and I am not even sure it's needed).
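
      The explicit dependency in question, roughly (the version would come from the parent's dependencyManagement in the real build):

      ```xml
      <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <scope>test</scope>
      </dependency>
      ```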
      
      Author: Sean Owen <srowen@gmail.com>
      
      Closes #1660 from srowen/SPARK-2749 and squashes the following commits:
      
      858ff7c [Sean Owen] Add explicit junit dep to other modules with Java tests for robustness
      9636794 [Sean Owen] Add junit dep so that Spark SQL Java tests compile
      6ab96a6f
  19. Jul 29, 2014
    • [STREAMING] SPARK-1729. Make Flume pull data from source, rather than the current push model · 800ecff4
      Hari Shreedharan authored
      
      Currently Spark uses Flume's internal Avro Protocol to ingest data from Flume. If the executor running the
      receiver fails, it currently has to be restarted on the same node to be able to receive data.
      
      This commit adds a new Sink which can be deployed to a Flume agent. This sink can be polled by a new
      DStream that is also included in this commit. This model ensures that data can be pulled into Spark from
      Flume even if the receiver is restarted on a new node. This also allows the receiver to receive data on
      multiple threads for better performance.
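
      Consuming the new pull-based stream looks roughly like this (host and port are examples; the Spark Sink must be configured on the Flume agent):

      ```scala
      import org.apache.spark.streaming.{Seconds, StreamingContext}
      import org.apache.spark.streaming.flume.FlumeUtils

      val ssc = new StreamingContext("local[2]", "FlumePolling", Seconds(2))
      // Poll the Spark Sink running inside the Flume agent at agentHost:9999.
      val events = FlumeUtils.createPollingStream(ssc, "agentHost", 9999)
      events.map(e => new String(e.event.getBody.array())).print()
      ssc.start()
      ssc.awaitTermination()
      ```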
      
      Author: Hari Shreedharan <harishreedharan@gmail.com>
      Author: Hari Shreedharan <hshreedharan@apache.org>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Author: harishreedharan <hshreedharan@cloudera.com>
      
      Closes #807 from harishreedharan/master and squashes the following commits:
      
      e7f70a3 [Hari Shreedharan] Merge remote-tracking branch 'asf-git/master'
      96cfb6f [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
      e48d785 [Hari Shreedharan] Documenting flume-sink being ignored for Mima checks.
      5f212ce [Hari Shreedharan] Ignore Spark Sink from mima.
      981bf62 [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
      7a1bc6e [Hari Shreedharan] Fix SparkBuild.scala
      a082eb3 [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
      1f47364 [Hari Shreedharan] Minor fixes.
      73d6f6d [Hari Shreedharan] Cleaned up tests a bit. Added some docs in multiple places.
      65b76b4 [Hari Shreedharan] Fixing the unit test.
      e59cc20 [Hari Shreedharan] Use SparkFlumeEvent instead of the new type. Also, Flume Polling Receiver now uses the store(ArrayBuffer) method.
      f3c99d1 [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
      3572180 [Hari Shreedharan] Adding a license header, making Jenkins happy.
      799509f [Hari Shreedharan] Fix a compile issue.
      3c5194c [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
      d248d22 [harishreedharan] Merge pull request #1 from tdas/flume-polling
      10b6214 [Tathagata Das] Changed public API, changed sink package, and added java unit test to make sure Java API is callable from Java.
      1edc806 [Hari Shreedharan] SPARK-1729. Update logging in Spark Sink.
      8c00289 [Hari Shreedharan] More debug messages
      393bd94 [Hari Shreedharan] SPARK-1729. Use LinkedBlockingQueue instead of ArrayBuffer to keep track of connections.
      120e2a1 [Hari Shreedharan] SPARK-1729. Some test changes and changes to utils classes.
      9fd0da7 [Hari Shreedharan] SPARK-1729. Use foreach instead of map for all Options.
      8136aa6 [Hari Shreedharan] Adding TransactionProcessor to map on returning batch of data
      86aa274 [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
      205034d [Hari Shreedharan] Merging master in
      4b0c7fc [Hari Shreedharan] FLUME-1729. New Flume-Spark integration.
      bda01fc [Hari Shreedharan] FLUME-1729. Flume-Spark integration.
      0d69604 [Hari Shreedharan] FLUME-1729. Better Flume-Spark integration.
      3c23c18 [Hari Shreedharan] SPARK-1729. New Spark-Flume integration.
      70bcc2a [Hari Shreedharan] SPARK-1729. New Flume-Spark integration.
      d6fa3aa [Hari Shreedharan] SPARK-1729. New Flume-Spark integration.
      e7da512 [Hari Shreedharan] SPARK-1729. Fixing import order
      9741683 [Hari Shreedharan] SPARK-1729. Fixes based on review.
      c604a3c [Hari Shreedharan] SPARK-1729. Optimize imports.
      0f10788 [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
      87775aa [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
      8df37e4 [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
      03d6c1c [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
      08176ad [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
      d24d9d4 [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
      6d6776a [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
      800ecff4
  20. Jul 28, 2014
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server (with Maven profile fix) · a7a9d144
      Cheng Lian authored
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Another try for #1399 & #1600. Those two PRs broke Jenkins builds because we made a separate profile `hive-thriftserver` in sub-project `assembly`, but the `hive-thriftserver` module was defined outside the `hive-thriftserver` profile. Thus even a pull request that doesn't touch SQL code would execute the test suites defined in `hive-thriftserver`, and those tests fail because the related .class files are not included in the assembly jar.
      
      In the most recent commit, module `hive-thriftserver` is moved into its own profile to fix this problem. All previous commits are squashed for clarity.
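
      The shape of the fix, as a pom.xml sketch: with the module declared inside its own profile, it is only built (and tested) when that profile is activated.

      ```xml
      <profile>
        <id>hive-thriftserver</id>
        <modules>
          <module>sql/hive-thriftserver</module>
        </modules>
      </profile>
      ```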
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1620 from liancheng/jdbc-with-maven-fix and squashes the following commits:
      
      629988e [Cheng Lian] Moved hive-thriftserver module definition into its own profile
      ec3c7a7 [Cheng Lian] Cherry picked the Hive Thrift server
      a7a9d144
  21. Jul 27, 2014
    • Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · e5bbce9a
      Patrick Wendell authored
      This reverts commit f6ff2a61.
      e5bbce9a
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · f6ff2a61
      Cheng Lian authored
      (This is a replacement of #1399, trying to fix potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.)
      
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1600 from liancheng/jdbc and squashes the following commits:
      
      ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
      f6ff2a61
  22. Jul 25, 2014
    • Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · afd757a2
      Michael Armbrust authored
      This reverts commit 06dc0d2c.
      
      #1399 is making Jenkins fail. We should investigate and put this back once it passes tests.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1594 from marmbrus/revertJDBC and squashes the following commits:
      
      59748da [Michael Armbrust] Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
      afd757a2
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · 06dc0d2c
      Cheng Lian authored
      JIRA issue:
      
      - Main: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      - Related: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
      
      Cherry picked the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      (Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.)
      
      TODO
      
      - [x] Use `spark-submit` to launch the server, the CLI and beeline
      - [x] Migration guideline draft for Shark users
      
      ----
      
      Hit by a bug in `SparkSubmitArguments` while working on this PR: all application options that are recognized by `SparkSubmitArguments` are stolen as `SparkSubmit` options. For example:
      
      ```bash
      $ spark-submit --class org.apache.hive.beeline.BeeLine spark-internal --help
      ```
      
      This actually shows usage information of `SparkSubmit` rather than `BeeLine`.
      
      ~~Fixed this bug here since the `spark-internal` related stuff also touches `SparkSubmitArguments` and I'd like to avoid conflict.~~
      
      **UPDATE** The bug mentioned above is now tracked by [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678). Decided to revert changes related to this bug since it involves more subtle considerations and is worth a separate PR.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1399 from liancheng/thriftserver and squashes the following commits:
      
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
      06dc0d2c
  23. Jul 24, 2014
    • [SPARK-2464][Streaming] Fixed Twitter stream stopping bug · a45d5480
      Tathagata Das authored
      Stopping the Twitter Receiver would call twitter4j's TwitterStream.shutdown, which in turn causes an Exception to be thrown to the listener. This exception caused the Receiver to be restarted. This patch checks whether the receiver was stopped or not, and restarts on exception only if it was not deliberately stopped.
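
      A minimal sketch of the guard (shutdownStream and restart stand in for twitter4j's TwitterStream.shutdown and Spark's Receiver.restart):

      ```scala
      // Track deliberate stops so the exception triggered by our own
      // shutdown call does not restart the receiver.
      class GuardedReceiver(shutdownStream: () => Unit, restart: Exception => Unit) {
        @volatile private var stopped = false

        def onStop(): Unit = {
          stopped = true
          shutdownStream() // twitter4j surfaces an exception to the listener here
        }

        def onException(e: Exception): Unit =
          if (!stopped) restart(e) // restart only on genuine failures
      }
      ```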
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #1577 from tdas/twitter-stop and squashes the following commits:
      
      011b525 [Tathagata Das] Fixed Twitter stream stopping bug.
      a45d5480
  24. Jul 17, 2014
    • SPARK-1478.2 Fix incorrect NioServerSocketChannelFactory constructor call · 1fcd5dcd
      Sean Owen authored
      The line break inadvertently meant this was interpreted as a call to the no-arg constructor, which doesn't even exist in older Netty. (Also fixed a val name typo.)
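
      The pitfall in general form: with a line break before the argument list, Scala's semicolon inference parses the first line as a complete expression, i.e. a call to the no-arg constructor.

      ```scala
      import java.util.concurrent.Executors
      import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory

      // Broken: parsed as `new NioServerSocketChannelFactory` followed by a
      // separate tuple expression on the next line.
      //
      //   val factory = new NioServerSocketChannelFactory
      //     (Executors.newCachedThreadPool(), Executors.newCachedThreadPool())

      // Fixed: keep the opening parenthesis on the same line.
      val factory = new NioServerSocketChannelFactory(
        Executors.newCachedThreadPool(), Executors.newCachedThreadPool())
      ```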
      
      Author: Sean Owen <srowen@gmail.com>
      
      Closes #1466 from srowen/SPARK-1478.2 and squashes the following commits:
      
      59c3501 [Sean Owen] Line break caused Scala to interpret NioServerSocketChannelFactory constructor as the no-arg version, which is not even present in some versions of Netty
      1fcd5dcd
  25. Jul 10, 2014
    • [SPARK-1478].3: Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915 · 40a8fef4
      tmalaska authored
      This is a modified version of PR https://github.com/apache/spark/pull/1168 by @tmalaska.
      It adds MIMA binary check exclusions.
      
      Author: tmalaska <ted.malaska@cloudera.com>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #1347 from tdas/FLUME-1915 and squashes the following commits:
      
      96065df [Tathagata Das] Added Mima exclusion for FlumeReceiver.
      41d5338 [tmalaska] Address line 57 that was too long
      12617e5 [tmalaska] SPARK-1478: Upgrade FlumeInputDStream's Flume...
      40a8fef4
    • [SPARK-1776] Have Spark's SBT build read dependencies from Maven. · 628932b8
      Prashant Sharma authored
      This patch introduces the new way of working while retaining the existing ways of doing things.
      
      For example, the Maven build instruction for YARN is
      `mvn -Pyarn -Phadoop-2.2 clean package -DskipTests`
      in sbt it can become
      `MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly`
      Also supports
      `sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly`
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #772 from ScrapCodes/sbt-maven and squashes the following commits:
      
      a8ac951 [Prashant Sharma] Updated sbt version.
      62b09bb [Prashant Sharma] Improvements.
      fa6221d [Prashant Sharma] Excluding sql from mima
      4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default.
      72651ca [Prashant Sharma] Addresses code review comments.
      acab73d [Prashant Sharma] Revert "Small fix to run-examples script."
      ac4312c [Prashant Sharma] Revert "minor fix"
      6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit.
      65cf06c [Prashant Sharma] Servlet API jars mess up with the other servlet jars on the class path.
      446768e [Prashant Sharma] minor fix
      89b9777 [Prashant Sharma] Merge conflicts
      d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups.
      dccc8ac [Prashant Sharma] updated mima to check against 1.0
      a49c61b [Prashant Sharma] Fix for tools jar
      a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies.
      cf88758 [Prashant Sharma] cleanup
      9439ea3 [Prashant Sharma] Small fix to run-examples script.
      96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven.
      36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins.
      4973dbd [Patrick Wendell] Example build using pom reader.
      628932b8
  26. Jun 22, 2014
    • SPARK-2034. KafkaInputDStream doesn't close resources and may prevent JVM shutdown · 476581e8
      Sean Owen authored
      Tobias noted today on the mailing list:
      
      ========
      
      I am trying to use Spark Streaming with Kafka, which works like a
      charm – except for shutdown. When I run my program with "sbt
      run-main", sbt will never exit, because there are two non-daemon
      threads left that don't die.
      I created a minimal example at
      <https://gist.github.com/tgpfeiffer/b1e765064e983449c6b6#file-kafkadoesntshutdown-scala>.
      It starts a StreamingContext and does nothing more than connecting to
      a Kafka server and printing what it receives. Using the `future { ... }`
      construct, I shut down the StreamingContext after some seconds and
      then print the difference between the threads at start time and at end
      time. The output can be found at
      <https://gist.github.com/tgpfeiffer/b1e765064e983449c6b6#file-output1>.
      There are a number of threads remaining that will prevent sbt from
      exiting.
      When I replace `KafkaUtils.createStream(...)` with a call that does
      exactly the same, except that it calls `consumerConnector.shutdown()`
      in `KafkaReceiver.onStop()` (which it should, IMO), the output is as
      shown at <https://gist.github.com/tgpfeiffer/b1e765064e983449c6b6#file-output2>.
      Does anyone have any idea what is going on here and why the program
      doesn't shut down properly? The behavior is the same with both kafka
      0.8.0 and 0.8.1.1, by the way.
      
      ========
      
      Something similar was noted last year:
      
      http://mail-archives.apache.org/mod_mbox/spark-dev/201309.mbox/%3C1380220041.2428.YahooMailNeo@web160804.mail.bf1.yahoo.com%3E
      
      KafkaInputDStream doesn't close `ConsumerConnector` in `onStop()`, and does not close the `Executor` it creates. The latter leaves non-daemon threads and can prevent the JVM from shutting down even if streaming is closed properly.
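
      A sketch of the cleanup (Shutdownable stands in for kafka.consumer.ConsumerConnector):

      ```scala
      import java.util.concurrent.ExecutorService

      trait Shutdownable { def shutdown(): Unit }

      class ReceiverCleanup(connector: Shutdownable, pool: ExecutorService) {
        def onStop(): Unit = {
          connector.shutdown() // close the ConsumerConnector, as the fix does
          pool.shutdown()      // stop the locally created Executor so its non-daemon threads exit
        }
      }
      ```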
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #980 from srowen/SPARK-2034 and squashes the following commits:
      
      9f31a8d [Sean Owen] Restore ClassTag to private class because MIMA flags it; is the shadowing intended?
      2d579a8 [Sean Owen] Close ConsumerConnector in onStop; shutdown() the local Executor that is created so that its threads stop when done; close the Zookeeper client even on exception; fix a few typos; log exceptions that otherwise vanish
      476581e8
  27. Jun 10, 2014
    • [SPARK-1998] SparkFlumeEvent with body bigger than 1020 bytes are not read properly · 29660443
      joyyoj authored
      Flume events sent to Spark will fail if the body is too large and numHeaders is greater than zero.
      
      Author: joyyoj <sunshch@gmail.com>
      
      Closes #951 from joyyoj/master and squashes the following commits:
      
      f4660c5 [joyyoj] [SPARK-1998] SparkFlumeEvent with body bigger than 1020 bytes are not read properly
      29660443
  28. Jun 05, 2014
  29. May 28, 2014
    • Spark 1916 · 4312cf0b
      David Lemieux authored
      
      The changes could be ported back to 0.9 as well.
      This changes in.read to in.readFully to read the whole input stream rather than the first 1020 bytes.
      This should be ok considering that Flume caps the body size to 32K by default.
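
      The distinction the one-line fix relies on: InputStream.read may return after fewer bytes than requested, while DataInputStream.readFully blocks until the buffer is completely filled (or throws at EOF).

      ```scala
      import java.io.{ByteArrayInputStream, DataInputStream}

      val in  = new DataInputStream(new ByteArrayInputStream(new Array[Byte](4096)))
      val buf = new Array[Byte](4096)

      // in.read(buf)   // may legally fill only part of buf (e.g. the first 1020 bytes)
      in.readFully(buf) // guarantees all 4096 bytes are read into buf
      ```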
      
      Author: David Lemieux <david.lemieux@radialpoint.com>
      
      Closes #865 from lemieud/SPARK-1916 and squashes the following commits:
      
      a265673 [David Lemieux] Updated SparkFlumeEvent to read the whole stream rather than the first X bytes.
      (cherry picked from commit 0b769b73)
      
      Signed-off-by: Patrick Wendell <pwendell@gmail.com>
      4312cf0b
  30. May 15, 2014
    • Package docs · 46324279
      Prashant Sharma authored
      These are a few changes based on the original patch by @scrapcodes.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #785 from pwendell/package-docs and squashes the following commits:
      
      c32b731 [Patrick Wendell] Changes based on Prashant's patch
      c0463d3 [Prashant Sharma] added eof new line
      ce8bf73 [Prashant Sharma] Added eof new line to all files.
      4c35f2e [Prashant Sharma] SPARK-1563 Add package-info.java and package.scala files for all packages that appear in docs
      46324279
  31. May 14, 2014
    • Fixed streaming examples docs to use run-example instead of spark-submit · 68f28dab
      Tathagata Das authored
      Pretty self-explanatory
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #722 from tdas/example-fix and squashes the following commits:
      
      7839979 [Tathagata Das] Minor changes.
      0673441 [Tathagata Das] Fixed java docs of java streaming example
      e687123 [Tathagata Das] Fixed scala style errors.
      9b8d112 [Tathagata Das] Fixed streaming examples docs to use run-example instead of spark-submit.
      68f28dab
  32. May 12, 2014
    • SPARK-1798. Tests should clean up temp files · 7120a297
      Sean Owen authored
      Three issues related to temp files that tests generate – these should be touched up for hygiene but are not urgent.
      
      Modules have a log4j.properties which directs the unit-test.log output file to a directory like `[module]/target/unit-test.log`. But this ends up creating `[module]/[module]/target/unit-test.log` instead of the former.
      
      The `work/` directory is not deleted by `mvn clean`, in the parent or in modules; neither is the `checkpoint/` directory created under the various external modules.
      
      Many tests create a temp directory, which is not usually deleted. This can be largely resolved by calling `deleteOnExit()` at creation and trying to call `Utils.deleteRecursively` consistently to clean up, sometimes in an `@After` method.
      
      _If anyone seconds the motion, I can create a more significant change that introduces a new test trait along the lines of `LocalSparkContext`, which provides management of temp directories for subclasses to take advantage of._
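
      The hygiene pattern in miniature (deleteRecursively here approximates what Spark's Utils.deleteRecursively does):

      ```scala
      import java.io.File
      import java.nio.file.Files

      // Mark the temp dir for JVM-exit deletion at creation...
      val tempDir: File = Files.createTempDirectory("spark-test").toFile
      tempDir.deleteOnExit()

      // ...and still delete it eagerly in an @After method.
      def deleteRecursively(f: File): Unit = {
        if (f.isDirectory) Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
        f.delete()
      }
      deleteRecursively(tempDir)
      ```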
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #732 from srowen/SPARK-1798 and squashes the following commits:
      
      5af578e [Sean Owen] Try to consistently delete test temp dirs and files, and set deleteOnExit() for each
      b21b356 [Sean Owen] Remove work/ and checkpoint/ dirs with mvn clean
      bdd0f41 [Sean Owen] Remove duplicate module dir in log4j.properties output path for tests
      7120a297
  33. May 10, 2014
    • SPARK-1789. Multiple versions of Netty dependencies cause FlumeStreamSuite failure · 2b7bd29e
      Sean Owen authored
      TL;DR is there is a bit of JAR hell trouble with Netty, that can be mostly resolved and will resolve a test failure.
      
      I hit the error described at http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-startup-time-out-td1753.html while running FlumeStreamingSuite, and have for a short while (is it just me?)
      
      velvia notes:
      "I have found a workaround.  If you add akka 2.2.4 to your dependencies, then everything works, probably because akka 2.2.4 brings in newer version of Jetty."
      
      There are at least 3 versions of Netty in play in the build:
      
      - the new Flume 1.4.0 dependency brings in io.netty:netty:3.4.0.Final, and that is the immediate problem
      - the custom version of akka 2.2.3 depends on io.netty:netty:3.6.6.
      - but, Spark Core directly uses io.netty:netty-all:4.0.17.Final
      
      The POMs try to exclude other versions of netty, but are excluding org.jboss.netty:netty, when in fact older versions of io.netty:netty (not netty-all) are also an issue.
      
      The org.jboss.netty:netty excludes are largely unnecessary. I replaced many of them with io.netty:netty exclusions until everything agreed on io.netty:netty-all:4.0.17.Final.
      
      But this didn't work, since Akka 2.2.3 doesn't work with Netty 4.x. Down-grading to 3.6.6.Final across the board made some Spark code not compile.
      
      If the build *keeps* io.netty:netty:3.6.6.Final as well, everything seems to work. Part of the reason seems to be that Netty 3.x used the old `org.jboss.netty` packages. This is less than ideal, but is no worse than the current situation.
      
      So this PR resolves the issue and improves the JAR hell, even if it leaves the existing theoretical Netty 3-vs-4 conflict:
      
      - Remove org.jboss.netty excludes where possible, for clarity; they're not needed except with Hadoop artifacts
      - Add io.netty:netty excludes where needed -- except, let akka keep its io.netty:netty
      - Change a bit of test code that actually depended on Netty 3.x, to use 4.x equivalent
      - Update SBT build accordingly
      
      A better change would be to update Akka far enough such that it agrees on Netty 4.x, but I don't know if that's feasible.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #723 from srowen/SPARK-1789 and squashes the following commits:
      
      43661b7 [Sean Owen] Update and add Netty excludes to prevent some JAR conflicts that cause test issues
      2b7bd29e
  34. Apr 29, 2014
    • Improved build configuration · 030f2c21
      witgo authored
      1. Fix SPARK-1441: Spark core fails to compile with Hadoop 0.23.x
      2. Fix SPARK-1491: the Maven hadoop-provided profile fails to build
      3. Fix inconsistent dependency versions for org.scala-lang:* and org.apache.avro:*
      4. Reformat sql/catalyst/pom.xml, sql/hive/pom.xml, and sql/core/pom.xml (four-space indentation changed to two spaces)
      
      Author: witgo <witgo@qq.com>
      
      Closes #480 from witgo/format_pom and squashes the following commits:
      
      03f652f [witgo] review commit
      b452680 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
      bee920d [witgo] revert fix SPARK-1629: Spark Core missing commons-lang dependence
      7382a07 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
      6902c91 [witgo] fix SPARK-1629: Spark Core missing commons-lang dependence
      0da4bc3 [witgo] merge master
      d1718ed [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
      e345919 [witgo] add avro dependency to yarn-alpha
      77fad08 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
      62d0862 [witgo] Fix org.scala-lang: * inconsistent versions dependency
      1a162d7 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
      934f24d [witgo] review commit
      cf46edc [witgo] exclude jruby
      06e7328 [witgo] Merge branch 'SparkBuild' into format_pom
      99464d2 [witgo] fix maven hadoop-provided profile fails to build
      0c6c1fc [witgo] Fix compile spark core error with hadoop 0.23.x
      6851bec [witgo] Maintain consistent SparkBuild.scala, pom.xml
      030f2c21
  35. Apr 24, 2014
    • SPARK-1586 Windows build fixes · 968c0187
      Mridul Muralidharan authored
      Unfortunately, this is not exhaustive; in particular, Hive tests still fail due to path issues.
      
      Author: Mridul Muralidharan <mridulm80@apache.org>
      
      This patch had conflicts when merged, resolved by
      Committer: Matei Zaharia <matei@databricks.com>
      
      Closes #505 from mridulm/windows_fixes and squashes the following commits:
      
      ef12283 [Mridul Muralidharan] Move to org.apache.commons.lang3 for StringEscapeUtils. Earlier version was buggy apparently
      cdae406 [Mridul Muralidharan] Remove leaked changes from > 2G fix branch
      3267f4b [Mridul Muralidharan] Fix build failures
      35b277a [Mridul Muralidharan] Fix Scalastyle failures
      bc69d14 [Mridul Muralidharan] Change from hardcoded path separator
      10c4d78 [Mridul Muralidharan] Use explicit encoding while using getBytes
      1337abd [Mridul Muralidharan] fix classpath while running in windows
      968c0187