Skip to content
Snippets Groups Projects
  1. Dec 04, 2015
    • Shixiong Zhu's avatar
      [SPARK-12084][CORE] Fix codes that uses ByteBuffer.array incorrectly · 3af53e61
      Shixiong Zhu authored
      `ByteBuffer` doesn't guarantee all contents in `ByteBuffer.array` are valid. E.g, a ByteBuffer returned by `ByteBuffer.slice`. We should not use the whole content of `ByteBuffer` unless we know that's correct.
      
      This patch fixed all places that use `ByteBuffer.array` incorrectly.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10083 from zsxwing/bytebuffer-array.
      3af53e61
    • Burak Yavuz's avatar
      [SPARK-12058][STREAMING][KINESIS][TESTS] fix Kinesis python tests · 302d68de
      Burak Yavuz authored
      Python tests require access to the `KinesisTestUtils` file. When this file exists under src/test, python can't access it, since it is not available in the assembly jar.
      
      However, if we move KinesisTestUtils to src/main, we need to add the KinesisProducerLibrary as a dependency. In order to avoid this, I moved KinesisTestUtils to src/main, and extended it with ExtendedKinesisTestUtils which is under src/test that adds support for the KPL.
      
      cc zsxwing tdas
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #10050 from brkyvz/kinesis-py.
      302d68de
  2. Dec 01, 2015
  3. Nov 18, 2015
    • Bryan Cutler's avatar
      [SPARK-4557][STREAMING] Spark Streaming foreachRDD Java API method should... · 31921e0f
      Bryan Cutler authored
      [SPARK-4557][STREAMING] Spark Streaming foreachRDD Java API method should accept a VoidFunction<...>
      
      Currently streaming foreachRDD Java API uses a function prototype requiring a return value of null.  This PR deprecates the old method and uses VoidFunction to allow for more concise declaration.  Also added VoidFunction2 to Java API in order to use in Streaming methods.  Unit test is added for using foreachRDD with VoidFunction, and changes have been tested with Java 7 and Java 8 using lambdas.
      
      Author: Bryan Cutler <bjcutler@us.ibm.com>
      
      Closes #9488 from BryanCutler/foreachRDD-VoidFunction-SPARK-4557.
      31921e0f
  4. Nov 12, 2015
  5. Nov 11, 2015
  6. Nov 09, 2015
  7. Oct 25, 2015
    • Burak Yavuz's avatar
      [SPARK-10891][STREAMING][KINESIS] Add MessageHandler to... · 63accc79
      Burak Yavuz authored
      [SPARK-10891][STREAMING][KINESIS] Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka
      
      This PR allows users to map a Kinesis `Record` to a generic `T` when creating a Kinesis stream. This is particularly useful, if you would like to do extra work with Kinesis metadata such as sequence number, and partition key.
      
      TODO:
       - [x] add tests
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #8954 from brkyvz/kinesis-handler.
      63accc79
  8. Oct 07, 2015
  9. Sep 15, 2015
  10. Sep 12, 2015
  11. Aug 25, 2015
  12. Aug 24, 2015
    • Tathagata Das's avatar
      [SPARK-9791] [PACKAGE] Change private class to private class to prevent... · 7478c8b6
      Tathagata Das authored
      [SPARK-9791] [PACKAGE] Change private class to private class to prevent unnecessary classes from showing up in the docs
      
      In addition, some random cleanup of import ordering
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #8387 from tdas/SPARK-9791 and squashes the following commits:
      
      67f3ee9 [Tathagata Das] Change private class to private[package] class to prevent them from showing up in the docs
      7478c8b6
    • zsxwing's avatar
      [SPARK-10168] [STREAMING] Fix the issue that maven publishes wrong artifact jars · 4e0395dd
      zsxwing authored
      This PR removed the `outputFile` configuration from pom.xml and updated `tests.py` to search jars for both sbt build and maven build.
      
      I ran ` mvn -Pkinesis-asl -DskipTests clean install` locally, and verified the jars in my local repository were correct. I also checked Python tests for maven build, and it passed all tests.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #8373 from zsxwing/SPARK-10168 and squashes the following commits:
      
      e0b5818 [zsxwing] Fix the sbt build
      c697627 [zsxwing] Add the jar pathes to the exception message
      be1d8a5 [zsxwing] Fix the issue that maven publishes wrong artifact jars
      4e0395dd
  13. Aug 18, 2015
  14. Aug 11, 2015
  15. Aug 06, 2015
    • Tathagata Das's avatar
      [SPARK-9556] [SPARK-9619] [SPARK-9624] [STREAMING] Make BlockGenerator more... · 0a078303
      Tathagata Das authored
      [SPARK-9556] [SPARK-9619] [SPARK-9624] [STREAMING] Make BlockGenerator more robust and make all BlockGenerators subscribe to rate limit updates
      
      In some receivers, instead of using the default `BlockGenerator` in `ReceiverSupervisorImpl`, custom generator with their custom listeners are used for reliability (see [`ReliableKafkaReceiver`](https://github.com/apache/spark/blob/master/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala#L99) and [updated `KinesisReceiver`](https://github.com/apache/spark/pull/7825/files)). These custom generators do not receive rate updates. This PR modifies the code to allow custom `BlockGenerator`s to be created through the `ReceiverSupervisorImpl` so that they can be kept track and rate updates can be applied.
      
      In the process, I did some simplification, and de-flaki-fication of some rate controller related tests. In particular.
      - Renamed `Receiver.executor` to `Receiver.supervisor` (to match `ReceiverSupervisor`)
      - Made `RateControllerSuite` faster (by increasing batch interval) and less flaky
      - Changed a few internal API to return the current rate of block generators as Long instead of Option\[Long\] (was inconsistent at places).
      - Updated existing `ReceiverTrackerSuite` to test that custom block generators get rate updates as well.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #7913 from tdas/SPARK-9556 and squashes the following commits:
      
      41d4461 [Tathagata Das] fix scala style
      eb9fd59 [Tathagata Das] Updated kinesis receiver
      d24994d [Tathagata Das] Updated BlockGeneratorSuite to use manual clock in BlockGenerator
      d70608b [Tathagata Das] Updated BlockGenerator with states and proper synchronization
      f6bd47e [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-9556
      31da173 [Tathagata Das] Fix bug
      12116df [Tathagata Das] Add BlockGeneratorSuite
      74bd069 [Tathagata Das] Fix style
      989bb5c [Tathagata Das] Made BlockGenerator fail is used after stop, and added better unit tests for it
      3ff618c [Tathagata Das] Fix test
      b40eff8 [Tathagata Das] slight refactoring
      f0df0f1 [Tathagata Das] Scala style fixes
      51759cb [Tathagata Das] Refactored rate controller tests and added the ability to update rate of any custom block generator
      0a078303
  16. Aug 05, 2015
    • Tathagata Das's avatar
      [SPARK-9217] [STREAMING] Make the kinesis receiver reliable by recording sequence numbers · c2a71f07
      Tathagata Das authored
      This PR is the second one in the larger issue of making the Kinesis integration reliable and provide WAL-free at-least once guarantee. It is based on the design doc - https://docs.google.com/document/d/1k0dl270EnK7uExrsCE7jYw7PYx0YC935uBcxn3p0f58/edit
      
      In this PR, I have updated the Kinesis Receiver to do the following.
      - Control the block generation, by creating its own BlockGenerator with own callback methods and using it to keep track of the ranges of sequence numbers that go into each block.
      - More specifically, as the KinesisRecordProcessor provides small batches of records, the records are atomically inserted into the block (that is, either the whole batch is in the block, or not). Accordingly the sequence number range of the batch is recorded. Since there may be many batches added to a block, the receiver tracks all the range of sequence numbers that is added to a block.
      - When the block is ready to be pushed, the block is pushed and the ranges are reported as metadata of the block. In addition, the ranges are used to find out the latest sequence number for each shard that can be checkpointed through the DynamoDB.
      - Periodically, each KinesisRecordProcessor checkpoints the latest successfully stored sequence number for it own shard.
      - The array of ranges in the block metadata is used to create KinesisBackedBlockRDDs. The ReceiverInputDStream has been slightly refactored to allow the creation of KinesisBackedBlockRDDs instead of the WALBackedBlockRDDs.
      
      Things to be done
      - [x] Add new test to verify that the sequence numbers are recovered.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #7825 from tdas/kinesis-receiver and squashes the following commits:
      
      2159be9 [Tathagata Das] Fixed bug
      569be83 [Tathagata Das] Fix scala style issue
      bf31e22 [Tathagata Das] Added more documentation to make the kinesis test endpoint more configurable
      3ad8361 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into kinesis-receiver
      c693a63 [Tathagata Das] Removed unnecessary constructor params from KinesisTestUtils
      e1f1d0a [Tathagata Das] Addressed PR comments
      b9fa6bf [Tathagata Das] Fix serialization issues
      f8b7680 [Tathagata Das] Updated doc
      33fe43a [Tathagata Das] Added more tests
      7997138 [Tathagata Das] Fix style errors
      a806710 [Tathagata Das] Fixed unit test and use KinesisInputDStream
      40a1709 [Tathagata Das] Fixed KinesisReceiverSuite tests
      7e44df6 [Tathagata Das] Added documentation and fixed checkpointing
      096383f [Tathagata Das] Added test, and addressed some of the comments.
      84a7892 [Tathagata Das] fixed scala style issue
      e19e37d [Tathagata Das] Added license
      1cd7b66 [Tathagata Das] Updated kinesis receiver
      c2a71f07
  17. Jul 31, 2015
    • zsxwing's avatar
      [SPARK-8564] [STREAMING] Add the Python API for Kinesis · 3afc1de8
      zsxwing authored
      This PR adds the Python API for Kinesis, including a Python example and a simple unit test.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6955 from zsxwing/kinesis-python and squashes the following commits:
      
      e42e471 [zsxwing] Merge branch 'master' into kinesis-python
      455f7ea [zsxwing] Remove streaming_kinesis_asl_assembly module and simply add the source folder to streaming_kinesis_asl module
      32e6451 [zsxwing] Merge remote-tracking branch 'origin/master' into kinesis-python
      5082d28 [zsxwing] Fix the syntax error for Python 2.6
      fca416b [zsxwing] Fix wrong comparison
      96670ff [zsxwing] Fix the compilation error after merging master
      756a128 [zsxwing] Merge branch 'master' into kinesis-python
      6c37395 [zsxwing] Print stack trace for debug
      7c5cfb0 [zsxwing] RUN_KINESIS_TESTS -> ENABLE_KINESIS_TESTS
      cc9d071 [zsxwing] Fix the python test errors
      466b425 [zsxwing] Add python tests for Kinesis
      e33d505 [zsxwing] Merge remote-tracking branch 'origin/master' into kinesis-python
      3da2601 [zsxwing] Fix the kinesis folder
      687446b [zsxwing] Fix the error message and the maven output path
      add2beb [zsxwing] Merge branch 'master' into kinesis-python
      4957c0b [zsxwing] Add the Python API for Kinesis
      3afc1de8
  18. Jul 30, 2015
    • Tathagata Das's avatar
      [STREAMING] [TEST] [HOTFIX] Fixed Kinesis test to not throw weird errors when... · 1afdeb7b
      Tathagata Das authored
      [STREAMING] [TEST] [HOTFIX] Fixed Kinesis test to not throw weird errors when Kinesis tests are enabled without AWS keys
      
      If Kinesis tests are enabled by env ENABLE_KINESIS_TESTS = 1 but no AWS credentials are found, the desired behavior is the fail the test using with
      ```
      Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kinesis.KinesisBackedBlockRDDSuite *** ABORTED *** (3 seconds, 5 milliseconds)
      [info]   java.lang.Exception: Kinesis tests enabled, but could get not AWS credentials
      ```
      
      Instead KinesisStreamSuite fails with
      
      ```
      [info] - basic operation *** FAILED *** (3 seconds, 35 milliseconds)
      [info]   java.lang.IllegalArgumentException: requirement failed: Stream not yet created, call createStream() to create one
      [info]   at scala.Predef$.require(Predef.scala:233)
      [info]   at org.apache.spark.streaming.kinesis.KinesisTestUtils.streamName(KinesisTestUtils.scala:77)
      [info]   at org.apache.spark.streaming.kinesis.KinesisTestUtils$$anonfun$deleteStream$1.apply(KinesisTestUtils.scala:150)
      [info]   at org.apache.spark.streaming.kinesis.KinesisTestUtils$$anonfun$deleteStream$1.apply(KinesisTestUtils.scala:150)
      [info]   at org.apache.spark.Logging$class.logWarning(Logging.scala:71)
      [info]   at org.apache.spark.streaming.kinesis.KinesisTestUtils.logWarning(KinesisTestUtils.scala:39)
      [info]   at org.apache.spark.streaming.kinesis.KinesisTestUtils.deleteStream(KinesisTestUtils.scala:150)
      [info]   at org.apache.spark.streaming.kinesis.KinesisStreamSuite$$anonfun$3.apply$mcV$sp(KinesisStreamSuite.scala:111)
      [info]   at org.apache.spark.streaming.kinesis.KinesisStreamSuite$$anonfun$3.apply(KinesisStreamSuite.scala:86)
      [info]   at org.apache.spark.streaming.kinesis.KinesisStreamSuite$$anonfun$3.apply(KinesisStreamSuite.scala:86)
      ```
      This is because attempting to delete a non-existent Kinesis stream throws uncaught exception. This PR fixes it.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #7809 from tdas/kinesis-test-hotfix and squashes the following commits:
      
      7c372e6 [Tathagata Das] Fixed test
      1afdeb7b
  19. Jul 28, 2015
  20. Jul 25, 2015
  21. Jul 23, 2015
    • Tathagata Das's avatar
      [SPARK-9216] [STREAMING] Define KinesisBackedBlockRDDs · d249636e
      Tathagata Das authored
      For more information see master JIRA: https://issues.apache.org/jira/browse/SPARK-9215
      Design Doc: https://docs.google.com/document/d/1k0dl270EnK7uExrsCE7jYw7PYx0YC935uBcxn3p0f58/edit
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #7578 from tdas/kinesis-rdd and squashes the following commits:
      
      543d208 [Tathagata Das] Fixed scala style
      5082a30 [Tathagata Das] Fixed scala style
      3f40c2d [Tathagata Das] Addressed comments
      c4f25d2 [Tathagata Das] Addressed comment
      d3d64d1 [Tathagata Das] Minor update
      f6e35c8 [Tathagata Das] Added retry logic to make it more robust
      8874b70 [Tathagata Das] Updated Kinesis RDD
      575bdbc [Tathagata Das] Fix scala style issues
      4a36096 [Tathagata Das] Add license
      5da3995 [Tathagata Das] Changed KinesisSuiteHelper to KinesisFunSuite
      528e206 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into kinesis-rdd
      3ae0814 [Tathagata Das] Added KinesisBackedBlockRDD
      d249636e
  22. Jul 19, 2015
    • Tathagata Das's avatar
      [SPARK-9030] [STREAMING] [HOTFIX] Make sure that no attempts to create Kinesis... · 93eb2acf
      Tathagata Das authored
      [SPARK-9030] [STREAMING] [HOTFIX] Make sure that no attempts to create Kinesis streams is made when no enabled
      
      Problem: Even when the environment variable to enable tests are disabled, the `beforeAll` of the KinesisStreamSuite attempted to find AWS credentials to create Kinesis stream, and failed.
      
      Solution: Made sure all accesses to KinesisTestUtils, that created streams, is under `testOrIgnore`
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #7519 from tdas/kinesis-tests and squashes the following commits:
      
      64d6d06 [Tathagata Das] Removed empty lines.
      7c18473 [Tathagata Das] Putting all access to KinesisTestUtils inside testOrIgnore
      93eb2acf
  23. Jul 17, 2015
    • Tathagata Das's avatar
      [SPARK-9030] [STREAMING] Add Kinesis.createStream unit tests that actual sends data · b13ef772
      Tathagata Das authored
      Current Kinesis unit tests do not test createStream by sending data. This PR is to add such unit test. Note that this unit will not run by default. It will only run when the relevant environment variables are set.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #7413 from tdas/kinesis-tests and squashes the following commits:
      
      0e16db5 [Tathagata Das] Added more comments regarding testOrIgnore
      1ea5ce0 [Tathagata Das] Added more comments
      c7caef7 [Tathagata Das] Address comments
      a297b59 [Tathagata Das] Reverted unnecessary change in KafkaStreamSuite
      90c9bde [Tathagata Das] Removed scalatest.FunSuite
      deb7f4f [Tathagata Das] Removed scalatest.FunSuite
      18c2208 [Tathagata Das] Changed how SparkFunSuite is inherited
      dbb33a5 [Tathagata Das] Added license
      88f6dab [Tathagata Das] Added scala docs
      c6be0d7 [Tathagata Das] minor changes
      24a992b [Tathagata Das] Moved KinesisTestUtils to src instead of test for future python usage
      465b55d [Tathagata Das] Made unit tests optional in a nice way
      4d70703 [Tathagata Das] Added license
      129d436 [Tathagata Das] Minor updates
      cc36510 [Tathagata Das] Added KinesisStreamSuite
      b13ef772
  24. Jul 10, 2015
    • Jonathan Alter's avatar
      [SPARK-7977] [BUILD] Disallowing println · e14b545d
      Jonathan Alter authored
      Author: Jonathan Alter <jonalter@users.noreply.github.com>
      
      Closes #7093 from jonalter/SPARK-7977 and squashes the following commits:
      
      ccd44cc [Jonathan Alter] Changed println to log in ThreadingSuite
      7fcac3e [Jonathan Alter] Reverting to println in ThreadingSuite
      10724b6 [Jonathan Alter] Changing some printlns to logs in tests
      eeec1e7 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0b1dcb4 [Jonathan Alter] More println cleanup
      aedaf80 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      925fd98 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0c16fa3 [Jonathan Alter] Replacing some printlns with logs
      45c7e05 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      5c8e283 [Jonathan Alter] Allowing println in audit-release examples
      5b50da1 [Jonathan Alter] Allowing printlns in example files
      ca4b477 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      83ab635 [Jonathan Alter] Fixing new printlns
      54b131f [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      1cd8a81 [Jonathan Alter] Removing some unnecessary comments and printlns
      b837c3a [Jonathan Alter] Disallowing println
      e14b545d
  25. Jul 06, 2015
  26. Jul 02, 2015
    • Andrew Or's avatar
      [SPARK-8781] Fix variables in published pom.xml are not resolved · 82cf3315
      Andrew Or authored
      The issue is summarized in the JIRA and is caused by this commit: 984ad601.
      
      This patch reverts that commit and fixes the maven build in a different way. We limit the dependencies of `KinesisReceiverSuite` to avoid having to deal with the complexities in how maven deals with transitive test dependencies.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #7193 from andrewor14/fix-kinesis-pom and squashes the following commits:
      
      ca3d5d4 [Andrew Or] Limit kinesis test dependencies
      f24e09c [Andrew Or] Revert "[BUILD] Fix Maven build for Kinesis"
      82cf3315
  27. Jul 01, 2015
  28. Jun 28, 2015
    • Josh Rosen's avatar
      [SPARK-8683] [BUILD] Depend on mockito-core instead of mockito-all · f5100451
      Josh Rosen authored
      Spark's tests currently depend on `mockito-all`, which bundles Hamcrest and Objenesis classes. Instead, it should depend on `mockito-core`, which declares those libraries as Maven dependencies. This is necessary in order to fix a dependency conflict that leads to a NoSuchMethodError when using certain Hamcrest matchers.
      
      See https://github.com/mockito/mockito/wiki/Declaring-mockito-dependency for more details.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7061 from JoshRosen/mockito-core-instead-of-all and squashes the following commits:
      
      70eccbe [Josh Rosen] Depend on mockito-core instead of mockito-all.
      f5100451
  29. Jun 03, 2015
    • Andrew Or's avatar
      [BUILD] Fix Maven build for Kinesis · 984ad601
      Andrew Or authored
      A necessary dependency that is transitively referenced is not
      provided, causing compilation failures in builds that provide
      the kinesis-asl profile.
      984ad601
    • Patrick Wendell's avatar
      [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 · 2c4d550e
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:
      
      2f42d02 [Patrick Wendell] A few more excludes
      4bebcf0 [Patrick Wendell] Update to RC4
      61aaf46 [Patrick Wendell] Using new release candidate
      55f1610 [Patrick Wendell] Another exclude
      04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
      36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
      2c4d550e
  30. May 31, 2015
    • Reynold Xin's avatar
      [SPARK-3850] Trim trailing spaces for examples/streaming/yarn. · 564bc11e
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6530 from rxin/trim-whitespace-1 and squashes the following commits:
      
      7b7b3a0 [Reynold Xin] Reset again.
      dc14597 [Reynold Xin] Reset scalastyle.
      cd556c4 [Reynold Xin] YARN, Kinesis, Flume.
      4223fe1 [Reynold Xin] [SPARK-3850] Trim trailing spaces for examples/streaming.
      564bc11e
  31. May 29, 2015
  32. May 23, 2015
  33. May 22, 2015
  34. May 21, 2015
Loading