  1. Jul 22, 2015
    • [SPARK-9144] Remove DAGScheduler.runLocallyWithinThread and spark.localExecution.enabled · b217230f
      Josh Rosen authored
      Spark has an option called spark.localExecution.enabled; according to the docs:
      
      > Enables Spark to run certain jobs, such as first() or take() on the driver, without sending tasks to the cluster. This can make certain jobs execute very quickly, but may require shipping a whole partition of data to the driver.
      
      This feature ends up adding quite a bit of complexity to DAGScheduler, especially in the runLocallyWithinThread method, but as far as I know nobody uses it (I searched the mailing list and haven't seen any recent mentions of the configuration, or stack traces that include the runLocally method). As a step towards reducing scheduler complexity, I propose that we remove this feature and all code related to it for Spark 1.5.
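
      For context, a minimal sketch of the kind of job the option applied to; only the config key and the first()/take() behavior come from the docs quoted above, while the master, app name, and data are illustrative:

      ```
      import org.apache.spark.{SparkConf, SparkContext}

      // Before this change, setting spark.localExecution.enabled=true allowed simple
      // jobs such as first()/take() on a small prefix of an RDD to run on the driver
      // without launching tasks on the cluster. The config key below is removed in 1.5.
      val conf = new SparkConf()
        .setMaster("local[2]")
        .setAppName("local-execution-example")
        .set("spark.localExecution.enabled", "true")

      val sc = new SparkContext(conf)
      val first = sc.parallelize(1 to 1000000).take(1) // the kind of job the option targeted
      sc.stop()
      ```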
      
      This pull request simply brings #7484 up to date.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7585 from rxin/remove-local-exec and squashes the following commits:
      
      84bd10e [Reynold Xin] Python fix.
      1d9739a [Reynold Xin] Merge pull request #7484 from JoshRosen/remove-localexecution
      eec39fa [Josh Rosen] Remove allowLocal(); deprecate user-facing uses of it.
      b0835dc [Josh Rosen] Remove local execution code in DAGScheduler
      8975d96 [Josh Rosen] Remove local execution tests.
      ffa8c9b [Josh Rosen] Remove documentation for configuration
      b217230f
  2. Jul 14, 2015
    • [SPARK-8962] Add Scalastyle rule to ban direct use of Class.forName; fix existing uses · 11e5c372
      Josh Rosen authored
      This pull request adds a Scalastyle regex rule which fails the style check if `Class.forName` is used directly.  `Class.forName` always loads classes from the default / system classloader, but in a majority of cases, we should be using Spark's own `Utils.classForName` instead, which tries to load classes from the current thread's context classloader and falls back to the classloader which loaded Spark when the context classloader is not defined.
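
      A minimal sketch of the loading pattern described above; it illustrates the fallback behavior rather than reproducing the exact body of `Utils.classForName`:

      ```
      object ClassLoadingSketch {
        // Prefer the current thread's context classloader; fall back to the
        // classloader that loaded this class (i.e. the one that loaded Spark)
        // when no context classloader is set.
        def classForName(className: String): Class[_] = {
          val loader = Option(Thread.currentThread().getContextClassLoader)
            .getOrElse(getClass.getClassLoader)
          // initialize = true matches Class.forName's default; the real utility needs a
          // style-check exclusion for this one direct Class.forName call.
          Class.forName(className, true, loader)
        }
      }
      ```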
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7350 from JoshRosen/ban-Class.forName and squashes the following commits:
      
      e3e96f7 [Josh Rosen] Merge remote-tracking branch 'origin/master' into ban-Class.forName
      c0b7885 [Josh Rosen] Hopefully fix the last two cases
      d707ba7 [Josh Rosen] Fix uses of Class.forName that I missed in my first cleanup pass
      046470d [Josh Rosen] Merge remote-tracking branch 'origin/master' into ban-Class.forName
      62882ee [Josh Rosen] Fix uses of Class.forName or add exclusion.
      d9abade [Josh Rosen] Add stylechecker rule to ban uses of Class.forName
      11e5c372
  3. Jul 10, 2015
    • [SPARK-7977] [BUILD] Disallowing println · e14b545d
      Jonathan Alter authored
      Author: Jonathan Alter <jonalter@users.noreply.github.com>
      
      Closes #7093 from jonalter/SPARK-7977 and squashes the following commits:
      
      ccd44cc [Jonathan Alter] Changed println to log in ThreadingSuite
      7fcac3e [Jonathan Alter] Reverting to println in ThreadingSuite
      10724b6 [Jonathan Alter] Changing some printlns to logs in tests
      eeec1e7 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0b1dcb4 [Jonathan Alter] More println cleanup
      aedaf80 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      925fd98 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0c16fa3 [Jonathan Alter] Replacing some printlns with logs
      45c7e05 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      5c8e283 [Jonathan Alter] Allowing println in audit-release examples
      5b50da1 [Jonathan Alter] Allowing printlns in example files
      ca4b477 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      83ab635 [Jonathan Alter] Fixing new printlns
      54b131f [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      1cd8a81 [Jonathan Alter] Removing some unnecessary comments and printlns
      b837c3a [Jonathan Alter] Disallowing println
      e14b545d
  4. Jul 09, 2015
    • [SPARK-8852] [FLUME] Trim dependencies in flume assembly. · 0e78e40c
      Marcelo Vanzin authored
      Also, add support for the *-provided profiles. This avoids repackaging
      things that are already in the Spark assembly, or, in the case of the
      *-provided profiles, are provided by the distribution.
      
      The flume-ng-auth dependency was also excluded since it's not really
      used by Spark.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7247 from vanzin/SPARK-8852 and squashes the following commits:
      
      298a7d5 [Marcelo Vanzin] Feedback.
      c962082 [Marcelo Vanzin] [SPARK-8852] [flume] Trim dependencies in flume assembly.
      0e78e40c
    • [SPARK-8865] [STREAMING] FIX BUG: check key in kafka params · 89770036
      guowei2 authored
      Author: guowei2 <guowei@growingio.com>
      
      Closes #7254 from guowei2/spark-8865 and squashes the following commits:
      
      48ca17a [guowei2] fix contains key
      89770036
    • [SPARK-8389] [STREAMING] [PYSPARK] Expose KafkaRDDs offsetRange in Python · 3ccebf36
      jerryshao authored
      This PR proposes a simple way to expose OffsetRange in Python code. The usage of offsetRanges is similar to the Scala/Java way; in Python we can get an OffsetRange like this:
      
      ```
      dstream.foreachRDD(lambda r: KafkaUtils.offsetRanges(r))
      ```
      
      The reason I didn't follow the approach suggested in SPARK-8389 is that the Python Kafka API has one more step to decode the message compared to Scala/Java, which makes the Python API return a transformed RDD/DStream rather than a directly wrapped JavaKafkaRDD, so it is hard to backtrack to the original RDD to get the offsetRange.
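
      For comparison, the Scala/Java way referenced above reads the offsets by casting the RDD inside `foreachRDD`; a minimal sketch, assuming the direct-stream API of this era (`ssc`, the broker address, and the topic name are placeholders):

      ```
      import kafka.serializer.StringDecoder
      import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

      // ssc: an existing org.apache.spark.streaming.StreamingContext
      val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, Map("metadata.broker.list" -> "localhost:9092"), Set("topic"))

      stream.foreachRDD { rdd =>
        // The RDD handed to foreachRDD is backed by a KafkaRDD, so it can be cast
        // to HasOffsetRanges to read the per-partition offsets for this batch.
        val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
        offsetRanges.foreach { o =>
          println(s"${o.topic} ${o.partition} offsets [${o.fromOffset}, ${o.untilOffset})")
        }
      }
      ```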
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #7185 from jerryshao/SPARK-8389 and squashes the following commits:
      
      4c6d320 [jerryshao] Another way to fix subclass deserialization issue
      e6a8011 [jerryshao] Address the comments
      fd13937 [jerryshao] Fix serialization bug
      7debf1c [jerryshao] bug fix
      cff3893 [jerryshao] refactor the code according to the comments
      2aabf9e [jerryshao] Style fix
      848c708 [jerryshao] Add HasOffsetRanges for Python
      3ccebf36
    • [SPARK-8701] [STREAMING] [WEBUI] Add input metadata in the batch page · 1f6b0b12
      zsxwing authored
      This PR adds `metadata` to `InputInfo`. `InputDStream` can report its metadata for a batch and it will be shown in the batch page.
      
      For example,
      
      ![screen shot](https://cloud.githubusercontent.com/assets/1000778/8403741/d6ffc7e2-1e79-11e5-9888-c78c1575123a.png)
      
      FileInputDStream will display the new files for a batch, and DirectKafkaInputDStream will display its offset ranges.
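
      A rough sketch of the idea: an input stream attaches free-form metadata to a batch, which the batch page can then render. The `StreamInputInfo` name and the `Map[String, Any]` metadata type come from the commit notes below; the exact constructor shape is an assumption:

      ```
      import org.apache.spark.streaming.scheduler.StreamInputInfo

      // Illustrative only: the metadata an input stream might report for one batch,
      // e.g. the files it picked up.
      val batchMetadata: Map[String, Any] = Map(
        "files" -> Seq("/data/part-00000", "/data/part-00001"))

      val info = StreamInputInfo(inputStreamId = 0, numRecords = 200L, metadata = batchMetadata)
      ```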
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7081 from zsxwing/input-metadata and squashes the following commits:
      
      f7abd9b [zsxwing] Revert the space changes in project/MimaExcludes.scala
      d906209 [zsxwing] Merge branch 'master' into input-metadata
      74762da [zsxwing] Fix MiMa tests
      7903e33 [zsxwing] Merge branch 'master' into input-metadata
      450a46c [zsxwing] Address comments
      1d94582 [zsxwing] Raname InputInfo to StreamInputInfo and change "metadata" to Map[String, Any]
      d496ae9 [zsxwing] Add input metadata in the batch page
      1f6b0b12
  5. Jul 08, 2015
    • [SPARK-7050] [BUILD] Fix Python Kafka test assembly jar not found issue under Maven build · 8a9d9cc1
      jerryshao authored
      This fixes the Spark Streaming unit tests under the Maven build. Previously the name and path of the Maven-generated jar differed from sbt's, which led to the following exception. This fix keeps the behavior the same for both the Maven and sbt builds.
      
      ```
      Failed to find Spark Streaming Kafka assembly jar in /home/xyz/spark/external/kafka-assembly
      You need to build Spark with  'build/sbt assembly/assembly streaming-kafka-assembly/assembly' or 'build/mvn package' before running this program
      ```
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #5632 from jerryshao/SPARK-7050 and squashes the following commits:
      
      74b068d [jerryshao] Fix mvn build issue
      8a9d9cc1
  6. Jul 01, 2015
    • [SPARK-8378] [STREAMING] Add the Python API for Flume · 75b9fe4c
      zsxwing authored
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6830 from zsxwing/flume-python and squashes the following commits:
      
      78dfdac [zsxwing] Fix the compile error in the test code
      f1bf3c0 [zsxwing] Address TD's comments
      0449723 [zsxwing] Add sbt goal streaming-flume-assembly/assembly
      e93736b [zsxwing] Fix the test case for determine_modules_to_test
      9d5821e [zsxwing] Fix pyspark_core dependencies
      f9ee681 [zsxwing] Merge branch 'master' into flume-python
      7a55837 [zsxwing] Add streaming_flume_assembly to run-tests.py
      b96b0de [zsxwing] Merge branch 'master' into flume-python
      ce85e83 [zsxwing] Fix incompatible issues for Python 3
      01cbb3d [zsxwing] Add import sys
      152364c [zsxwing] Fix the issue that StringIO doesn't work in Python 3
      14ba0ff [zsxwing] Add flume-assembly for sbt building
      b8d5551 [zsxwing] Merge branch 'master' into flume-python
      4762c34 [zsxwing] Fix the doc
      0336579 [zsxwing] Refactor Flume unit tests and also add tests for Python API
      9f33873 [zsxwing] Add the Python API for Flume
      75b9fe4c
  7. Jun 23, 2015
    • [SPARK-8483] [STREAMING] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0 · 9b618fb0
      Hari Shreedharan authored
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6910 from harishreedharan/remove-commons-lang3 and squashes the following commits:
      
      9875f7d [Hari Shreedharan] Revert back to Flume 1.4.0
      ca35eb0 [Hari Shreedharan] [SPARK-8483][Streaming] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0
      9b618fb0
  8. Jun 19, 2015
    • [SPARK-8127] [STREAMING] [KAFKA] KafkaRDD optimize count() take() isEmpty() · 1b6fe9b1
      cody koeninger authored
      Take advantage of offset range info for size-related KafkaRDD methods. Possible fix for [SPARK-7122], but probably a worthwhile optimization regardless.
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #6632 from koeninger/kafka-rdd-count and squashes the following commits:
      
      321340d [cody koeninger] [SPARK-8127][Streaming][Kafka] additional test of ordering of take()
      5a05d0f [cody koeninger] [SPARK-8127][Streaming][Kafka] additional test of isEmpty
      f68bd32 [cody koeninger] [Streaming][Kafka][SPARK-8127] code cleanup
      9555b73 [cody koeninger] Merge branch 'master' into kafka-rdd-count
      253031d [cody koeninger] [Streaming][Kafka][SPARK-8127] mima exclusion for change to private method
      8974b9e [cody koeninger] [Streaming][Kafka][SPARK-8127] check offset ranges before constructing KafkaRDD
      c3768c5 [cody koeninger] [Streaming][Kafka] Take advantage of offset range info for size-related KafkaRDD methods.  Possible fix for [SPARK-7122], but probably a worthwhile optimization regardless.
      1b6fe9b1
    • [SPARK-8390] [STREAMING] [KAFKA] fix docs related to HasOffsetRanges · b305e377
      cody koeninger authored
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #6863 from koeninger/SPARK-8390 and squashes the following commits:
      
      26a06bd [cody koeninger] Merge branch 'master' into SPARK-8390
      3744492 [cody koeninger] [Streaming][Kafka][SPARK-8390] doc changes per TD, test to make sure approach shown in docs actually compiles + runs
      b108c9d [cody koeninger] [Streaming][Kafka][SPARK-8390] further doc fixes, clean up spacing
      bb4336b [cody koeninger] [Streaming][Kafka][SPARK-8390] fix docs related to HasOffsetRanges, cleanup
      3f3c57a [cody koeninger] [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out of the existing java direct stream api
      b305e377
    • [SPARK-8389] [STREAMING] [KAFKA] Example of getting offset ranges out of the existing java direct stream api · 47af7c1e
      cody koeninger authored
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #6846 from koeninger/SPARK-8389 and squashes the following commits:
      
      3f3c57a [cody koeninger] [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out of the existing java direct stream api
      47af7c1e
  9. Jun 17, 2015
    • [SPARK-8404] [STREAMING] [TESTS] Use thread-safe collections to make the tests more reliable · a06d9c8e
      zsxwing authored
      KafkaStreamSuite, DirectKafkaStreamSuite, JavaKafkaStreamSuite and JavaDirectKafkaStreamSuite use non-thread-safe collections to collect data in one thread and check it in another thread. This may cause the tests to fail.
      
      This PR changes them to thread-safe collections.
      
      Note: I cannot reproduce the test failures in my environment. But at least, this PR should make the tests more reliable.
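
      A minimal sketch of the pattern; the concrete collection is an assumption (any thread-safe collection works), and `stream` stands in for a DStream created by the suite:

      ```
      import java.util.concurrent.ConcurrentLinkedQueue
      import scala.collection.JavaConverters._

      // One thread (the streaming job's output) appends results...
      val collected = new ConcurrentLinkedQueue[String]()
      stream.foreachRDD { rdd => rdd.collect().foreach(collected.add) }

      // ...while the test thread reads them back later. A concurrent collection
      // avoids the data races a plain ArrayBuffer has when shared across threads.
      def receivedSoFar: Seq[String] = collected.asScala.toSeq
      ```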
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6852 from zsxwing/fix-KafkaStreamSuite and squashes the following commits:
      
      d464211 [zsxwing] Use thread-safe collections to make the tests more reliable
      a06d9c8e
  10. Jun 07, 2015
    • [SPARK-2808] [STREAMING] [KAFKA] cleanup tests from · b127ff8a
      cody koeninger authored
      see if requiring producer acks eliminates the need for waitUntilLeaderOffset calls in tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #5921 from koeninger/kafka-0.8.2-test-cleanup and squashes the following commits:
      
      1e89dc8 [cody koeninger] Merge branch 'master' into kafka-0.8.2-test-cleanup
      4662828 [cody koeninger] [Streaming][Kafka] filter mima issue for removal of method from private test class
      af1e083 [cody koeninger] Merge branch 'master' into kafka-0.8.2-test-cleanup
      4298ac2 [cody koeninger] [Streaming][Kafka] update comment to trigger jenkins attempt
      1274afb [cody koeninger] [Streaming][Kafka] see if requiring producer acks eliminates the need for waitUntilLeaderOffset calls in tests
      b127ff8a
  11. Jun 03, 2015
    • [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 · 2c4d550e
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:
      
      2f42d02 [Patrick Wendell] A few more excludes
      4bebcf0 [Patrick Wendell] Update to RC4
      61aaf46 [Patrick Wendell] Using new release candidate
      55f1610 [Patrick Wendell] Another exclude
      04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
      36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
      2c4d550e
  12. Jun 02, 2015
    • [SPARK-8015] [FLUME] Remove Guava dependency from flume-sink. · 0071bd8d
      Marcelo Vanzin authored
      The minimal change would be to disable shading of Guava in the module,
      and rely on the transitive dependency from other libraries instead. But
      since Guava's use is so localized, I think it's better to just not use
      it at all, so I replaced that code and removed all traces of Guava from
      the module's build.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6555 from vanzin/SPARK-8015 and squashes the following commits:
      
      c0ceea8 [Marcelo Vanzin] Add comments about dependency management.
      c38228d [Marcelo Vanzin] Add guava dep in test scope.
      b7a0349 [Marcelo Vanzin] Add libthrift exclusion.
      6e0942d [Marcelo Vanzin] Add comment in pom.
      2d79260 [Marcelo Vanzin] [SPARK-8015] [flume] Remove Guava dependency from flume-sink.
      0071bd8d
  13. May 31, 2015
    • [SPARK-3850] Trim trailing spaces for examples/streaming/yarn. · 564bc11e
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6530 from rxin/trim-whitespace-1 and squashes the following commits:
      
      7b7b3a0 [Reynold Xin] Reset again.
      dc14597 [Reynold Xin] Reset scalastyle.
      cd556c4 [Reynold Xin] YARN, Kinesis, Flume.
      4223fe1 [Reynold Xin] [SPARK-3850] Trim trailing spaces for examples/streaming.
      564bc11e
  14. May 30, 2015
    • [SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike · 609c4923
      Andrew Or authored
      This is a follow-up patch to #6441.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6510 from andrewor14/extends-funsuite-check and squashes the following commits:
      
      6618b46 [Andrew Or] Exempt SparkSinkSuite from the FunSuite check
      99d02ac [Andrew Or] Merge branch 'master' of github.com:apache/spark into extends-funsuite-check
      48874dd [Andrew Or] Guard against direct uses of FunSuite / FunSuiteLike
      609c4923
  15. May 29, 2015
    • [HOT FIX] [BUILD] Fix maven build failures · a4f24123
      Andrew Or authored
      This patch fixes a build break in maven caused by #6441.
      
      Note that this patch reverts the changes in flume-sink because
      this module does not currently depend on Spark core, but the
      tests require it. There is not an easy way to make this work
      because mvn test dependencies are not transitive (MNG-1378).
      
      For now, we will leave the one test suite in flume-sink out
      until we figure out a better solution. This patch is mainly
      intended to unbreak the maven build.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6511 from andrewor14/fix-build-mvn and squashes the following commits:
      
      3d53643 [Andrew Or] [HOT FIX #6441] Fix maven build failures
      a4f24123
    • [SPARK-7558] Demarcate tests in unit-tests.log · 9eb222c1
      Andrew Or authored
      Right now `unit-tests.log` is not of much value because we can't easily tell where the test boundaries are. This patch adds log statements before and after each test to outline the test boundaries, e.g.:
      
      ```
      ===== TEST OUTPUT FOR o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' =====
      
      15/05/27 12:36:39.596 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO SparkContext: Starting job: count at KryoSerializerSuite.scala:230
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Got job 3 (count at KryoSerializerSuite.scala:230) with 4 output partitions (allowLocal=false)
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Final stage: ResultStage 3(count at KryoSerializerSuite.scala:230)
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Parents of final stage: List()
      15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Missing parents: List()
      15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Submitting ResultStage 3 (ParallelCollectionRDD[5] at parallelize at KryoSerializerSuite.scala:230), which has no missing parents
      
      ...
      
      15/05/27 12:36:39.624 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO DAGScheduler: Job 3 finished: count at KryoSerializerSuite.scala:230, took 0.028563 s
      15/05/27 12:36:39.625 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO KryoSerializerSuite:
      
      ***** FINISHED o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' *****
      
      ...
      ```
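
      A minimal sketch of how a shared base suite could produce this demarcation, using ScalaTest's `withFixture` hook; it approximates, rather than reproduces, the SparkFunSuite introduced by this change:

      ```
      import org.scalatest.{FunSuite, Outcome}
      import org.apache.spark.Logging

      // Every suite extends this instead of FunSuite directly, so each test's output
      // in unit-tests.log is bracketed by easily greppable markers.
      abstract class SparkFunSuite extends FunSuite with Logging {
        protected override def withFixture(test: NoArgTest): Outcome = {
          val suiteName = this.getClass.getName.stripSuffix("$")
          try {
            logInfo(s"\n\n===== TEST OUTPUT FOR $suiteName: '${test.name}' =====\n")
            test() // run the actual test body
          } finally {
            logInfo(s"\n\n***** FINISHED $suiteName: '${test.name}' *****\n")
          }
        }
      }
      ```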
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6441 from andrewor14/demarcate-tests and squashes the following commits:
      
      879b060 [Andrew Or] Fix compile after rebase
      d622af7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      017c8ba [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      7790b6c [Andrew Or] Fix tests after logical merge conflict
      c7460c0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      c43ffc4 [Andrew Or] Fix tests?
      8882581 [Andrew Or] Fix tests
      ee22cda [Andrew Or] Fix log message
      fa9450e [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      12d1e1b [Andrew Or] Various whitespace changes (minor)
      69cbb24 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
      bbce12e [Andrew Or] Fix manual things that cannot be covered through automation
      da0b12f [Andrew Or] Add core tests as dependencies in all modules
      f7d29ce [Andrew Or] Introduce base abstract class for all test suites
      9eb222c1
    • [SPARK-7929] Turn whitespace checker on for more token types. · 97a60cf7
      Reynold Xin authored
      This is the last batch of changes to complete SPARK-7929.
      
      Previous related PRs:
      https://github.com/apache/spark/pull/6480
      https://github.com/apache/spark/pull/6478
      https://github.com/apache/spark/pull/6477
      https://github.com/apache/spark/pull/6476
      https://github.com/apache/spark/pull/6475
      https://github.com/apache/spark/pull/6474
      https://github.com/apache/spark/pull/6473
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6487 from rxin/whitespace-lint and squashes the following commits:
      
      b33d43d [Reynold Xin] [SPARK-7929] Turn whitespace checker on for more token types.
      97a60cf7
  16. May 18, 2015
    • [SPARK-7621] [STREAMING] Report Kafka errors to StreamingListeners · 0a7a94ea
      jerluc authored
      PR per [SPARK-7621](https://issues.apache.org/jira/browse/SPARK-7621), which makes both `KafkaReceiver` and `ReliableKafkaReceiver` report their errors to the `ReceiverTracker`, which in turn adds the events to the bus to fire off any registered `StreamingListener`s.
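
      A minimal sketch of consuming such events on the application side through the public StreamingListener API; the listener body and `ssc` are illustrative:

      ```
      import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerReceiverError}

      // With receivers reporting their errors to the tracker, any registered
      // StreamingListener can observe them; this one simply prints them.
      class ReceiverErrorLogger extends StreamingListener {
        override def onReceiverError(receiverError: StreamingListenerReceiverError): Unit = {
          val info = receiverError.receiverInfo
          println(s"Receiver ${info.name} reported an error: ${info.lastErrorMessage}")
        }
      }

      // ssc: an existing StreamingContext
      ssc.addStreamingListener(new ReceiverErrorLogger)
      ```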
      
      Author: jerluc <jeremyalucas@gmail.com>
      
      Closes #6204 from jerluc/master and squashes the following commits:
      
      82439a5 [jerluc] [SPARK-7621] [STREAMING] Report Kafka errors to StreamingListeners
      0a7a94ea
    • [SPARK-7501] [STREAMING] DAG visualization: show DStream operations · b93c97d7
      Andrew Or authored
      This is similar to #5999, but for streaming. Roughly 200 lines are tests.
      
      One thing to note here is that we already do a form of scoping for call sites, so this patch adds the new RDD operation scoping logic in the same place. Also, this patch adds a `try finally` block to set the relevant variables in a safer way.
      
      tdas zsxwing
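
      A minimal sketch of the `try finally` pattern mentioned above, with a hypothetical property key (the real scoping code lives in the RDD/DStream operation-scope machinery):

      ```
      import org.apache.spark.SparkContext

      object ScopingSketch {
        // Set a scope-related local property for the duration of a block of work,
        // and always restore the previous value, even if the block throws.
        def withNamedScope[T](sc: SparkContext, scopeName: String)(body: => T): T = {
          val key = "example.dstream.scope" // hypothetical key, not a real Spark property
          val oldValue = sc.getLocalProperty(key)
          sc.setLocalProperty(key, scopeName)
          try {
            body
          } finally {
            sc.setLocalProperty(key, oldValue) // restored whether or not body failed
          }
        }
      }
      ```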
      
      ------------------------
      **Before**
      <img src="https://cloud.githubusercontent.com/assets/2133137/7625996/d88211b8-f9b4-11e4-90b9-e11baa52d6d7.png" width="450px"/>
      
      --------------------------
      **After**
      <img src="https://cloud.githubusercontent.com/assets/2133137/7625997/e0878f8c-f9b4-11e4-8df3-7dd611b13c87.png" width="650px"/>
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6034 from andrewor14/dag-viz-streaming and squashes the following commits:
      
      932a64a [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      e685df9 [Andrew Or] Rename createRDDWith
      84d0656 [Andrew Or] Review feedback
      697c086 [Andrew Or] Fix tests
      53b9936 [Andrew Or] Set scopes for foreachRDD properly
      1881802 [Andrew Or] Refactor DStream scope names again
      af4ba8d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      fd07d22 [Andrew Or] Make MQTT lower case
      f6de871 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      0ca1801 [Andrew Or] Remove a few unnecessary withScopes on aliases
      fa4e5fb [Andrew Or] Pass in input stream name rather than defining it from within
      1af0b0e [Andrew Or] Fix style
      074c00b [Andrew Or] Review comments
      d25a324 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      e4a93ac [Andrew Or] Fix tests?
      25416dc [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      9113183 [Andrew Or] Add tests for DStream scopes
      b3806ab [Andrew Or] Fix test
      bb80bbb [Andrew Or] Fix MIMA?
      5c30360 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      5703939 [Andrew Or] Rename operations that create InputDStreams
      7c4513d [Andrew Or] Group RDDs by DStream operations and batches
      bf0ab6e [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      05c2676 [Andrew Or] Wrap many more methods in withScope
      c121047 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      65ef3e9 [Andrew Or] Fix NPE
      a0d3263 [Andrew Or] Scope streaming operations instead of RDD operations
      b93c97d7
  17. May 13, 2015
    • [SPARK-7356] [STREAMING] Fix flakey tests in FlumePollingStreamSuite using SparkSink's batch CountDownLatch · 61d1e87c
      Hari Shreedharan authored
      
      This is meant to make the FlumePollingStreamSuite deterministic. Now we count the number of batches that have been completed and then verify the results, rather than sleeping for arbitrary periods of time.
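
      A minimal sketch of the counting idea, assuming a CountDownLatch sized to the expected number of batches (`stream` and the counts are illustrative):

      ```
      import java.util.concurrent.{CountDownLatch, TimeUnit}

      // Each completed batch counts the latch down, so the test blocks
      // deterministically instead of sleeping for a fixed amount of time.
      val expectedBatches = 5
      val batchLatch = new CountDownLatch(expectedBatches)

      stream.foreachRDD { rdd =>
        rdd.count()            // force the batch to be processed
        batchLatch.countDown() // signal one completed batch
      }

      // Verify results only after all expected batches completed (or time out).
      assert(batchLatch.await(60, TimeUnit.SECONDS), "timed out waiting for batches")
      ```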
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #5918 from harishreedharan/flume-test-fix and squashes the following commits:
      
      93f24f3 [Hari Shreedharan] Add an eventually block to ensure that all received data is processed. Refactor the dstream creation and remove redundant code.
      1108804 [Hari Shreedharan] [SPARK-7356][STREAMING] Fix flakey tests in FlumePollingStreamSuite using SparkSink's batch CountDownLatch.
      61d1e87c
  18. May 05, 2015
  19. May 01, 2015
    • [SPARK-2808][Streaming][Kafka] update kafka to 0.8.2 · 47864840
      cody koeninger authored
      I don't think this should be merged until after 1.3.0 is final.
      
      Author: cody koeninger <cody@koeninger.org>
      Author: Helena Edelson <helena.edelson@datastax.com>
      
      Closes #4537 from koeninger/wip-2808-kafka-0.8.2-upgrade and squashes the following commits:
      
      803aa2c [cody koeninger] [SPARK-2808][Streaming][Kafka] code cleanup per TD
      e6dfaf6 [cody koeninger] [SPARK-2808][Streaming][Kafka] pointless whitespace change to trigger jenkins again
      1770abc [cody koeninger] [SPARK-2808][Streaming][Kafka] make waitUntilLeaderOffset easier to call, call it from python tests as well
      d4267e9 [cody koeninger] [SPARK-2808][Streaming][Kafka] fix stderr redirect in python test script
      30d991d [cody koeninger] [SPARK-2808][Streaming][Kafka] remove stderr prints since it breaks python 3 syntax
      1d896e2 [cody koeninger] [SPARK-2808][Streaming][Kafka] add even even more logging to python test
      4c4557f [cody koeninger] [SPARK-2808][Streaming][Kafka] add even more logging to python test
      115aeee [cody koeninger] Merge branch 'master' into wip-2808-kafka-0.8.2-upgrade
      2712649 [cody koeninger] [SPARK-2808][Streaming][Kafka] add more logging to python test, see why its timing out in jenkins
      2b92d3f [cody koeninger] [SPARK-2808][Streaming][Kafka] wait for leader offsets in the java test as well
      3824ce3 [cody koeninger] [SPARK-2808][Streaming][Kafka] naming / comments per tdas
      61b3464 [cody koeninger] [SPARK-2808][Streaming][Kafka] delay for second send in boundary condition test
      af6f3ec [cody koeninger] [SPARK-2808][Streaming][Kafka] delay test until latest leader offset matches expected value
      9edab4c [cody koeninger] [SPARK-2808][Streaming][Kafka] more shots in the dark on jenkins failing test
      c70ee43 [cody koeninger] [SPARK-2808][Streaming][Kafka] add more asserts to test, try to figure out why it fails on jenkins but not locally
      1d10751 [cody koeninger] Merge branch 'master' into wip-2808-kafka-0.8.2-upgrade
      ed02d2c [cody koeninger] [SPARK-2808][Streaming][Kafka] move default argument for api version to overloaded method, for binary compat
      407382e [cody koeninger] [SPARK-2808][Streaming][Kafka] update kafka to 0.8.2.1
      77de6c2 [cody koeninger] Merge branch 'master' into wip-2808-kafka-0.8.2-upgrade
      6953429 [cody koeninger] [SPARK-2808][Streaming][Kafka] update kafka to 0.8.2
      2e67c66 [Helena Edelson] #SPARK-2808 Update to Kafka 0.8.2.0 GA from beta.
      d9dc2bc [Helena Edelson] Merge remote-tracking branch 'upstream/master' into wip-2808-kafka-0.8.2-upgrade
      e768164 [Helena Edelson] #2808 update kafka to version 0.8.2
      47864840
  20. Apr 29, 2015
    • [SPARK-7056] [STREAMING] Make the Write Ahead Log pluggable · 1868bd40
      Tathagata Das authored
      Users may want the WAL data to be written to non-HDFS data storage systems. To allow that, we have to make the WAL pluggable. The following design doc outlines the plan.
      
      https://docs.google.com/a/databricks.com/document/d/1A2XaOLRFzvIZSi18i_luNw5Rmm9j2j4AigktXxIYxmY/edit?usp=sharing
      
      Things to add:
      * Unit tests for WriteAheadLogUtils
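
      A rough sketch of what a custom WAL implementation could look like against the pluggable interface; the method names and signatures here are my reading of the new public WriteAheadLog API and should be treated as assumptions, and the in-memory store is purely a stand-in for a real non-HDFS backend:

      ```
      import java.nio.ByteBuffer
      import java.util.{Iterator => JIterator}
      import scala.collection.JavaConverters._
      import scala.collection.concurrent.TrieMap
      import org.apache.spark.streaming.util.{WriteAheadLog, WriteAheadLogRecordHandle}

      // A handle type is implementation-defined; this one just remembers a key.
      class InMemoryRecordHandle(val key: Long) extends WriteAheadLogRecordHandle

      // Toy in-memory WAL, purely to illustrate the pluggable interface.
      class InMemoryWriteAheadLog extends WriteAheadLog {
        private val records = new TrieMap[Long, ByteBuffer]()

        override def write(record: ByteBuffer, time: Long): WriteAheadLogRecordHandle = {
          records.put(time, record)
          new InMemoryRecordHandle(time)
        }

        override def read(handle: WriteAheadLogRecordHandle): ByteBuffer =
          records(handle.asInstanceOf[InMemoryRecordHandle].key)

        override def readAll(): JIterator[ByteBuffer] =
          records.values.iterator.asJava

        override def clean(threshTime: Long, waitForCompletion: Boolean): Unit =
          records.keys.filter(_ < threshTime).foreach(records.remove)

        override def close(): Unit = records.clear()
      }
      ```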
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #5645 from tdas/wal-pluggable and squashes the following commits:
      
      2c431fd [Tathagata Das] Minor fixes.
      c2bc7384 [Tathagata Das] More changes based on PR comments.
      569a416 [Tathagata Das] fixed long line
      bde26b1 [Tathagata Das] Renamed segment to record handle everywhere
      b65e155 [Tathagata Das] More changes based on PR comments.
      d7cd15b [Tathagata Das] Fixed test
      1a32a4b [Tathagata Das] Fixed test
      e0d19fb [Tathagata Das] Fixed defaults
      9310cbf [Tathagata Das] style fix.
      86abcb1 [Tathagata Das] Refactored WriteAheadLogUtils, and consolidated all WAL related configuration into it.
      84ce469 [Tathagata Das] Added unit test and fixed compilation error.
      bce5e75 [Tathagata Das] Fixed long lines.
      837c4f5 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into wal-pluggable
      754fbf8 [Tathagata Das] Added license and docs.
      09bc6fe [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into wal-pluggable
      7dd2d4b [Tathagata Das] Added pluggable WriteAheadLog interface, and refactored all code along with it
      1868bd40
  21. Apr 28, 2015
    • [SPARK-5946] [STREAMING] Add Python API for direct Kafka stream · 9e4e82b7
      jerryshao authored
      Currently this only adds the `createDirectStream` API; I'm not sure whether `createRDD` is also needed, since some Java objects need to be wrapped in Python. Please help review, thanks a lot.
      
      Author: jerryshao <saisai.shao@intel.com>
      Author: Saisai Shao <saisai.shao@intel.com>
      
      Closes #4723 from jerryshao/direct-kafka-python-api and squashes the following commits:
      
      a1fe97c [jerryshao] Fix rebase issue
      eebf333 [jerryshao] Address the comments
      da40f4e [jerryshao] Fix Python 2.6 Syntax error issue
      5c0ee85 [jerryshao] Style fix
      4aeac18 [jerryshao] Fix bug in example code
      7146d86 [jerryshao] Add unit test
      bf3bdd6 [jerryshao] Add more APIs and address the comments
      f5b3801 [jerryshao] Small style fix
      8641835 [Saisai Shao] Rebase and update the code
      589c05b [Saisai Shao] Fix the style
      d6fcb6a [Saisai Shao] Address the comments
      dfda902 [Saisai Shao] Style fix
      0f7d168 [Saisai Shao] Add the doc and fix some style issues
      67e6880 [Saisai Shao] Fix test bug
      917b0db [Saisai Shao] Add Python createRDD API for Kakfa direct stream
      c3fc11d [jerryshao] Modify the docs
      2c00936 [Saisai Shao] address the comments
      3360f44 [jerryshao] Fix code style
      e0e0f0d [jerryshao] Code clean and bug fix
      338c41f [Saisai Shao] Add python API and example for direct kafka stream
      9e4e82b7
  22. Apr 27, 2015
    • [SPARK-7145] [CORE] commons-lang (2.x) classes used instead of commons-lang3 (3.x); commons-io used without dependency · ab5adb7a
      Sean Owen authored
      
      Remove use of commons-lang in favor of commons-lang3 classes; remove commons-io use in favor of Guava
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #5703 from srowen/SPARK-7145 and squashes the following commits:
      
      21fbe03 [Sean Owen] Remove use of commons-lang in favor of commons-lang3 classes; remove commons-io use in favor of Guava
      ab5adb7a
  23. Apr 22, 2015
  24. Apr 16, 2015
  25. Apr 12, 2015
    • [SPARK-6431][Streaming][Kafka] Error message for partition metadata requests · 6ac8eea2
      cody koeninger authored
      
      The originally reported problem was misdiagnosed; the topic just didn't exist yet. The agreed-upon solution was to improve the error handling / message.
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #5454 from koeninger/spark-6431-master and squashes the following commits:
      
      44300f8 [cody koeninger] [SPARK-6431][Streaming][Kafka] Error message for partition metadata requests
      6ac8eea2
  26. Apr 10, 2015
    • [SPARK-6211][Streaming] Add Python Kafka API unit test · 3290d2d1
      jerryshao authored
      Refactor the Kafka unit test and add Python API support. CC tdas davies please help to review, thanks a lot.
      
      Author: jerryshao <saisai.shao@intel.com>
      Author: Saisai Shao <saisai.shao@intel.com>
      
      Closes #4961 from jerryshao/SPARK-6211 and squashes the following commits:
      
      ee4b919 [jerryshao] Fixed newly merged issue
      82c756e [jerryshao] Address the comments
      92912d1 [jerryshao] Address the commits
      0708bb1 [jerryshao] Fix rebase issue
      40b47a3 [Saisai Shao] Style fix
      f889657 [Saisai Shao] Update the code according
      8a2f3e2 [jerryshao] Address the issues
      0f1b7ce [jerryshao] Still fix the bug
      61a04f0 [jerryshao] Fix bugs and address the issues
      64d9877 [jerryshao] Fix rebase bugs
      8ad442f [jerryshao] Add kafka-assembly in run-tests
      6020b00 [jerryshao] Add more debug info in Shell
      8102d6e [jerryshao] Fix bug in Jenkins test
      fde1213 [jerryshao] Code style changes
      5536f95 [jerryshao] Refactor the Kafka unit test and add Python Kafka unittest support
      3290d2d1
  27. Apr 09, 2015
  28. Apr 08, 2015
    • [SPARK-6765] Fix test code style for streaming. · 15e0d2bd
      Reynold Xin authored
      So we can turn style checker on for test code.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5409 from rxin/test-style-streaming and squashes the following commits:
      
      7aea69b [Reynold Xin] [SPARK-6765] Fix test code style for streaming.
      15e0d2bd
  29. Apr 06, 2015
  30. Apr 03, 2015
    • [SPARK-6428] Turn on explicit type checking for public methods. · 82701ee2
      Reynold Xin authored
      This builds on my earlier pull requests and turns on the explicit type checking in scalastyle.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5342 from rxin/SPARK-6428 and squashes the following commits:
      
      7b531ab [Reynold Xin] import ordering
      2d9a8a5 [Reynold Xin] jl
      e668b1c [Reynold Xin] override
      9b9e119 [Reynold Xin] Parenthesis.
      82e0cf5 [Reynold Xin] [SPARK-6428] Turn on explicit type checking for public methods.
      82701ee2
  31. Mar 24, 2015
    • [SPARK-5559] [Streaming] [Test] Remove opportunity for the flakiness we met when running FlumeStreamSuite · 85cf0636
      Kousuke Saruta authored
      When we run FlumeStreamSuite on Jenkins, we sometimes get an error like the following.
      
          sbt.ForkMain$ForkError: The code passed to eventually never returned normally. Attempted 52 times over 10.094849836 seconds. Last failure message: Error connecting to localhost/127.0.0.1:23456.
              at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
              at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
              at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
              at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307)
              at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
              at org.apache.spark.streaming.flume.FlumeStreamSuite.writeAndVerify(FlumeStreamSuite.scala:116)
              at org.apache.spark.streaming.flume.FlumeStreamSuite.org$apache$spark$streaming$flume$FlumeStreamSuite$$testFlumeStream(FlumeStreamSuite.scala:74)
              at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply$mcV$sp(FlumeStreamSuite.scala:66)
              at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply(FlumeStreamSuite.scala:66)
              at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply(FlumeStreamSuite.scala:66)
              at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
              at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
              at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
              at org.scalatest.Transformer.apply(Transformer.scala:22)
              at org.scalatest.Transformer.apply(Transformer.scala:20)
              at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
              at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
              at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
              at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
              at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
              at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
              at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
              at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
      
      This error is caused by check-then-act logic when finding a free port.
      
            /** Find a free port */
            private def findFreePort(): Int = {
              Utils.startServiceOnPort(23456, (trialPort: Int) => {
                val socket = new ServerSocket(trialPort)
                socket.close()
                (null, trialPort)
              }, conf)._2
            }
      
      Removing the check-then-act is not easy, but we can reduce the chance of hitting the error by choosing a random value for the initial port instead of 23456.
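
      A minimal sketch of that mitigation, reusing the snippet above but starting the search from a random candidate port; the random range and helper used here are assumptions, not necessarily what the final patch does:

      ```
      import java.net.ServerSocket
      import scala.util.Random
      import org.apache.spark.SparkConf
      import org.apache.spark.util.Utils

      /** Find a free port, starting the search from a random candidate rather than 23456. */
      private def findFreePort(conf: SparkConf): Int = {
        val startPort = 1024 + Random.nextInt(65536 - 1024) // assumed range
        Utils.startServiceOnPort(startPort, (trialPort: Int) => {
          val socket = new ServerSocket(trialPort)
          socket.close()
          (null, trialPort)
        }, conf)._2
      }
      ```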
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #4337 from sarutak/SPARK-5559 and squashes the following commits:
      
      16f109f [Kousuke Saruta] Added `require` to Utils#startServiceOnPort
      c39d8b6 [Kousuke Saruta] Merge branch 'SPARK-5559' of github.com:sarutak/spark into SPARK-5559
      1610ba2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
      33357e3 [Kousuke Saruta] Changed "findFreePort" method in MQTTStreamSuite and FlumeStreamSuite so that it can choose valid random port
      a9029fe [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
      9489ef9 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
      8212e42 [Kousuke Saruta] Modified default port used in FlumeStreamSuite from 23456 to random value
      85cf0636