  1. Jun 03, 2015
    • [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 · 2c4d550e
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:
      
      2f42d02 [Patrick Wendell] A few more excludes
      4bebcf0 [Patrick Wendell] Update to RC4
      61aaf46 [Patrick Wendell] Using new release candidate
      55f1610 [Patrick Wendell] Another exclude
      04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
      36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
      2c4d550e
  2. Jun 01, 2015
    • [SPARK-8025][Streaming] Add JavaDoc style deprecation for deprecated Streaming methods · 7f74bb3b
      zsxwing authored
The Scala `deprecated` annotation doesn't actually show up in the generated JavaDoc.
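For reference, a minimal sketch of the pattern on a hypothetical deprecated method (not the exact Spark code): the Scala annotation still produces compiler warnings, while the javadoc-style tag is what the generated JavaDoc renders.

```
/**
 * Apply a function to each RDD in this DStream.
 *
 * @deprecated As of 0.9.0, replaced by `foreachRDD`. This javadoc-style tag,
 *             unlike the Scala annotation below, shows up in JavaDoc.
 */
@deprecated("use foreachRDD", "0.9.0")
def foreach(foreachFunc: RDD[T] => Unit): Unit = foreachRDD(foreachFunc)
```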
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6564 from zsxwing/SPARK-8025 and squashes the following commits:
      
      2faa2bb [zsxwing] Add JavaDoc style deprecation for deprecated Streaming methods
      7f74bb3b
    • [SPARK-7958] [STREAMING] Handled exception in StreamingContext.start() to prevent leaking of actors · 2f9c7519
      Tathagata Das authored
StreamingContext.start() can throw an exception when DStream.validateAtStart() fails (say, the checkpoint directory is not set for a StateDStream). But by then the JobScheduler, JobGenerator, and ReceiverTracker have already started, along with their actors, and they cannot be shut down, because the only way to do that is to call StreamingContext.stop(), which cannot be called as the context has not been marked as ACTIVE.
      
The solution in this PR is to stop the internal scheduler if start() throws an exception, and to mark the context as STOPPED.
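A minimal sketch of the shape of the fix, with simplified names and states (the actual code is in StreamingContext.start()):

```
def start(): Unit = synchronized {
  state match {
    case INITIALIZED =>
      try {
        validate()
        scheduler.start()  // starts JobScheduler, JobGenerator, ReceiverTracker
        state = ACTIVE
      } catch {
        case e: Throwable =>
          // The scheduler may have partially started; stop it so its actors
          // do not leak, and mark the context STOPPED before rethrowing.
          scheduler.stop(false)
          state = STOPPED
          throw e
      }
    case ACTIVE =>
      logWarning("StreamingContext has already been started")
    case STOPPED =>
      throw new IllegalStateException("StreamingContext has already been stopped")
  }
}
```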
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6559 from tdas/SPARK-7958 and squashes the following commits:
      
      20b2ec1 [Tathagata Das] Added synchronized
      790b617 [Tathagata Das] Handled exception in StreamingContext.start()
      2f9c7519
  3. May 31, 2015
    • [SPARK-3850] Trim trailing spaces for examples/streaming/yarn. · 564bc11e
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6530 from rxin/trim-whitespace-1 and squashes the following commits:
      
      7b7b3a0 [Reynold Xin] Reset again.
      dc14597 [Reynold Xin] Reset scalastyle.
      cd556c4 [Reynold Xin] YARN, Kinesis, Flume.
      4223fe1 [Reynold Xin] [SPARK-3850] Trim trailing spaces for examples/streaming.
      564bc11e
  4. May 29, 2015
    • [SPARK-7558] Demarcate tests in unit-tests.log · 9eb222c1
      Andrew Or authored
Right now `unit-tests.log` is not of much value because we can't easily tell where the test boundaries are. This patch adds log statements before and after each test to outline the test boundaries, e.g.:
      
      ```
      ===== TEST OUTPUT FOR o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' =====
      
      15/05/27 12:36:39.596 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO SparkContext: Starting job: count at KryoSerializerSuite.scala:230
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Got job 3 (count at KryoSerializerSuite.scala:230) with 4 output partitions (allowLocal=false)
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Final stage: ResultStage 3(count at KryoSerializerSuite.scala:230)
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Parents of final stage: List()
      15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Missing parents: List()
      15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Submitting ResultStage 3 (ParallelCollectionRDD[5] at parallelize at KryoSerializerSuite.scala:230), which has no missing parents
      
      ...
      
      15/05/27 12:36:39.624 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO DAGScheduler: Job 3 finished: count at KryoSerializerSuite.scala:230, took 0.028563 s
      15/05/27 12:36:39.625 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO KryoSerializerSuite:
      
      ***** FINISHED o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' *****
      
      ...
      ```
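A rough sketch of how such demarcation can be hooked into ScalaTest (illustrative, assuming an slf4j logger; it approximates, rather than reproduces, the base class this PR introduces):

```
import org.scalatest.{FunSuite, Outcome}
import org.slf4j.LoggerFactory

abstract class SparkFunSuite extends FunSuite {
  private val log = LoggerFactory.getLogger(getClass)

  // Log a banner before and after every test so test boundaries are
  // visible in unit-tests.log.
  protected override def withFixture(test: NoArgTest): Outcome = {
    val shortSuiteName = getClass.getName.replaceAll("org.apache.spark", "o.a.s")
    try {
      log.info(s"\n\n===== TEST OUTPUT FOR $shortSuiteName: '${test.name}' =====\n")
      test()
    } finally {
      log.info(s"\n\n***** FINISHED $shortSuiteName: '${test.name}' *****\n")
    }
  }
}
```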
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6441 from andrewor14/demarcate-tests and squashes the following commits:
      
      879b060 [Andrew Or] Fix compile after rebase
      d622af7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      017c8ba [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      7790b6c [Andrew Or] Fix tests after logical merge conflict
      c7460c0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      c43ffc4 [Andrew Or] Fix tests?
      8882581 [Andrew Or] Fix tests
      ee22cda [Andrew Or] Fix log message
      fa9450e [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      12d1e1b [Andrew Or] Various whitespace changes (minor)
      69cbb24 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
      bbce12e [Andrew Or] Fix manual things that cannot be covered through automation
      da0b12f [Andrew Or] Add core tests as dependencies in all modules
      f7d29ce [Andrew Or] Introduce base abstract class for all test suites
      9eb222c1
    • 36067ce3 (Patrick Wendell)
    • [SPARK-7931] [STREAMING] Do not restart receiver when stopped · e714ecf2
      Tathagata Das authored
Attempts to restart the socket receiver when it is supposed to be stopped cause undesirable error messages.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6483 from tdas/SPARK-7931 and squashes the following commits:
      
      09aeee1 [Tathagata Das] Do not restart receiver when stopped
      e714ecf2
  5. May 28, 2015
    • [SPARK-7927] whitespace fixes for streaming. · 3af0b313
      Reynold Xin authored
      So we can enable a whitespace enforcement rule in the style checker to save code review time.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6475 from rxin/whitespace-streaming and squashes the following commits:
      
      810dae4 [Reynold Xin] Fixed tests.
      89068ad [Reynold Xin] [SPARK-7927] whitespace fixes for streaming.
      3af0b313
  6. May 23, 2015
    • [SPARK-7777][Streaming] Handle the case when there is no block in a batch · ad0badba
      zsxwing authored
In the old implementation, if a batch has no blocks, `areWALRecordHandlesPresent` will be `true` and it will return a `WriteAheadLogBackedBlockRDD`.

This PR handles this case by returning either a `WriteAheadLogBackedBlockRDD` or a `BlockRDD`, according to the configuration.
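A hedged sketch of the branch (variable names and constructor arguments are illustrative; the real code also threads block ids, WAL record handles, and validity flags through):

```
val walEnabled = ssc.conf.getBoolean(
  "spark.streaming.receiver.writeAheadLog.enable", false)
if (walEnabled) {
  // Even for a batch with no blocks, stay on the WAL-backed path so that
  // recovery semantics are consistent across batches.
  new WriteAheadLogBackedBlockRDD[T](ssc.sparkContext, blockIds, walRecordHandles)
} else {
  new BlockRDD[T](ssc.sparkContext, validBlockIds)
}
```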
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6372 from zsxwing/SPARK-7777 and squashes the following commits:
      
      788f895 [zsxwing] Handle the case when there is no block in a batch
      ad0badba
    • [SPARK-7838] [STREAMING] Set scope for kinesis stream · baa89838
      Tathagata Das authored
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6369 from tdas/SPARK-7838 and squashes the following commits:
      
      87d1c7f [Tathagata Das] Addressed comment
      37775d8 [Tathagata Das] set scope for kinesis stream
      baa89838
  7. May 21, 2015
    • [SPARK-7776] [STREAMING] Added shutdown hook to StreamingContext · d68ea24d
      Tathagata Das authored
A shutdown hook to stop the SparkContext was added recently. This results in ugly errors when a streaming application is terminated by Ctrl-C.
      
      ```
      Exception in thread "Thread-27" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:736)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:735)
      	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
      	at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:735)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1468)
      	at org.apache.spark.util.EventLoop.stop(EventLoop.scala:84)
      	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1403)
      	at org.apache.spark.SparkContext.stop(SparkContext.scala:1642)
      	at org.apache.spark.SparkContext$$anonfun$3.apply$mcV$sp(SparkContext.scala:559)
      	at org.apache.spark.util.SparkShutdownHook.run(Utils.scala:2266)
      	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(Utils.scala:2236)
      	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(Utils.scala:2236)
      	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(Utils.scala:2236)
      	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1764)
      	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(Utils.scala:2236)
      	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(Utils.scala:2236)
      	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(Utils.scala:2236)
      	at scala.util.Try$.apply(Try.scala:161)
      	at org.apache.spark.util.SparkShutdownHookManager.runAll(Utils.scala:2236)
      	at org.apache.spark.util.SparkShutdownHookManager$$anon$6.run(Utils.scala:2218)
      	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
      ```
      
This is because Spark's shutdown hook stops the context, and the streaming jobs fail in the middle. The correct solution is to stop the streaming context before the Spark context. This PR adds such a shutdown hook, with a priority higher than the SparkContext shutdown hook's priority.
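A hedged sketch of the hook registration (helper names approximate the Spark 1.4-era utilities):

```
// Runs before the SparkContext's own shutdown hook (higher priority runs
// first), so streaming jobs are stopped before the underlying context.
private val shutdownHookRef = Utils.addShutdownHook(
  StreamingContext.SHUTDOWN_HOOK_PRIORITY) { () =>
  logInfo("Invoking stop(stopGracefully=false) from shutdown hook")
  stop(stopSparkContext = false, stopGracefully = false)
}
```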
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6307 from tdas/SPARK-7776 and squashes the following commits:
      
      e3d5475 [Tathagata Das] Added conf to specify graceful shutdown
      4c18652 [Tathagata Das] Added shutdown hook to StreamingContxt.
      d68ea24d
    • [SPARK-7745] Change asserts to requires for user input checks in Spark Streaming · 1ee8eb43
      Burak Yavuz authored
Assertions can be turned off. `require` throws an `IllegalArgumentException`, which makes more sense when validating a user-set variable.
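For example (an illustrative check, not one lifted verbatim from this PR):

```
// assert(...) can be elided at compile time (-Xdisable-assertions), so it
// must not guard user input. require(...) always runs and throws an
// IllegalArgumentException carrying a meaningful message.
def validateWindow(windowDuration: Duration, slideDuration: Duration): Unit = {
  require(windowDuration.isMultipleOf(slideDuration),
    s"The window duration ($windowDuration) must be a multiple of the slide duration")
}
```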
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #6271 from brkyvz/streaming-require and squashes the following commits:
      
      d249484 [Burak Yavuz] fix merge conflict
      264adb8 [Burak Yavuz] addressed comments v1.0
      6161350 [Burak Yavuz] fix tests
      16aa766 [Burak Yavuz] changed more assertions to more meaningful errors
      afd923d [Burak Yavuz] changed some assertions to require
      1ee8eb43
  8. May 20, 2015
    • [SPARK-7777] [STREAMING] Fix the flaky test in org.apache.spark.streaming.BasicOperationsSuite · 895baf8f
      zsxwing authored
      Just added a guard to make sure a batch has completed before moving to the next batch.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6306 from zsxwing/SPARK-7777 and squashes the following commits:
      
      ecee529 [zsxwing] Fix the failure message
      58634fe [zsxwing] Fix the flaky test in org.apache.spark.streaming.BasicOperationsSuite
      895baf8f
    • [SPARK-7767] [STREAMING] Added test for checkpoint serialization in StreamingContext.start() · 3c434cbf
      Tathagata Das authored
Currently, the background checkpointing thread fails silently if the checkpoint is not serializable. This is hard to debug, and therefore it's best to fail fast at `start()` when checkpointing is enabled and the checkpoint is not serializable.
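A hedged sketch of the fail-fast check (names approximate; the real check lives in the validation done by `start()`):

```
if (isCheckpointingEnabled) {
  try {
    // Attempt the same serialization the background checkpointing thread
    // would perform later, so failures surface at start() instead.
    Checkpoint.serialize(new Checkpoint(this, Time(0)), conf)
  } catch {
    case e: java.io.NotSerializableException =>
      throw new java.io.NotSerializableException(
        "DStream checkpointing has been enabled but the DStreams and their " +
          "functions are not serializable\n" + e.getMessage)
  }
}
```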
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6292 from tdas/SPARK-7767 and squashes the following commits:
      
      51304e6 [Tathagata Das] Addressed comments.
      c35237b [Tathagata Das] Added test for checkpoint serialization in StreamingContext.start()
      3c434cbf
    • [SPARK-7237] [SPARK-7741] [CORE] [STREAMING] Clean more closures that need cleaning · 9b84443d
      Andrew Or authored
      SPARK-7741 is the equivalent of SPARK-7237 in streaming. This is an alternative to #6268.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6269 from andrewor14/clean-moar and squashes the following commits:
      
      c51c9ab [Andrew Or] Add periods (trivial)
      6c686ac [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-moar
      79a435b [Andrew Or] Fix tests
      d18c9f9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-moar
      65ef07b [Andrew Or] Fix tests?
      4b487a3 [Andrew Or] Add tests for closures passed to DStream operations
      328139b [Andrew Or] Do not forget foreachRDD
      5431f61 [Andrew Or] Clean streaming closures
      72b7b73 [Andrew Or] Clean core closures
      9b84443d
  9. May 18, 2015
    • [SPARK-7501] [STREAMING] DAG visualization: show DStream operations · b93c97d7
      Andrew Or authored
This is similar to #5999, but for streaming. Roughly 200 lines are tests.

One thing to note here is that we already do some scoping for call sites, so this patch adds the new RDD operation scoping logic in the same place. Also, this patch adds a `try finally` block to set the relevant variables in a safer way.
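A hedged sketch of such a `try finally` scoping helper (the local-property key is SparkContext's `spark.rdd.scope`; the JSON-building helper here is hypothetical):

```
private[streaming] def withScope[U](sc: SparkContext)(body: => U): U = {
  val oldScope = sc.getLocalProperty("spark.rdd.scope")
  try {
    sc.setLocalProperty("spark.rdd.scope", makeScopeJson())  // hypothetical helper
    body  // RDDs created here are grouped under this DStream operation
  } finally {
    // Restore the previous scope even if body throws.
    sc.setLocalProperty("spark.rdd.scope", oldScope)
  }
}
```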
      
      tdas zsxwing
      
      ------------------------
      **Before**
      <img src="https://cloud.githubusercontent.com/assets/2133137/7625996/d88211b8-f9b4-11e4-90b9-e11baa52d6d7.png" width="450px"/>
      
      --------------------------
      **After**
      <img src="https://cloud.githubusercontent.com/assets/2133137/7625997/e0878f8c-f9b4-11e4-8df3-7dd611b13c87.png" width="650px"/>
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6034 from andrewor14/dag-viz-streaming and squashes the following commits:
      
      932a64a [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      e685df9 [Andrew Or] Rename createRDDWith
      84d0656 [Andrew Or] Review feedback
      697c086 [Andrew Or] Fix tests
      53b9936 [Andrew Or] Set scopes for foreachRDD properly
      1881802 [Andrew Or] Refactor DStream scope names again
      af4ba8d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      fd07d22 [Andrew Or] Make MQTT lower case
      f6de871 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      0ca1801 [Andrew Or] Remove a few unnecessary withScopes on aliases
      fa4e5fb [Andrew Or] Pass in input stream name rather than defining it from within
      1af0b0e [Andrew Or] Fix style
      074c00b [Andrew Or] Review comments
      d25a324 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      e4a93ac [Andrew Or] Fix tests?
      25416dc [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      9113183 [Andrew Or] Add tests for DStream scopes
      b3806ab [Andrew Or] Fix test
      bb80bbb [Andrew Or] Fix MIMA?
      5c30360 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      5703939 [Andrew Or] Rename operations that create InputDStreams
      7c4513d [Andrew Or] Group RDDs by DStream operations and batches
      bf0ab6e [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      05c2676 [Andrew Or] Wrap many more methods in withScope
      c121047 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      65ef3e9 [Andrew Or] Fix NPE
      a0d3263 [Andrew Or] Scope streaming operations instead of RDD operations
      b93c97d7
    • [SPARK-7658] [STREAMING] [WEBUI] Update the mouse behaviors for the timeline graphs · 0b6f503d
      zsxwing authored
1. If the user clicks a point for a batch, scroll down to the corresponding batch row and highlight it, then remove the highlight after 3 seconds if necessary.

2. Add "#batches" to the histogram graphs.
      
      ![screen shot 2015-05-14 at 7 36 19 pm](https://cloud.githubusercontent.com/assets/1000778/7646108/84f4a014-fa73-11e4-8c13-1903d267e60f.png)
      
      ![screen shot 2015-05-14 at 7 36 53 pm](https://cloud.githubusercontent.com/assets/1000778/7646109/8b11154a-fa73-11e4-820b-8ece9fa6ee3e.png)
      
      ![screen shot 2015-05-14 at 7 36 34 pm](https://cloud.githubusercontent.com/assets/1000778/7646111/93828272-fa73-11e4-89f8-580670144d3c.png)
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6168 from zsxwing/SPARK-7658 and squashes the following commits:
      
      c242b00 [zsxwing] Change 5 seconds to 3 seconds
      31fd0aa [zsxwing] Remove the mouseover highlight feature
      06c6f6f [zsxwing] Merge branch 'master' into SPARK-7658
      2eaff06 [zsxwing] Merge branch 'master' into SPARK-7658
      108d56c [zsxwing] Update the mouse behaviors for the timeline graphs
      0b6f503d
  10. May 17, 2015
    • [SPARK-7693][Core] Remove "import scala.concurrent.ExecutionContext.Implicits.global" · ff71d34e
      zsxwing authored
A lesson learnt from SPARK-7655: Spark should avoid using `scala.concurrent.ExecutionContext.Implicits.global`, because the user may submit blocking actions to it and exhaust all of its threads. This could crash Spark. So Spark should always use its own thread pools for safety.
      
      This PR removes all usages of `scala.concurrent.ExecutionContext.Implicits.global` and uses proper thread pools to replace them.
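The replacement pattern, roughly (pool size illustrative; commit a33ff06 caps one such pool at 128 threads):

```
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

// A dedicated, bounded pool: blocking actions submitted by user code can
// exhaust this pool without starving the JVM-wide global pool.
private val executionContext = ExecutionContext.fromExecutorService(
  Executors.newFixedThreadPool(128))
// ...and shut it down when the owning component stops:
// executionContext.shutdownNow()
```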
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6223 from zsxwing/SPARK-7693 and squashes the following commits:
      
      a33ff06 [zsxwing] Decrease the max thread number from 1024 to 128
      cf4b3fc [zsxwing] Remove "import scala.concurrent.ExecutionContext.Implicits.global"
      ff71d34e
  11. May 13, 2015
    • [SPARK-6752] [STREAMING] [REVISED] Allow StreamingContext to be recreated from checkpoint and existing SparkContext · bce00dac
      Tathagata Das authored
      
This is a revision of the earlier version (see #5773), which passed the active SparkContext explicitly through a new set of Java and Scala APIs. The drawbacks of that approach were:
      
* Hard to implement in Python.
* A new API was introduced. This is even more confusing since we are also introducing getActiveOrCreate in SPARK-7553.
      
Furthermore, there is now a direct way to get an existing active SparkContext or create a new one: SparkContext.getOrCreate(conf). It's better to use this to get the SparkContext than to add a new API for explicitly passing the context.
      
So in this PR I have:
* Removed the new versions of StreamingContext.getOrCreate() that took a SparkContext
* Added the ability to pick up an existing SparkContext when the StreamingContext tries to create one (see the sketch below)
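A hedged sketch of that pickup (simplified; checkpoint-recovery paths are omitted):

```
private[streaming] def createNewSparkContext(conf: SparkConf): SparkContext = {
  // Returns the already-active SparkContext if one exists; otherwise
  // creates a new one from the given conf.
  SparkContext.getOrCreate(conf)
}
```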
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6096 from tdas/SPARK-6752 and squashes the following commits:
      
      53f4b2d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-6752
      f024b77 [Tathagata Das] Removed extra API and used SparkContext.getOrCreate
      bce00dac
    • [STREAMING] [MINOR] Keep streaming.UIUtils private · bb6dec3b
      Andrew Or authored
      zsxwing
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6134 from andrewor14/private-streaming-uiutils and squashes the following commits:
      
      225df94 [Andrew Or] Privatize class
      bb6dec3b
    • [SPARK-7589] [STREAMING] [WEBUI] Make "Input Rate" in the Streaming page consistent with other pages · bec938f7
      zsxwing authored
      
      This PR makes "Input Rate" in the Streaming page consistent with Job and Stage pages.
      
      ![screen shot 2015-05-12 at 5 03 35 pm](https://cloud.githubusercontent.com/assets/1000778/7601444/f943f8ac-f8ca-11e4-8280-a715d814f434.png)
      ![screen shot 2015-05-12 at 5 07 25 pm](https://cloud.githubusercontent.com/assets/1000778/7601445/f9571c0c-f8ca-11e4-9b12-9317cb55c002.png)
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6102 from zsxwing/SPARK-7589 and squashes the following commits:
      
      2745225 [zsxwing] Make "Input Rate" in the Streaming page consistent with other pages
      bec938f7
  12. May 12, 2015
    • [SPARK-7554] [STREAMING] Throw exception when an active/stopped StreamingContext is used to create DStreams and output operations · 23f7d66d
      Tathagata Das authored
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6099 from tdas/SPARK-7554 and squashes the following commits:
      
      2cd4158 [Tathagata Das] Throw exceptions on attempts to add stuff to active and stopped contexts.
      23f7d66d
    • [SPARK-7553] [STREAMING] Added methods to maintain a singleton StreamingContext · 00e7b09a
      Tathagata Das authored
In a REPL/notebook environment, it's very easy to lose a reference to a StreamingContext by overwriting the variable. So if you happen to execute the following commands
      ```
      val ssc = new StreamingContext(...) // cmd 1
      ssc.start() // cmd 2
      ...
      val ssc = new StreamingContext(...) // accidentally run cmd 1 again
      ```
The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started) nor stop the previous context (as the reference is lost).
Hence it's best to maintain a singleton reference to the active context, so that we never lose the reference to it.
Since this problem mainly occurs in REPL environments, it's best to add this as Experimental support in the Scala API only, so that it can be used in Scala REPLs and notebooks.
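A hedged sketch of the singleton bookkeeping behind the `getActive`/`getActiveOrCreate` methods this PR adds:

```
import java.util.concurrent.atomic.AtomicReference

object StreamingContext {
  private val activeContext = new AtomicReference[StreamingContext](null)

  /** The currently active (started, not stopped) context, if any. */
  def getActive(): Option[StreamingContext] = Option(activeContext.get())

  /** Return the active context, or create (but do not start) a new one. */
  def getActiveOrCreate(creatingFunc: () => StreamingContext): StreamingContext =
    getActive().getOrElse(creatingFunc())
}
```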
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6070 from tdas/SPARK-7553 and squashes the following commits:
      
      731c9a1 [Tathagata Das] Fixed style
      a797171 [Tathagata Das] Added more unit tests
      19fc70b [Tathagata Das] Added :: Experimental :: in docs
      64706c9 [Tathagata Das] Fixed test
      634db5d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7553
      3884a25 [Tathagata Das] Fixing test bug
      d37a846 [Tathagata Das] Added getActive and getActiveOrCreate
      00e7b09a
    • [SPARK-7406] [STREAMING] [WEBUI] Add tooltips for "Scheduling Delay", "Processing Time" and "Total Delay" · 1422e79e
      zsxwing authored
      
      Screenshots:
      ![screen shot 2015-05-06 at 2 29 03 pm](https://cloud.githubusercontent.com/assets/1000778/7504129/9c57f710-f3fc-11e4-9c6e-1b79c17c546d.png)
      
      ![screen shot 2015-05-06 at 2 24 35 pm](https://cloud.githubusercontent.com/assets/1000778/7504140/b63bb216-f3fc-11e4-83a5-6dfc6481d192.png)
      
      tdas as we discussed offline
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5952 from zsxwing/SPARK-7406 and squashes the following commits:
      
      2b004ea [zsxwing] Merge branch 'master' into SPARK-7406
      e9eb506 [zsxwing] Update tooltip contents
      2215b2a [zsxwing] Add tooltips for "Scheduling Delay", "Processing Time" and "Total Delay"
      1422e79e
    • [SPARK-7532] [STREAMING] StreamingContext.start() made to logWarning and not throw exception · ec6f2a97
      Tathagata Das authored
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6060 from tdas/SPARK-7532 and squashes the following commits:
      
      6fe2e83 [Tathagata Das] Update docs
      7dadfc3 [Tathagata Das] Fixed bug again
      99c7678 [Tathagata Das] Added logInfo
      65aec20 [Tathagata Das] Fix bug
      5bf031b [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7532
      1a9a818 [Tathagata Das] Fix scaladoc
      c584313 [Tathagata Das] StreamingContext.start() made to logWarning and not throw exception
      ec6f2a97
    • [SPARK-7485] [BUILD] Remove pyspark files from assembly. · 82e890fb
      Marcelo Vanzin authored
      The sbt part of the build is hacky; it basically tricks sbt
      into generating the zip by using a generator, but returns
      an empty list for the generated files so that nothing is
      actually added to the assembly.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6022 from vanzin/SPARK-7485 and squashes the following commits:
      
      22c1e04 [Marcelo Vanzin] Remove unneeded code.
      4893622 [Marcelo Vanzin] [SPARK-7485] [build] Remove pyspark files from assembly.
      82e890fb
  13. May 11, 2015
    • [SPARK-7530] [STREAMING] Added StreamingContext.getState() to expose the current state of the context · f9c7580a
      Tathagata Das authored
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6058 from tdas/SPARK-7530 and squashes the following commits:
      
      80ee0e6 [Tathagata Das] STARTED --> ACTIVE
      3da6547 [Tathagata Das] Added synchronized
      dd88444 [Tathagata Das] Added more docs
      e1a8505 [Tathagata Das] Fixed comment length
      89f9980 [Tathagata Das] Change to Java enum and added Java test
      7c57351 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7530
      dd4e702 [Tathagata Das] Addressed comments.
      3d56106 [Tathagata Das] Added Mima excludes
      2b86ba1 [Tathagata Das] Added scala docs.
      1722433 [Tathagata Das] Fixed style
      976b094 [Tathagata Das] Added license
      0585130 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7530
      e0f0a05 [Tathagata Das] Added getState and exposed StreamingContextState
      f9c7580a
    • [STREAMING] [MINOR] Close files correctly when iterator is finished in streaming WAL recovery · 25c01c54
      jerryshao authored
Currently there is no way to close the file correctly after the iteration is finished, so change to `CompletionIterator` to avoid resource leakage.
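`CompletionIterator` (in `org.apache.spark.util`) runs a callback once the wrapped iterator is exhausted; a hedged sketch of its use here (the reader variable is illustrative):

```
// Close the underlying WAL file exactly when the last record has been
// read, instead of relying on finalization.
CompletionIterator[ByteBuffer, Iterator[ByteBuffer]](
  recordIterator, reader.close())
```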
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #6050 from jerryshao/close-file-correctly and squashes the following commits:
      
      52dfaf5 [jerryshao] Close files correctly when iterator is finished
      25c01c54
    • [SPARK-7361] [STREAMING] Throw unambiguous exception when attempting to start multiple StreamingContexts in the same JVM · 1b465569
      Tathagata Das authored
      
Currently, attempting to start a StreamingContext while another one is running throws a confusing exception saying that the actor name JobScheduler is already registered. It's better to throw a proper exception, as this is not supported.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #5907 from tdas/SPARK-7361 and squashes the following commits:
      
      fb81c4a [Tathagata Das] Fix typo
      a9cd5bb [Tathagata Das] Added startSite to StreamingContext
      5fdfc0d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7361
      5870e2b [Tathagata Das] Added check for multiple streaming contexts
      1b465569
    • [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time · d70a0768
      Wesley Miao authored
      tdas
      
      https://issues.apache.org/jira/browse/SPARK-7326
      
The problem most likely resides in the DStream.slice() implementation, shown below.
      
```
def slice(fromTime: Time, toTime: Time): Seq[RDD[T]] = {
  if (!isInitialized) {
    throw new SparkException(this + " has not been initialized")
  }
  if (!(fromTime - zeroTime).isMultipleOf(slideDuration)) {
    logWarning("fromTime (" + fromTime + ") is not a multiple of slideDuration ("
      + slideDuration + ")")
  }
  if (!(toTime - zeroTime).isMultipleOf(slideDuration)) {
    logWarning("toTime (" + toTime + ") is not a multiple of slideDuration ("
      + slideDuration + ")")
  }
  val alignedToTime = toTime.floor(slideDuration, zeroTime)
  val alignedFromTime = fromTime.floor(slideDuration, zeroTime)

  logInfo("Slicing from " + fromTime + " to " + toTime +
    " (aligned to " + alignedFromTime + " and " + alignedToTime + ")")

  alignedFromTime.to(alignedToTime, slideDuration).flatMap(time => {
    if (time >= zeroTime) getOrCompute(time) else None
  })
}
```
      
Here, after performing floor() on both fromTime and toTime, the results (alignedFromTime - zeroTime) and (alignedToTime - zeroTime) may no longer be multiples of the slideDuration, making the isTimeValid() check fail for all the remaining computation.
      
The fix is to add a new floor() function in Time.scala that respects the zeroTime while performing the floor:
      
```
def floor(that: Duration, zeroTime: Time): Time = {
  val t = that.milliseconds
  new Time(((this.millis - zeroTime.milliseconds) / t) * t + zeroTime.milliseconds)
}
```
      
Then change DStream.slice to call this new floor function, passing in its zeroTime:
      
```
val alignedToTime = toTime.floor(slideDuration, zeroTime)
val alignedFromTime = fromTime.floor(slideDuration, zeroTime)
```
      
This way, alignedToTime and alignedFromTime are *really* aligned with respect to zeroTime, whose value is not necessarily 0.
      
      Author: Wesley Miao <wesley.miao@gmail.com>
      Author: Wesley <wesley.miao@autodesk.com>
      
      Closes #5871 from wesleymiao/spark-7326 and squashes the following commits:
      
      82a4d8c [Wesley Miao] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream dosen't work all the time
      48b4dc0 [Wesley] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time
      6ade399 [Wesley] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time
      2611745 [Wesley Miao] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time
      d70a0768
  14. May 07, 2015
    • [SPARK-7305] [STREAMING] [WEBUI] Make BatchPage show friendly information when jobs are dropped by SparkListener · 22ab70e0
      zsxwing authored
      
      If jobs are dropped by SparkListener, at least we can show the job ids in BatchPage. Screenshot:
      
      ![b1](https://cloud.githubusercontent.com/assets/1000778/7434968/f19aa784-eff3-11e4-8f86-36a073873574.png)
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5840 from zsxwing/SPARK-7305 and squashes the following commits:
      
      aca0ba6 [zsxwing] Fix the code style
      718765e [zsxwing] Make generateNormalJobRow private
      8073b03 [zsxwing] Merge branch 'master' into SPARK-7305
      83dec11 [zsxwing] Make BatchPage show friendly information when jobs are dropped by SparkListener
      22ab70e0
    • [SPARK-7217] [STREAMING] Add configuration to control the default behavior of StreamingContext.stop() implicitly calling SparkContext.stop() · 01187f59
      Tathagata Das authored
      
In environments like notebooks, the SparkContext is managed by the underlying infrastructure and it is expected that the SparkContext will not be stopped. However, StreamingContext.stop() calls SparkContext.stop() as a non-intuitive side effect. This PR adds a configuration to SparkConf that sets the default StreamingContext stop behavior. The default is chosen so that the existing behavior does not change for existing users.
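A hedged sketch of wiring the default through SparkConf (the key name follows this JIRA; treat it as approximate):

```
def stop(): Unit = {
  // Default stays true so behavior is unchanged for existing users;
  // notebook environments can set it to false to keep the SparkContext.
  val stopSparkContext =
    conf.getBoolean("spark.streaming.stopSparkContextByDefault", true)
  stop(stopSparkContext)
}
```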
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #5929 from tdas/SPARK-7217 and squashes the following commits:
      
      869a763 [Tathagata Das] Changed implementation.
      685fe00 [Tathagata Das] Added configuration
      01187f59
    • [SPARK-7430] [STREAMING] [TEST] General improvements to streaming tests to increase debuggability · cfdadcbd
      Tathagata Das authored
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #5961 from tdas/SPARK-7430 and squashes the following commits:
      
      d654978 [Tathagata Das] Fix scala style
      fbf7174 [Tathagata Das] Added more verbose assert failure messages.
      6aea07a [Tathagata Das] Ensure SynchronizedBuffer is used in every TestSuiteBase
      cfdadcbd
  15. May 05, 2015
    • [SPARK-6939] [STREAMING] [WEBUI] Add timeline and histogram graphs for streaming statistics · 489700c8
      zsxwing authored
      This is the initial work of SPARK-6939. Not yet ready for code review. Here are the screenshots:
      
      ![graph1](https://cloud.githubusercontent.com/assets/1000778/7165766/465942e0-e3dc-11e4-9b05-c184b09d75dc.png)
      
      ![graph2](https://cloud.githubusercontent.com/assets/1000778/7165779/53f13f34-e3dc-11e4-8714-a4a75b7e09ff.png)
      
      TODOs:
      - [x] Display more information on mouse hover
      - [x] Align the timeline and distribution graphs
      - [x] Clean up the codes
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5533 from zsxwing/SPARK-6939 and squashes the following commits:
      
      9f7cd19 [zsxwing] Merge branch 'master' into SPARK-6939
      deacc3f [zsxwing] Remove unused import
      cd03424 [zsxwing] Fix .rat-excludes
      70cc87d [zsxwing] Streaming Scheduling Delay => Scheduling Delay
      d457277 [zsxwing] Fix UIUtils in BatchPage
      b3f303e [zsxwing] Add comments for unclear classes and methods
      ff0bff8 [zsxwing] Make InputDStream.name private[streaming]
      cc392c5 [zsxwing] Merge branch 'master' into SPARK-6939
      e275e23 [zsxwing] Move time related methods to Streaming's UIUtils
      d5d86f6 [zsxwing] Fix incorrect lastErrorTime
      3be4b7a [zsxwing] Use InputInfo
      b50fa32 [zsxwing] Jump to the batch page when clicking a point in the timeline graphs
      203605d [zsxwing] Merge branch 'master' into SPARK-6939
      74307cf [zsxwing] Reuse the data for histogram graphs to reduce the page size
      2586916 [zsxwing] Merge branch 'master' into SPARK-6939
      70d8533 [zsxwing] Remove BatchInfo.numRecords and a few renames
      7bbdc0a [zsxwing] Hide the receiver sub table if no receiver
      a2972e9 [zsxwing] Add some ui tests for StreamingPage
      fd03ad0 [zsxwing] Add a test to verify no memory leak
      4a8f886 [zsxwing] Merge branch 'master' into SPARK-6939
      18607a1 [zsxwing] Merge branch 'master' into SPARK-6939
      d0b0aec [zsxwing] Clean up the codes
      a459f49 [zsxwing] Add a dash line to processing time graphs
      8e4363c [zsxwing] Prepare for the demo
      c81a1ee [zsxwing] Change time unit in the graphs automatically
      4c0b43f [zsxwing] Update Streaming UI
      04c7500 [zsxwing] Make the server and client use the same timezone
      fed8219 [zsxwing] Move the x axis at the top and show a better tooltip
      c23ce10 [zsxwing] Make two graphs close
      d78672a [zsxwing] Make the X axis use the same range
      881c907 [zsxwing] Use histogram for distribution
      5688702 [zsxwing] Fix the unit test
      ddf741a [zsxwing] Fix the unit test
      ad93295 [zsxwing] Remove unnecessary codes
      a0458f9 [zsxwing] Clean the codes
      b82ed1e [zsxwing] Update the graphs as per comments
      dd653a1 [zsxwing] Add timeline and histogram graphs for streaming statistics
      489700c8
    • [SPARK-7318] [STREAMING] DStream cleans objects that are not closures · 57e9f29e
      Andrew Or authored
      I added a check in `ClosureCleaner#clean` to fail fast if this is detected in the future. tdas
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #5860 from andrewor14/streaming-closure-cleaner and squashes the following commits:
      
      8e971d7 [Andrew Or] Do not throw exception if object to clean is not closure
      5ee4e25 [Andrew Or] Fix tests
      eed3390 [Andrew Or] Merge branch 'master' of github.com:apache/spark into streaming-closure-cleaner
      67eeff4 [Andrew Or] Add tests
      a4fa768 [Andrew Or] Clean the closure, not the RDD
      57e9f29e
    • [SPARK-7350] [STREAMING] [WEBUI] Attach the Streaming tab when calling ssc.start() · c6d1efba
      zsxwing authored
      It's meaningless to display the Streaming tab before `ssc.start()`. So we should attach it in the `ssc.start` method.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5898 from zsxwing/SPARK-7350 and squashes the following commits:
      
      e676487 [zsxwing] Attach the Streaming tab when calling ssc.start()
      c6d1efba