- Jan 13, 2014
-
-
Reynold Xin authored
-
Reynold Xin authored
-
Reynold Xin authored
-
Reynold Xin authored
`sbt/sbt doc` used to fail. This fixed it.
-
Ankur Dave authored
Improving documentation and identifying potential bug in CC calculation.
-
Ankur Dave authored
-
Ankur Dave authored
-
Joseph E. Gonzalez authored
-
Ankur Dave authored
-
Ankur Dave authored
-
Patrick Wendell authored
Moved DStream and PairDStream to org.apache.spark.streaming.dstream
Similar to the package location of `org.apache.spark.rdd.RDD`, `DStream` has been moved from `org.apache.spark.streaming.DStream` to `org.apache.spark.streaming.dstream.DStream`. I know that the package name is a little long, but I think it's better to keep it consistent with Spark's structure. Also fixed persistence of windowed DStreams. The RDDs generated by a windowed DStream are essentially unions of underlying RDDs, and persisting these union RDDs would store numerous copies of the underlying data. Instead, setting the persistence level on the windowed DStream now sets the persistence level of the underlying DStream.
-
Ankur Dave authored
-
Reynold Xin authored
Remove now un-needed hostPort option
I noticed this was logging some scary error messages in various places. After I looked into it, this is no longer really used. I removed the option and re-wrote the one remaining use case (it was unnecessary there anyways).
-
Tathagata Das authored
-
Joseph E. Gonzalez authored
-
Ankur Dave authored
-
- Jan 12, 2014
-
-
Tathagata Das authored
Conflicts: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
-
Ankur Dave authored
-
Ankur Dave authored
-
Patrick Wendell authored
Disable shuffle file consolidation by default
After running various performance tests for the 0.9 release, this still seems to have performance issues even on XFS. So let's keep this off-by-default for 0.9 and users can experiment with it depending on their disk configurations.
-
Patrick Wendell authored
Remove simple redundant return statements for Scala methods/functions
Remove simple redundant return statements for Scala methods/functions:
-) Only change simple return statements at the end of a method
-) Ignore the complex if-else checks
-) Ignore the ones inside synchronized blocks
-) Make small changes to turn var into val where possible and remove () from simple getters
This hopefully makes the review simpler =) Passes compilation and tests.
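As a rough illustration of the kind of cleanup described above, here is a minimal sketch. The object, method, and class names are hypothetical and are not taken from the Spark code base; the sketch only shows a trailing `return` being dropped, a `var` becoming a `val`, and `()` being removed from a simple getter.

```scala
// Hypothetical names throughout; this is not code from the commit itself.
object RedundantReturnSketch {
  // Before: explicit `return` at the end of the method, and a `var`
  // that is never reassigned.
  def sumBefore(xs: Seq[Int]): Int = {
    var total = xs.sum
    return total
  }

  // After: the last expression is the method's result, and the
  // never-reassigned `var` becomes a `val`.
  def sumAfter(xs: Seq[Int]): Int = {
    val total = xs.sum
    total
  }

  class Counter(private val n: Int) {
    // Before: def size(): Int = { return n }
    // After: a simple getter without `()` and without `return`.
    def size: Int = n
  }
}
```

Both versions of the method return the same value; only the style differs, which is why this sort of change is limited to simple trailing returns.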
-
Joseph E. Gonzalez authored
-
Joseph E. Gonzalez authored
-
Patrick Wendell authored
Better error handling in Spark Streaming and more API cleanup
Previously, errors in jobs generated by Spark Streaming (or in the generation of jobs) could not be caught from the main driver thread (i.e. the thread that called StreamingContext.start()) because they were thrown in different threads. With this change, after `ssc.start`, one can call `ssc.awaitTermination()`, which will block until the ssc is closed or there is an exception. This makes it easier to debug. This change also adds ssc.stop(<stop-spark-context>), which lets you stop the StreamingContext without stopping the SparkContext. Also fixes the bug that came up with PRs #393 and #381. The MetadataCleaner default value has been changed from 3500 to -1 for a normal SparkContext and 3600 when creating a StreamingContext. Also updated StreamingListenerBus with changes similar to those made to SparkListenerBus in #392, and changed a lot of protected[streaming] to private[streaming].
-
Patrick Wendell authored
Rename DStream.foreach to DStream.foreachRDD
`foreachRDD` makes it clear that the granularity of this operator is per-RDD. As it stands, `foreach` is inconsistent with `map`, `filter`, and the other DStream operators which get pushed down to individual records within each RDD.
-
Patrick Wendell authored
-
Henry Saputra authored
-
Tathagata Das authored
-
Patrick Wendell authored
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
-
Patrick Wendell authored
`foreachRDD` makes it clear that the granularity of this operator is per-RDD. As it stands, `foreach` is inconsistent with `map`, `filter`, and the other DStream operators which get pushed down to individual records within each RDD.
-
Patrick Wendell authored
Setting load defaults to true in executor
This preserves the behavior in earlier releases. If properties are set for the executors via `spark-env.sh` on the slaves, then they should take precedence over Spark defaults. This is useful when system administrators set properties for a standalone cluster, such as shuffle locations. /cc @andrewor14 who initially reported this issue.
-
Ankur Dave authored
-
Reynold Xin authored
Stop SparkListenerBus daemon thread when DAGScheduler is stopped. Otherwise this leads to hundreds of SparkListenerBus daemon threads in our unit tests (and is also problematic if a user application launches multiple SparkContexts).
-
Patrick Wendell authored
-
Patrick Wendell authored
-
Tathagata Das authored
Fixed bugs to ensure better cleanup of JobScheduler, JobGenerator and NetworkInputTracker upon close.
-