- Dec 31, 2013
Hossein Falaki authored
- Dec 30, 2013
Hossein Falaki authored
Hossein Falaki authored
Hossein Falaki authored
Hossein Falaki authored
Hossein Falaki authored
Hossein Falaki authored
- Dec 24, 2013
Reynold Xin authored
Show full stack trace and time taken in unit tests.
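In an sbt + ScalaTest build this is usually just a reporter flag; a minimal sketch of that kind of setting (an assumption about the build, not a quote of Spark's actual build file):

```scala
// ScalaTest standard-output reporter flags: D = show test durations,
// F = show full stack traces on failure.
testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest, "-oDF")
```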
- Dec 23, 2013
Reynold Xin authored
Matei Zaharia authored
Refactored the streaming scheduler and added StreamingListener interface
- Refactored the streaming scheduler for cleaner code. Specifically, the JobManager was renamed to JobScheduler, as it does the actual scheduling of Spark jobs to the SparkContext. The earlier Scheduler was renamed to JobGenerator, as it actually generates the jobs from the DStreams. The JobScheduler starts the JobGenerator. Also moved all the scheduler-related code from spark.streaming to the spark.streaming.scheduler package.
- Implemented the StreamingListener interface, similar to SparkListener. The streaming version of StatusReportListener prints the batch processing time statistics (for now); a sketch of such a listener follows below. Added StreamingListenerSuite to test it.
- Refactored the streaming TestSuiteBase to dedupe code in the other streaming test suites.
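As a rough illustration of the listener interface described here, a batch-time listener might look like the sketch below. The class and method names follow the StreamingListener API as later released, so they may differ slightly from this initial version:

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Prints how long each completed batch spent processing, similar in spirit
// to the batch statistics listener mentioned above.
class BatchTimeListener extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    val delayMs = info.processingDelay.getOrElse(-1L)
    println(s"Batch ${info.batchTime} took $delayMs ms to process")
  }
}

// Registered on a StreamingContext before it is started, e.g.:
// ssc.addStreamingListener(new BatchTimeListener)
```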
Tathagata Das authored
Tathagata Das authored
Tathagata Das authored
Reynold Xin authored
Added SPARK-968 implementation for review
wangda.tan authored
- Dec 22, 2013
wangda.tan authored
- Dec 20, 2013
Tathagata Das authored
Patrick Wendell authored
Minor cleanup for standalone scheduler. See commit messages.
- Dec 19, 2013
Patrick Wendell authored
Track and report task result serialisation time.
- DirectTaskResult now has a ByteBuffer valueBytes instead of a T value.
- DirectTaskResult now has a member function T value() that deserialises valueBytes (see the sketch below).
- Executor serialises the value into a ByteBuffer and passes it to DTR's ctor.
- Executor tracks the time taken to do so and puts it in a new field in TaskMetrics.
- StagePage now reports serialisation time from TaskMetrics along with the other things it reported.
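The pattern being described here (hold the serialized bytes, deserialize lazily, and time the serialization) is roughly the following sketch. It is illustrative only: it uses plain Java serialization instead of Spark's serializer, and SerializedResult is a made-up name, not the real DirectTaskResult:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.nio.ByteBuffer

// Holds the task result as raw bytes, like the ByteBuffer valueBytes above.
class SerializedResult[T](val valueBytes: ByteBuffer) {
  // Deserializes valueBytes on demand, mirroring the T value() member function.
  def value(): T = {
    val bytes = new Array[Byte](valueBytes.remaining())
    valueBytes.duplicate().get(bytes)
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}

object SerializedResult {
  // Serializes the value and reports how long that took -- the quantity the
  // commit adds to TaskMetrics.
  def timed[T](value: T): (SerializedResult[T], Long) = {
    val start = System.currentTimeMillis()
    val bos = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bos)
    out.writeObject(value.asInstanceOf[AnyRef])
    out.close()
    (new SerializedResult[T](ByteBuffer.wrap(bos.toByteArray)), System.currentTimeMillis() - start)
  }
}
```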
Aaron Davidson authored
Aaron Davidson authored
As a lonely child with no one to care for it... we had to put it down.
Aaron Davidson authored
Reynold Xin authored
Add collectPartition to JavaRDD interface. This interface is useful for implementing `take` from other language frontends where the data is serialized. Also remove `takePartition` from PythonRDD and use `collectPartition` in rdd.py. Thanks @concretevitamin for the original change and tests.
Shivaram Venkataraman authored
Shivaram Venkataraman authored
Change the implementation to use runJob instead of PartitionPruningRDD. Also update the unit tests and the python take implementation to use the new interface.
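For reference, collecting just the requested partitions with runJob (instead of wrapping the RDD in a PartitionPruningRDD and calling collect) looks roughly like the sketch below. It uses the runJob overload found in current Spark; the 0.8-era signature also took an allowLocal flag, so treat this as an illustration rather than the actual patch:

```scala
import scala.reflect.ClassTag
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Runs a job restricted to the given partition ids and returns one array
// of elements per requested partition.
def collectPartitions[T: ClassTag](sc: SparkContext, rdd: RDD[T], partitionIds: Seq[Int]): Array[Array[T]] =
  sc.runJob(rdd, (iter: Iterator[T]) => iter.toArray, partitionIds)
```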
Matei Zaharia authored
Add toString to Java RDD, and __repr__ to Python RDD. Addresses [SPARK-992](https://spark-project.atlassian.net/browse/SPARK-992).
Nick Pentreath authored
Reynold Xin authored
[SPARK-959] Explicitly depend on org.eclipse.jetty.orbit jar. Without this, in some cases, Ivy attempts to download the wrong file and fails, stopping the whole build. See [bug](https://spark-project.atlassian.net/browse/SPARK-959) for more details. Note that this may not be the best solution, as I do not understand the root cause of why this only happens for some people. However, it is reported to work.
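The usual shape of this workaround in an sbt build is to declare the Orbit artifact explicitly and force its type to a plain jar; a hedged sketch is below (the version string is illustrative, not necessarily what SparkBuild.scala pins):

```scala
// Force Ivy to fetch the plain "jar" artifact for the Orbit-packaged servlet
// API instead of the "orbit" artifact type it would otherwise try to resolve.
libraryDependencies += "org.eclipse.jetty.orbit" % "javax.servlet" % "3.0.0.v201112011016" artifacts Artifact("javax.servlet", "jar", "jar")
```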
Tathagata Das authored
Aaron Davidson authored
Without this, in some cases, Ivy attempts to download the wrong file and fails, stopping the whole build. See bug for more details. (This is probably also the beginning of the slow death of our recently prettified dependencies. Form follows function.)
Reynold Xin authored
Increase spark.akka.askTimeout default to 30 seconds. In experimental clusters we've observed that a 10 second timeout was insufficient, despite having a low number of nodes and relatively small workload (16 nodes, <1.5 TB data). This would cause an entire job to fail at the beginning of the reduce phase. There is no particular reason for this value to be small, as a timeout should only occur in an exceptional situation. Also centralized the reading of spark.akka.askTimeout to AkkaUtils (surely this can later be cleaned up to use Typesafe). Finally, deleted some lurking implicits. If anyone can think of a reason they should still be there, please let me know.
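Like other Spark settings of this era, the timeout can still be overridden per application through a system property (value in seconds); a minimal example, assuming the 0.8-style configuration:

```scala
// Hypothetical override: raise the Akka ask timeout further for a very busy
// cluster. Must be set before the SparkContext is created.
System.setProperty("spark.akka.askTimeout", "60")
```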
- Dec 18, 2013
Aaron Davidson authored
In experimental clusters we've observed that a 10 second timeout was insufficient, despite having a low number of nodes and relatively small workload (16 nodes, <1.5 TB data). This would cause an entire job to fail at the beginning of the reduce phase. There is no particular reason for this value to be small, as a timeout should only occur in an exceptional situation. Also centralized the reading of spark.akka.askTimeout to AkkaUtils (surely this can later be cleaned up to use Typesafe). Finally, deleted some lurking implicits. If anyone can think of a reason they should still be there, please let me know.
Tathagata Das authored
Conflicts:
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/dstream/ForEachDStream.scala
Reynold Xin authored
Fix Cygwin support in several scripts. This allows the spark-shell, spark-class, run-example, make-distribution.sh, and ./bin/start-* scripts to work under Cygwin. Note that this doesn't support PySpark under Cygwin, since that requires many additional `cygpath` calls from within Python and will be non-trivial to implement. This PR was inspired by, and subsumes, #253 (so close #253 after this is merged).
Tathagata Das authored
Reynold Xin authored
Fixed the example link in the Scala programming guide. The old link could not be accessed, so I changed it to the new one.
Shivaram Venkataraman authored
Also remove takePartition from PythonRDD and use collectPartition in rdd.py.
Tor Myklebust authored
fengdong authored
Reynold Xin authored
Fixed a performance problem in RDD.top and BoundedPriorityQueue. BoundedPriorityQueue was actually traversing the entire queue to calculate the size, resulting in bad insertion performance. This should also cherry-pick cleanly into branch-0.8.
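An illustrative sketch of the shape of such a fix (not the actual Spark class): back the bounded queue with java.util.PriorityQueue and answer size by delegation, so insertion no longer pays for a traversal just to check the bound:

```scala
import java.util.{PriorityQueue => JPriorityQueue}
import scala.collection.JavaConverters._

// A bounded priority queue that keeps the maxSize largest elements.
// size is O(1) because it delegates to the underlying queue rather than
// inheriting a size that walks the whole collection.
class BoundedPQ[A](maxSize: Int)(implicit ord: Ordering[A]) extends Iterable[A] {
  require(maxSize >= 1)
  private val underlying = new JPriorityQueue[A](maxSize, ord)

  override def size: Int = underlying.size

  def +=(elem: A): this.type = {
    if (underlying.size < maxSize) {
      underlying.offer(elem)
    } else if (ord.gt(elem, underlying.peek())) {
      // Evict the current minimum to make room for the larger element.
      underlying.poll()
      underlying.offer(elem)
    }
    this
  }

  override def iterator: Iterator[A] = underlying.iterator.asScala
}
```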