- Nov 24, 2013
-
-
Matei Zaharia authored
Add Graphite sink for metrics This adds a metrics sink for Graphite. The sink must be configured with the host and port of a Graphite node, and may optionally be configured with a prefix that will be prepended to all metrics sent to Graphite.
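For reference, a configuration sketch along these lines (in conf/metrics.properties; the host, port, and prefix values below are placeholders, inferred from the description above rather than quoted from the patch):

```properties
# Attach the Graphite sink to all metrics instances ("*");
# values below are placeholders.
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.prefix=spark-prod
```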
-
Matei Zaharia authored
XORShift RNG with unit tests and benchmark This patch was introduced to address SPARK-950 - the discussion under the ticket explains not only the rationale, but also the design and testing decisions: https://spark-project.atlassian.net/browse/SPARK-950 To run the unit test, start the SBT console and type `compile` followed by `test-only org.apache.spark.util.XORShiftRandomSuite`. To run the benchmark, type `project core` and then `console`; once the Scala console starts, type `org.apache.spark.util.XORShiftRandom.benchmark(100000000)`. XORShiftRandom is also an object with a main method taking the number of iterations as an argument, so you can also run it from the command line.
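As a rough sketch of the XORShift idea itself (not the literal Spark class; the 21/35/4 shift constants are one standard 64-bit choice, and the seed must be nonzero):

```scala
// Marsaglia-style 64-bit XORShift: three shift/XOR steps per draw,
// far cheaper than java.util.Random's linear congruential update.
class XorShift64(private var seed: Long) {
  require(seed != 0, "XORShift seed must be nonzero")
  def nextLong(): Long = {
    seed ^= seed << 21
    seed ^= seed >>> 35
    seed ^= seed << 4
    seed
  }
}

// Usage: print a few pseudo-random values.
object XorShift64Demo {
  def main(args: Array[String]): Unit = {
    val rng = new XorShift64(System.nanoTime() | 1L) // | 1L keeps the seed nonzero
    (1 to 5).foreach(_ => println(rng.nextLong()))
  }
}
```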
-
Reynold Xin authored
Fix 'timeWriting' stat for shuffle files Due to concurrent git branches, changes from the shuffle file consolidation patch caused the shuffle write timing patch to no longer actually measure the time, since it requires that time be measured after the stream has been closed.
-
- Nov 23, 2013
-
-
Reynold Xin authored
AppendOnlyMap fixes - Chose a more random reshuffling step for values returned by Object.hashCode to avoid some long chaining that was happening for consecutive integers (e.g. `sc.makeRDD(1 to 100000000, 100).map(t => (t, t)).reduceByKey(_ + _).count`) - Some other small optimizations throughout (see commit comments)
-
Matei Zaharia authored
- Don't check keys for equality when re-inserting due to growing the table; the keys will already be unique - Remember the grow threshold instead of recomputing it on each insert
-
Matei Zaharia authored
- Use the MurmurHash 3 finalization step to scramble the bits of hashCode instead of the simpler version in java.util.HashMap; the latter had trouble with ranges of consecutive integers. MurmurHash 3 is used by fastutil. - Use Object.equals() instead of Scala's == to compare keys, because the latter does extra casts for numeric types (see the equals method in https://github.com/scala/scala/blob/master/src/library/scala/runtime/BoxesRunTime.java)
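A sketch of the MurmurHash 3 32-bit finalization ("fmix") step described above, using the standard fmix constants (illustrative, not the literal AppendOnlyMap code):

```scala
// Scramble the bits of a hashCode so that nearly-sequential inputs
// (e.g. consecutive integers) spread across hash buckets instead of
// forming long collision chains.
def rehash(h0: Int): Int = {
  var h = h0
  h ^= h >>> 16
  h *= 0x85ebca6b
  h ^= h >>> 13
  h *= 0xc2b2ae35
  h ^= h >>> 16
  h
}
```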
-
Reynold Xin authored
Support preservesPartitioning in RDD.zipPartitions In `RDD.zipPartitions`, add support for a `preservesPartitioning` option (similar to `RDD.mapPartitions`) that reuses the first RDD's partitioner.
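A small usage sketch, assuming an existing SparkContext `sc` (names and numbers are illustrative):

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.SparkContext._

// Two co-partitioned RDDs: zipping their partitions with
// preservesPartitioning = true keeps a's HashPartitioner, so later
// by-key operations on the result can avoid a shuffle.
val a = sc.parallelize(1 to 1000).map(i => (i, i)).partitionBy(new HashPartitioner(4))
val b = a.mapValues(_ * 2)
val summed = a.zipPartitions(b, preservesPartitioning = true) { (itA, itB) =>
  itA.zip(itB).map { case ((k, v1), (_, v2)) => (k, v1 + v2) }
}
```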
-
Ankur Dave authored
-
- Nov 21, 2013
-
-
Aaron Davidson authored
Due to concurrent git branches, changes from the shuffle file consolidation patch caused the shuffle write timing patch to no longer actually measure the time, since it requires that time be measured after the stream has been closed.
-
Reynold Xin authored
Fix Kryo Serializer buffer documentation inconsistency The documentation here is inconsistent with the coded default and other documentation.
-
Reynold Xin authored
TimeTrackingOutputStream should pass on calls to close() and flush(). Without this fix you get a huge number of open files when running shuffles.
-
Patrick Wendell authored
Without this fix you get a huge number of open files after running shuffles.
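A sketch of the wrapper-stream pattern behind both entries (illustrative, not the exact Spark class); the failure mode is overriding write() while inheriting OutputStream's no-op close() and flush(), which leaks the underlying file handles:

```scala
import java.io.OutputStream

class TimeTrackingOutputStream(out: OutputStream) extends OutputStream {
  var writeTimeNanos = 0L

  private def timed[A](body: => A): A = {
    val start = System.nanoTime()
    try body finally writeTimeNanos += System.nanoTime() - start
  }

  override def write(b: Int): Unit = timed(out.write(b))
  override def write(b: Array[Byte], off: Int, len: Int): Unit =
    timed(out.write(b, off, len))

  // The fix: forward these instead of inheriting OutputStream's no-ops,
  // otherwise the wrapped file stream is never flushed or closed.
  override def flush(): Unit = timed(out.flush())
  override def close(): Unit = timed(out.close())
}
```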
-
- Nov 20, 2013
-
-
Neal Wiggins authored
The documentation here is inconsistent with the coded default and other documentation.
-
Reynold Xin authored
PartitionPruningRDD is using index from parent I was getting an ArrayIndexOutOfBoundsException after calling union on a pruned RDD. The index it was using for the partition was the index in the original RDD, not in the new pruned RDD.
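A sketch of the indexing rule the fix enforces (simplified; the class name here is illustrative):

```scala
import org.apache.spark.Partition

// A pruned child RDD must number its partitions 0..n-1 itself; reusing
// the parent's partition indices breaks operations like union, which
// assume a partition's index equals its position in getPartitions.
class PrunedPartition(childIndex: Int, val parentSplit: Partition) extends Partition {
  override val index: Int = childIndex // child-local, not parentSplit.index
}
```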
-
Matei Zaharia authored
Cleanup to remove semicolons (;) from Scala code -) The main reason for this PR is to remove semicolons from single-statement lines of Scala code. -) Remove unused imports as I see them. -) Fix the ASF comment header in some files (bad copy-paste, I suppose).
-
Marek Kolodziej authored
-
- Nov 19, 2013
-
-
Marek Kolodziej authored
-
Henry Saputra authored
-
Henry Saputra authored
Passes sbt/sbt compile and test.
-
Matei Zaharia authored
Correct the number of tasks in ExecutorsUI. Index `a` is not `execId` here.
-
Matei Zaharia authored
Improve Spark on YARN error handling Improve CLI error handling and only allow a certain number of worker failures before failing the application. This will help prevent users' jobs from running forever after a foolish misconfiguration - for instance, using 32-bit Java but trying to allocate 8 GB containers. That case loops forever without this change; now it errors out after a certain number of retries, and the number of tries is configurable. Also increase the frequency at which we ping the RM, to speed up getting new containers if they die. The YARN MR app defaults to pinging the RM every 1 second, so the default of 5 seconds here is fine, but it is configurable as well in case people want to change it. I do want to make sure there aren't any cases where calling stopExecutors in CoarseGrainedSchedulerBackend would cause problems; I couldn't think of any, and tested on a standalone cluster as well as on YARN.
-
Matei Zaharia authored
Enable the Broadcast examples to work in a cluster setting Since they rely on println to display results, we need to first collect those results to the driver to have them actually display locally. This issue came up on the mailing lists [here](http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3C2013111909591557147628%40ict.ac.cn%3E).
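The pattern, as a sketch (assuming an existing SparkContext `sc`; names are illustrative):

```scala
// println inside a distributed operation runs on the executors, so in a
// cluster the output lands in executor logs, not the driver's console.
// Collect the (small) result set first, then print locally.
val barr = sc.broadcast(Array.fill(1000)(1))
val counts = sc.parallelize(1 to 10, 10)
  .map(i => (i, barr.value.sum))
  .collect()              // bring results back to the driver
counts.foreach(println)   // now prints locally
```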
-
tgravescs authored
-
Henry Saputra authored
Also remove unused imports as I found them along the way. Remove return statements when returning a value in the Scala code. Passes compile and tests.
-
Matthew Taylor authored
-
Aaron Davidson authored
Since they rely on println to display results, we need to first collect those results to the driver to have them actually display locally.
-
Matthew Taylor authored
-
- Nov 18, 2013
-
-
Marek Kolodziej authored
-
Marek Kolodziej authored
To run the unit test, start the SBT console and type `compile` followed by `test-only org.apache.spark.util.XORShiftRandomSuite`. To run the benchmark, type `project core` and then `console`; once the Scala console starts, type `org.apache.spark.util.XORShiftRandom.benchmark(100000000)`.
-
Russell Cardullo authored
* Reorder imports according to the style guide
* Consistently use propertyToOption in all places
-
- Nov 17, 2013
-
-
shiyun.wxm authored
-
Reynold Xin authored
Slightly enhanced PrimitiveVector: 1. Added a trim() method. 2. Added a size method. 3. Renamed getUnderlyingArray to array. 4. Minor documentation update.
-
Reynold Xin authored
Add PrimitiveVectorSuite and fix bug in resize()
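A simplified sketch of the resulting data structure (illustrative; the real class differs in detail):

```scala
import scala.reflect.ClassTag

// Growable array for primitives, avoiding boxing. `array` exposes the
// backing store (possibly longer than `size`); `trim()` shrinks the
// capacity to exactly `size`. A classic resize() bug is copying using
// the new capacity rather than the number of valid elements.
class PrimitiveVector[@specialized(Int, Long, Double) V: ClassTag](initialSize: Int = 64) {
  private var _size = 0
  private var _array = new Array[V](initialSize)

  def +=(value: V): Unit = {
    if (_size == _array.length) resize(_array.length * 2)
    _array(_size) = value
    _size += 1
  }

  def size: Int = _size
  def array: Array[V] = _array
  def trim(): this.type = { resize(_size); this }

  private def resize(newLength: Int): Unit = {
    val newArray = new Array[V](newLength)
    System.arraycopy(_array, 0, newArray, 0, math.min(_size, newLength))
    _array = newArray
  }
}
```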
-
Aaron Davidson authored
-
Reynold Xin authored
-
BlackNiuza authored
-
Reynold Xin authored
1. Added a trim() method. 2. Added a size method. 3. Renamed getUnderlyingArray to array. 4. Minor documentation update.
-
BlackNiuza authored
-
- Nov 16, 2013
-
-
Matei Zaharia authored
Simple cleanup on Spark's Scala code Simple cleanup on Spark's Scala code while testing some modules: -) Remove some unused imports as I found them -) Remove ";" from the import statements -) Remove () at the end of method calls like size that have no side effect.
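For context, the Scala convention being applied (a minimal illustration):

```scala
// Parentheses signal a side effect; pure accessors drop them.
val n = List(1, 2, 3).size   // accessor: no parens
Console.out.flush()          // side-effecting call: keep parens
```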
-
- Nov 15, 2013
-
-
Henry Saputra authored
-) Remove some unused imports as I found them -) Remove ";" from the import statements -) Remove () at the end of method calls like size that have no side effect.
-