- Nov 24, 2013
-
-
Matei Zaharia authored
Add Graphite sink for metrics This adds a metrics sink for Graphite. The sink must be configured with the host and port of a Graphite node, and may optionally be configured with a prefix that will be prepended to all metrics sent to Graphite.
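For reference, a configuration sketch along these lines (in conf/metrics.properties; the host, port, and prefix values below are placeholders, inferred from the description above rather than quoted from the patch):

```properties
# Attach the Graphite sink to all metrics instances ("*");
# values below are placeholders.
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.prefix=spark-prod
```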
-
Matei Zaharia authored
XORShift RNG with unit tests and benchmark This patch was introduced to address SPARK-950 - the discussion under the ticket explains not only the rationale, but also the design and testing decisions: https://spark-project.atlassian.net/browse/SPARK-950 To run the unit test, start the SBT console and type `compile` followed by `test-only org.apache.spark.util.XORShiftRandomSuite`. To run the benchmark, type `project core` and then `console`; once the Scala console starts, type `org.apache.spark.util.XORShiftRandom.benchmark(100000000)`. XORShiftRandom is also an object with a main method taking the number of iterations as an argument, so you can also run it from the command line.
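As a rough sketch of the XORShift idea itself (not the literal Spark class; the 21/35/4 shift constants are one standard 64-bit choice, and the seed must be nonzero):

```scala
// Marsaglia-style 64-bit XORShift: three shift/XOR steps per draw,
// far cheaper than java.util.Random's linear congruential update.
class XorShift64(private var seed: Long) {
  require(seed != 0, "XORShift seed must be nonzero")
  def nextLong(): Long = {
    seed ^= seed << 21
    seed ^= seed >>> 35
    seed ^= seed << 4
    seed
  }
}

// Usage: print a few pseudo-random values.
object XorShift64Demo {
  def main(args: Array[String]): Unit = {
    val rng = new XorShift64(System.nanoTime() | 1L) // | 1L keeps the seed nonzero
    (1 to 5).foreach(_ => println(rng.nextLong()))
  }
}
```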
-
Reynold Xin authored
Fix 'timeWriting' stat for shuffle files Due to concurrent git branches, changes from the shuffle file consolidation patch caused the shuffle write timing patch to no longer actually measure the time, since it requires that time be measured after the stream has been closed.
-
- Nov 23, 2013
-
-
Reynold Xin authored
AppendOnlyMap fixes - Chose a more random reshuffling step for values returned by Object.hashCode to avoid some long chaining that was happening for consecutive integers (e.g. `sc.makeRDD(1 to 100000000, 100).map(t => (t, t)).reduceByKey(_ + _).count`) - Some other small optimizations throughout (see commit comments)
-
Matei Zaharia authored
- Don't check keys for equality when re-inserting due to growing the table; the keys will already be unique - Remember the grow threshold instead of recomputing it on each insert
-
Matei Zaharia authored
- Use the MurmurHash 3 finalization step to scramble the bits of hashCode instead of the simpler version in java.util.HashMap; the latter had trouble with ranges of consecutive integers. MurmurHash 3 is used by fastutil. - Use Object.equals() instead of Scala's == to compare keys, because the latter does extra casts for numeric types (see the equals method in https://github.com/scala/scala/blob/master/src/library/scala/runtime/BoxesRunTime.java)
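A sketch of the MurmurHash 3 32-bit finalization ("fmix") step described above, using the standard fmix constants (illustrative, not the literal AppendOnlyMap code):

```scala
// Scramble the bits of a hashCode so that nearly-sequential inputs
// (e.g. consecutive integers) spread across hash buckets instead of
// forming long collision chains.
def rehash(h0: Int): Int = {
  var h = h0
  h ^= h >>> 16
  h *= 0x85ebca6b
  h ^= h >>> 13
  h *= 0xc2b2ae35
  h ^= h >>> 16
  h
}
```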
-
Reynold Xin authored
Support preservesPartitioning in RDD.zipPartitions In `RDD.zipPartitions`, add support for a `preservesPartitioning` option (similar to `RDD.mapPartitions`) that reuses the first RDD's partitioner.
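A small usage sketch, assuming an existing SparkContext `sc` (names and numbers are illustrative):

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.SparkContext._

// Two co-partitioned RDDs: zipping their partitions with
// preservesPartitioning = true keeps a's HashPartitioner, so later
// by-key operations on the result can avoid a shuffle.
val a = sc.parallelize(1 to 1000).map(i => (i, i)).partitionBy(new HashPartitioner(4))
val b = a.mapValues(_ * 2)
val summed = a.zipPartitions(b, preservesPartitioning = true) { (itA, itB) =>
  itA.zip(itB).map { case ((k, v1), (_, v2)) => (k, v1 + v2) }
}
```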
-
Ankur Dave authored
-
- Nov 21, 2013
-
-
Aaron Davidson authored
Due to concurrent git branches, changes from the shuffle file consolidation patch caused the shuffle write timing patch to no longer actually measure the time, since it requires that time be measured after the stream has been closed.
-
Reynold Xin authored
Fix Kryo Serializer buffer documentation inconsistency The documentation here is inconsistent with the coded default and other documentation.
-
Reynold Xin authored
TimeTrackingOutputStream should pass on calls to close() and flush(). Without this fix you get a huge number of open files when running shuffles.
-
Patrick Wendell authored
Without this fix you get a huge number of open files after running shuffles.
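A sketch of the wrapper-stream pattern behind both entries (illustrative, not the exact Spark class); the failure mode is overriding write() while inheriting OutputStream's no-op close() and flush(), which leaks the underlying file handles:

```scala
import java.io.OutputStream

class TimeTrackingOutputStream(out: OutputStream) extends OutputStream {
  var writeTimeNanos = 0L

  private def timed[A](body: => A): A = {
    val start = System.nanoTime()
    try body finally writeTimeNanos += System.nanoTime() - start
  }

  override def write(b: Int): Unit = timed(out.write(b))
  override def write(b: Array[Byte], off: Int, len: Int): Unit =
    timed(out.write(b, off, len))

  // The fix: forward these instead of inheriting OutputStream's no-ops,
  // otherwise the wrapped file stream is never flushed or closed.
  override def flush(): Unit = timed(out.flush())
  override def close(): Unit = timed(out.close())
}
```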
-
- Nov 20, 2013
-
-
Neal Wiggins authored
The documentation here is inconsistent with the coded default and other documentation.
-
Reynold Xin authored
PartitionPruningRDD is using index from parent I was getting an ArrayIndexOutOfBoundsException after calling union on a pruned RDD. The index it was using for the partition was the index in the original RDD, not in the new pruned RDD.
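A sketch of the indexing rule the fix enforces (simplified; the class name here is illustrative):

```scala
import org.apache.spark.Partition

// A pruned child RDD must number its partitions 0..n-1 itself; reusing
// the parent's partition indices breaks operations like union, which
// assume a partition's index equals its position in getPartitions.
class PrunedPartition(childIndex: Int, val parentSplit: Partition) extends Partition {
  override val index: Int = childIndex // child-local, not parentSplit.index
}
```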
-
Matei Zaharia authored
Cleanup to remove semicolons (;) from Scala code -) The main reason for this PR is to remove semicolons from single-statement lines of Scala code. -) Remove unused imports as I see them. -) Fix the ASF comment header in some files (bad copy-paste, I suppose).
-
Marek Kolodziej authored
-
- Nov 19, 2013
-
-
Marek Kolodziej authored
-
Henry Saputra authored
-
Henry Saputra authored
Passes sbt/sbt compile and test.
-
Matei Zaharia authored
Correct the number of tasks in ExecutorsUI. Index `a` is not `execId` here.
-
Matei Zaharia authored
Improve Spark on YARN error handling Improve CLI error handling and only allow a certain number of worker failures before failing the application. This will help prevent users' jobs from running forever after a foolish misconfiguration - for instance, using 32-bit Java but trying to allocate 8 GB containers. That case loops forever without this change; now it errors out after a certain number of retries, and the number of tries is configurable. Also increase the frequency at which we ping the RM, to speed up getting new containers if they die. The YARN MR app defaults to pinging the RM every 1 second, so the default of 5 seconds here is fine, but it is configurable as well in case people want to change it. I do want to make sure there aren't any cases where calling stopExecutors in CoarseGrainedSchedulerBackend would cause problems; I couldn't think of any, and tested on a standalone cluster as well as on YARN.
-
Matei Zaharia authored
Enable the Broadcast examples to work in a cluster setting Since they rely on println to display results, we need to first collect those results to the driver to have them actually display locally. This issue came up on the mailing lists [here](http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3C2013111909591557147628%40ict.ac.cn%3E).
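The pattern, as a sketch (assuming an existing SparkContext `sc`; names are illustrative):

```scala
// println inside a distributed operation runs on the executors, so in a
// cluster the output lands in executor logs, not the driver's console.
// Collect the (small) result set first, then print locally.
val barr = sc.broadcast(Array.fill(1000)(1))
val counts = sc.parallelize(1 to 10, 10)
  .map(i => (i, barr.value.sum))
  .collect()              // bring results back to the driver
counts.foreach(println)   // now prints locally
```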
-
tgravescs authored
-
Henry Saputra authored
Also remove unused imports as I found them along the way. Remove return statements when returning a value in the Scala code. Passes compile and tests.
-
Matthew Taylor authored
-
Aaron Davidson authored
Since they rely on println to display results, we need to first collect those results to the driver to have them actually display locally.
-
Matthew Taylor authored
-
- Nov 18, 2013
-
-
Marek Kolodziej authored
-
Marek Kolodziej authored
To run the unit test, start the SBT console and type `compile` followed by `test-only org.apache.spark.util.XORShiftRandomSuite`. To run the benchmark, type `project core` and then `console`; once the Scala console starts, type `org.apache.spark.util.XORShiftRandom.benchmark(100000000)`.
-
Russell Cardullo authored
* Reorder imports according to the style guide
* Consistently use propertyToOption in all places
-
- Nov 17, 2013
-
-
shiyun.wxm authored
-
Reynold Xin authored
Slightly enhanced PrimitiveVector: 1. Added a trim() method. 2. Added a size method. 3. Renamed getUnderlyingArray to array. 4. Minor documentation update.
-
Reynold Xin authored
Add PrimitiveVectorSuite and fix bug in resize()
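A simplified sketch of the resulting data structure (illustrative; the real class differs in detail):

```scala
import scala.reflect.ClassTag

// Growable array for primitives, avoiding boxing. `array` exposes the
// backing store (possibly longer than `size`); `trim()` shrinks the
// capacity to exactly `size`. A classic resize() bug is copying using
// the new capacity rather than the number of valid elements.
class PrimitiveVector[@specialized(Int, Long, Double) V: ClassTag](initialSize: Int = 64) {
  private var _size = 0
  private var _array = new Array[V](initialSize)

  def +=(value: V): Unit = {
    if (_size == _array.length) resize(_array.length * 2)
    _array(_size) = value
    _size += 1
  }

  def size: Int = _size
  def array: Array[V] = _array
  def trim(): this.type = { resize(_size); this }

  private def resize(newLength: Int): Unit = {
    val newArray = new Array[V](newLength)
    System.arraycopy(_array, 0, newArray, 0, math.min(_size, newLength))
    _array = newArray
  }
}
```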
-
Aaron Davidson authored
-
Reynold Xin authored
-
BlackNiuza authored
-
Reynold Xin authored
1. Added a trim() method. 2. Added a size method. 3. Renamed getUnderlyingArray to array. 4. Minor documentation update.
-
BlackNiuza authored
-
- Nov 16, 2013
-
-
Matei Zaharia authored
Simple cleanup on Spark's Scala code Simple cleanup on Spark's Scala code while testing some modules: -) Remove some unused imports as I found them -) Remove ";" from the import statements -) Remove () at the end of method calls like size that have no side effect.
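For context, the Scala convention being applied (a minimal illustration):

```scala
// Parentheses signal a side effect; pure accessors drop them.
val n = List(1, 2, 3).size   // accessor: no parens
Console.out.flush()          // side-effecting call: keep parens
```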
-
- Nov 15, 2013
-
-
Henry Saputra authored
-) Remove some unused imports as I found them -) Remove ";" from the import statements -) Remove () at the end of method calls like size that have no side effect.
-