Nov 04, 2013
    • Merge branch 'master' of github.com:apache/incubator-spark into mergemerge · 551a43fd
      Reynold Xin authored
      Conflicts:
      	README.md
      	core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
      	core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
      	core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
    • This commit adds a new graphx-shell · 3c37928f
      Joseph E. Gonzalez authored
      The new graphx-shell is essentially the same as the spark shell, but
      with GraphX packages automatically imported and with Kryo
      serialization enabled for GraphX types.
      
      In addition, the graphx-shell has a nifty new logo.
      
      To make these changes minimally invasive in SparkILoop.scala,
      I added some additional environment variables:
      
         SPARK_BANNER_TEXT: if set, this string is displayed instead
         of the Spark logo.
      
         SPARK_SHELL_INIT_BLOCK: if set, this expression is evaluated in the
         Spark shell after the SparkContext is created.
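      
      For illustration, a minimal Scala sketch of how a REPL like SparkILoop
      might consult these two variables. Only the environment variable names
      come from this commit; the object and the stand-in defaultLogo and
      evaluateInShell are hypothetical, not the actual patch:
      
         object ShellEnvSketch {
           val defaultLogo: String = "Welcome to Spark"
      
           // Stand-in for handing an expression to the REPL's interpreter.
           def evaluateInShell(expr: String): Unit =
             println(s"(would evaluate in the shell: $expr)")
      
           def printBanner(): Unit =
             // SPARK_BANNER_TEXT: shown instead of the Spark logo if set.
             println(sys.env.getOrElse("SPARK_BANNER_TEXT", defaultLogo))
      
           def afterSparkContextCreated(): Unit =
             // SPARK_SHELL_INIT_BLOCK: evaluated once the SparkContext exists.
             sys.env.get("SPARK_SHELL_INIT_BLOCK").foreach(evaluateInShell)
         }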
    • Merge pull request #130 from aarondav/shuffle · 7a26104a
      Reynold Xin authored
      Memory-optimized shuffle file consolidation
      
      Reduces the per-block overhead of shuffle file consolidation from over 300 bytes to 8 bytes (one primitive Long). Verified via profiler testing with 1 million shuffle blocks: the net overhead was ~8,400,000 bytes.
      
      Although the memory-optimized implementation incurs extra CPU overhead, the shuffle phase in this test ran only about 2% slower, and the reduce phase 40% faster, than with no shuffle file consolidation at all.
      
      This is accomplished by replacing the map from ShuffleBlockId to FileSegment (i.e., from a block id to where it is located), which had high overhead due to being a gigantic, timestamped, concurrent map, with a more space-efficient structure. Specifically, the following are introduced (I have omitted the word "Shuffle" from some names for clarity; a sketch of both follows below):
      **ShuffleFile** - there is one ShuffleFile per consolidated shuffle file on disk. We store an array of offsets into the physical shuffle file, one per ShuffleMapTask that wrote into the file. This is sufficient to reconstruct the FileSegment for any mapper whose output is in the file.
      **FileGroup** - contains a set of ShuffleFiles, one per reducer, that a MapTask can use to write its output. One FileGroup is created per _concurrent_ MapTask. The FileGroup contains an array of the mapIds that have been written to all files in the group; the positions of elements in this array map directly onto the positions in each ShuffleFile's offsets array.
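      
      As a rough illustration, here is a minimal Scala sketch of these two structures. The class shapes, field names, and methods are assumptions inferred from this description, not the code from the patch itself:
      
      ```scala
      import java.io.File
      import scala.collection.mutable.ArrayBuffer
      
      // A segment of a physical file: the "where it's located" half of the
      // old ShuffleBlockId -> FileSegment map.
      case class FileSegment(file: File, offset: Long, length: Long)
      
      // One consolidated shuffle file on disk. offsets(i) is the byte at
      // which the i-th writer's output begins; a trailing sentinel marks
      // the end, so the i-th segment spans [offsets(i), offsets(i + 1)).
      class ShuffleFile(val file: File) {
        val offsets = ArrayBuffer[Long](0L)
      
        def recordMapOutput(bytesWritten: Long): Unit =
          offsets += offsets.last + bytesWritten
      
        def segmentFor(position: Int): FileSegment =
          FileSegment(file, offsets(position),
            offsets(position + 1) - offsets(position))
      }
      
      // A set of ShuffleFiles, one per reducer, used by one MapTask at a
      // time. mapIds(i) is the i-th MapTask that wrote to every file in the
      // group, and indexes position i of each file's offsets array.
      class ShuffleFileGroup(val files: Array[ShuffleFile]) {
        val mapIds = ArrayBuffer[Int]()
      }
      ```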
      
      In order to locate the FileSegment associated with a BlockId, we have another structure that maps each reducer to the set of ShuffleFiles that were created for it. (There will be as many ShuffleFiles per reducer as there are FileGroups.) To look up a given ShuffleBlockId (shuffleId, reducerId, mapId), we thus search through all ShuffleFiles associated with that reducer.
      
      As a time optimization, we ensure that FileGroups are reused only for MapTasks with monotonically increasing mapIds. This allows us to perform a binary search to locate a mapId inside a group, and also enables potential future optimizations based on the usual monotonic access order.
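      
      Continuing the sketch above (again, the bookkeeping map and method names are assumptions, not the patch's actual code), the lookup could then look like this:
      
      ```scala
      import java.util.Arrays
      import scala.collection.mutable
      
      // For each shuffleId, the groups created so far. group.files(reducerId)
      // is the candidate file for a given reducer, so each reducer has as
      // many candidate ShuffleFiles as there are groups.
      val groupsByShuffle = mutable.Map[Int, Seq[ShuffleFileGroup]]()
      
      def getBlockLocation(shuffleId: Int, reducerId: Int,
                           mapId: Int): Option[FileSegment] = {
        for (group <- groupsByShuffle.getOrElse(shuffleId, Nil)) {
          // mapIds is sorted, since a group is reused only for monotonically
          // increasing mapIds; binary search finds the writer's position.
          val position = Arrays.binarySearch(group.mapIds.toArray, mapId)
          if (position >= 0) {
            return Some(group.files(reducerId).segmentFor(position))
          }
        }
        None
      }
      ```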