- Nov 13, 2013
-
Joseph E. Gonzalez authored
Addressing a bug in BitSet.setUntil(ind) where invoking it with a multiple of 64 could lead to an index-out-of-bounds error.
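A minimal sketch of the word-boundary case follows, assuming one 64-bit word per 64 bits; the class and field names are illustrative, not the actual Spark BitSet source.

```scala
// Illustrative sketch only; not the actual BitSet source.
class SimpleBitSet(numBits: Int) {
  private val words = new Array[Long]((numBits + 63) >> 6)

  /** Set every bit up to, but not including, bitIndex. */
  def setUntil(bitIndex: Int): Unit = {
    val wordIndex = bitIndex >> 6 // number of full 64-bit words to fill
    var i = 0
    while (i < wordIndex) { words(i) = -1L; i += 1 }
    // When bitIndex is a multiple of 64 there are no leftover bits and wordIndex can
    // equal words.length, so an unguarded words(wordIndex) access would throw
    // ArrayIndexOutOfBoundsException; the guard below avoids that.
    if (wordIndex < words.length && (bitIndex & 0x3f) != 0) {
      words(wordIndex) |= ~(-1L << (bitIndex & 0x3f))
    }
  }

  def get(index: Int): Boolean = (words(index >> 6) & (1L << (index & 0x3f))) != 0
}
```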
-
- Nov 11, 2013
-
Joey authored
Changing image references to master branch.
-
Reynold Xin authored
Cleanup of VertexSetRDD
-
Joseph E. Gonzalez authored
-
Reynold Xin authored
Shuffle replicated vertex attributes efficiently in columnar format
-
Reynold Xin authored
Specialize mapReduceTriplets for accessing subsets of vertex attributes
-
Ankur Dave authored
ByteCodeUtils.invokedMethod(), which we use in mapReduceTriplets, throws a ClassNotFoundException when called with a closure defined in the console. This commit catches the exception and conservatively assumes the closure references all edge attributes.
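A hedged, self-contained sketch of that fallback pattern, using stand-in names rather than the actual ByteCodeUtils API:

```scala
// Illustrative stand-ins only; not the actual GraphX ByteCodeUtils.
object ClosureInspectionSketch {
  // Stand-in for the bytecode inspection that fails for closures defined in the REPL,
  // whose classes may not be loadable from the ordinary classpath.
  def invokedMethod(closure: AnyRef, targetClass: Class[_], method: String): Boolean =
    throw new ClassNotFoundException(closure.getClass.getName)

  // Wrap the inspection so a ClassNotFoundException degrades to the conservative
  // answer: assume the closure does read the edge attribute in question.
  def accessesAttr(closure: AnyRef, targetClass: Class[_], method: String): Boolean =
    try invokedMethod(closure, targetClass, method)
    catch { case _: ClassNotFoundException => true }

  def main(args: Array[String]): Unit = {
    val f = (x: Int) => x + 1 // stand-in for a closure defined in the console
    println(accessesAttr(f, classOf[Function1[_, _]], "srcAttr")) // prints true
  }
}
```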
-
- Nov 10, 2013
-
Ankur Dave authored
-
Ankur Dave authored
-
Ankur Dave authored
-
Ankur Dave authored
-
Ankur Dave authored
-
Ankur Dave authored
-
Ankur Dave authored
-
Ankur Dave authored
-
Joey authored
Looks good to me.
-
Reynold Xin authored
-
- Nov 09, 2013
-
Reynold Xin authored
Kryo messages
-
- Nov 08, 2013
-
Joseph E. Gonzalez authored
-
- Nov 07, 2013
-
Joseph E. Gonzalez authored
-
Joseph E. Gonzalez authored
-
Joseph E. Gonzalez authored
-
Reynold Xin authored
-
Reynold Xin authored
Conflicts:
    graph/src/main/scala/org/apache/spark/graph/impl/GraphImpl.scala
-
Reynold Xin authored
-
Ankur Dave authored
GraphX Console with Logo Text
-
Ankur Dave authored
Changing Pregel API to use mapReduceTriplets instead of aggregateNeighbors
-
Joseph E. Gonzalez authored
Changing Pregel API to operate directly on edge triplets in SendMessage rather than (Vid, EdgeTriplet) pairs.
-
Reynold Xin authored
Converted for loops to while loops in EdgePartition.
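As a rough illustration of why this matters (not the actual EdgePartition code): in Scala 2.x, `for (i <- 0 until n)` desugars into a `Range.foreach` call that invokes a closure per element, so hot inner loops are often rewritten as explicit while loops.

```scala
// Illustrative only; not the EdgePartition source.
def sumFor(xs: Array[Int]): Int = {
  var total = 0
  for (i <- 0 until xs.length) { total += xs(i) } // closure invoked per element
  total
}

def sumWhile(xs: Array[Int]): Int = {
  var total = 0
  var i = 0
  while (i < xs.length) { // plain loop, no per-element closure
    total += xs(i)
    i += 1
  }
  total
}
```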
-
Reynold Xin authored
-
Reynold Xin authored
Added GraphX to classpath.
-
Reynold Xin authored
-
Reynold Xin authored
Reverting to Array-based (materialized) output in VertexSetRDD
-
- Nov 05, 2013
-
Joseph E. Gonzalez authored
-
Joey authored
Merge Spark master into graphx
-
Joseph E. Gonzalez authored
- Nov 04, 2013
-
Reynold Xin authored
Conflicts:
    README.md
    core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
    core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
    core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
-
Joseph E. Gonzalez authored
Adding a graphx-shell which is essentially the spark shell, but with GraphX packages automatically imported and with Kryo serialization enabled for GraphX types. In addition, the graphx-shell has a nifty new logo. To make these changes minimally invasive in SparkILoop.scala, I added some additional environment variables:
SPARK_BANNER_TEXT: if set, this string is displayed instead of the spark logo.
SPARK_SHELL_INIT_BLOCK: if set, this expression is evaluated in the spark shell after the spark context is created.
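Below is a hedged, self-contained sketch of how a REPL wrapper could honor those two hooks; everything here other than the variable names SPARK_BANNER_TEXT and SPARK_SHELL_INIT_BLOCK is illustrative, not the actual SparkILoop.scala change.

```scala
// Illustrative sketch only; not the actual SparkILoop.scala code.
object ShellInitSketch {
  def main(args: Array[String]): Unit = {
    // If SPARK_BANNER_TEXT is set, show it in place of the default logo text.
    println(sys.env.getOrElse("SPARK_BANNER_TEXT", "Welcome to Spark"))

    // Stand-in for the REPL interpreter: here we just echo the expression.
    def interpret(expr: String): Unit = println(s"evaluating: $expr")

    // If SPARK_SHELL_INIT_BLOCK is set, evaluate it once the context is up,
    // e.g. an import of the GraphX packages for the graphx-shell.
    sys.env.get("SPARK_SHELL_INIT_BLOCK").foreach(interpret)
  }
}
```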
-
Reynold Xin authored
Memory-optimized shuffle file consolidation

Reduces the overhead of each shuffle block for consolidation from >300 bytes to 8 bytes (1 primitive Long). Verified via profiler testing with 1 million shuffle blocks: net overhead was ~8,400,000 bytes. Despite the memory-optimized implementation incurring extra CPU overhead, the runtime of the shuffle phase in this test was only around 2% slower, while the reduce phase was 40% faster, when compared to not using any shuffle file consolidation.

This is accomplished by replacing the map from ShuffleBlockId to FileSegment (i.e., block id to where it's located), which had high overhead due to being a gigantic, timestamped, concurrent map, with a more space-efficient structure. Namely, the following are introduced (I have omitted the word "Shuffle" from some names for clarity):

**ShuffleFile** - there is one ShuffleFile per consolidated shuffle file on disk. We store an array of offsets into the physical shuffle file for each ShuffleMapTask that wrote into the file. This is sufficient to reconstruct FileSegments for mappers that are in the file.

**FileGroup** - contains a set of ShuffleFiles, one per reducer, that a MapTask can use to write its output. There is one FileGroup created per _concurrent_ MapTask. The FileGroup contains an array of the mapIds that have been written to all files in the group. The positions of elements in this array map directly onto the positions in each ShuffleFile's offsets array.

In order to locate the FileSegment associated with a BlockId, we have another structure which maps each reducer to the set of ShuffleFiles that were created for it. (There will be as many ShuffleFiles per reducer as there are FileGroups.) To look up a given ShuffleBlockId (shuffleId, reducerId, mapId), we thus search through all ShuffleFiles associated with that reducer.

As a time optimization, we ensure that FileGroups are only reused for MapTasks with monotonically increasing mapIds. This allows us to perform a binary search to locate a mapId inside a group, and also enables potential future optimization (based on the usual monotonic access order).
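A hedged, self-contained sketch of that bookkeeping follows; the class and field names follow the commit message but are illustrative, not the actual Spark implementation.

```scala
// Illustrative sketch of the consolidated-shuffle bookkeeping; not the actual Spark code.
import scala.collection.mutable.ArrayBuffer

// One consolidated on-disk file for a single reducer; many map tasks append to it.
class ShuffleFile {
  // offsets(i): byte offset at which the i-th map output written to this file begins.
  val offsets = new ArrayBuffer[Long]()
  var length: Long = 0L

  // Reconstruct the (offset, length) segment for the map output at position i.
  def segment(i: Int): (Long, Long) = {
    val start = offsets(i)
    val end = if (i + 1 < offsets.length) offsets(i + 1) else length
    (start, end - start)
  }
}

// A group of ShuffleFiles, one per reducer, reused by one concurrent map task at a time.
// Map ids are appended in monotonically increasing order, so a block lookup only needs
// a binary search here instead of one entry per block in a global concurrent map.
class FileGroup(numReducers: Int) {
  val files: Array[ShuffleFile] = Array.fill(numReducers)(new ShuffleFile)
  val mapIds = new ArrayBuffer[Int]()

  private def indexOf(mapId: Int): Int = {
    var lo = 0
    var hi = mapIds.length - 1
    while (lo <= hi) {
      val mid = (lo + hi) >>> 1
      if (mapIds(mid) == mapId) return mid
      else if (mapIds(mid) < mapId) lo = mid + 1
      else hi = mid - 1
    }
    -1 // this group never saw that map task
  }

  // Locate the file segment for (reducerId, mapId), if this group wrote it.
  def lookup(reducerId: Int, mapId: Int): Option[(Long, Long)] = {
    val i = indexOf(mapId)
    if (i >= 0) Some(files(reducerId).segment(i)) else None
  }
}
```

Looking up a ShuffleBlockId then amounts to trying this lookup against each FileGroup created for that reducer, which is what allows the per-block concurrent map to be dropped.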
-