[SPARK-3936] Add aggregateMessages, which supersedes mapReduceTriplets
aggregateMessages enables neighborhood computation similarly to mapReduceTriplets, but it introduces two API improvements:

1. Messages are sent using an imperative interface based on EdgeContext rather than by returning an iterator of messages.
2. Rather than attempting bytecode inspection, the required triplet fields must be explicitly specified by the user by passing a TripletFields object. This fixes SPARK-3936.

Additionally, this PR includes the following optimizations for aggregateMessages and EdgePartition:

1. EdgePartition now stores local vertex ids instead of global ids. This avoids hash lookups when looking up vertex attributes and aggregating messages.
2. Internal iterators in aggregateMessages are inlined into a while loop.

In total, these optimizations were tested to provide a 37% speedup on PageRank (uk-2007-05 graph, 10 iterations, 16 r3.2xlarge machines, sped up from 513 s to 322 s).

Subsumes apache/spark#2815. Also fixes SPARK-4173.

Author: Ankur Dave <ankurdave@gmail.com>

Closes #3100 from ankurdave/aggregateMessages and squashes the following commits:

f5b65d0 [Ankur Dave] Address @rxin comments on apache/spark#3054 and apache/spark#3100
1e80aca [Ankur Dave] Add aggregateMessages, which supersedes mapReduceTriplets
194a2df [Ankur Dave] Test triplet iterator in EdgePartition serialization test
e0f8ecc [Ankur Dave] Take activeSet in ExistingEdgePartitionBuilder
c85076d [Ankur Dave] Readability improvements
b567be2 [Ankur Dave] iter.foreach -> while loop
4a566dc [Ankur Dave] Optimizations for mapReduceTriplets and EdgePartition
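To illustrate the two API changes described above, here is a minimal sketch, not taken from this commit, that counts incoming edges per vertex with aggregateMessages. The object name AggregateMessagesSketch, the helper inDegrees, and the three-edge toy graph are assumptions made for illustration. The send function calls sendToDst on the EdgeContext instead of returning an iterator, and TripletFields.None states explicitly that no vertex attributes are read.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeContext, Graph, TripletFields, VertexRDD}

// Hypothetical example object; not part of the commit.
object AggregateMessagesSketch {

  // Count incoming edges per vertex.
  // - sendMsg pushes a message imperatively through the EdgeContext.
  // - mergeMsg sums the messages that arrive at the same vertex.
  // - TripletFields.None declares that neither srcAttr nor dstAttr is
  //   read, so no vertex attributes need to be shipped to the edges.
  def inDegrees(graph: Graph[Int, Int]): VertexRDD[Int] =
    graph.aggregateMessages[Int](
      (ctx: EdgeContext[Int, Int, Int]) => ctx.sendToDst(1),
      _ + _,
      TripletFields.None)

  def main(args: Array[String]): Unit = {
    // Local single-threaded context, assumed here only for the demo.
    val conf = new SparkConf().setAppName("aggregateMessages-sketch").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      // Toy graph (assumed data): 1 -> 2, 1 -> 3, 2 -> 3.
      val edges = sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(1L, 3L, 0), Edge(2L, 3L, 0)))
      val graph = Graph.fromEdges(edges, defaultValue = 0)
      inDegrees(graph).collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Vertices that receive no message do not appear in the result, so vertex 1 is absent from the output here. If the send function needed the source attribute, for example to propagate ranks, TripletFields.Src would be passed instead.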
Showing changed files:
- graphx/src/main/scala/org/apache/spark/graphx/EdgeContext.scala (51 additions, 0 deletions)
- graphx/src/main/scala/org/apache/spark/graphx/Graph.scala (124 additions, 13 deletions)
- graphx/src/main/scala/org/apache/spark/graphx/GraphOps.scala (46 additions, 39 deletions)
- graphx/src/main/scala/org/apache/spark/graphx/TripletFields.java (51 additions, 0 deletions)
- graphx/src/main/scala/org/apache/spark/graphx/impl/EdgePartition.scala (292 additions, 86 deletions)
- graphx/src/main/scala/org/apache/spark/graphx/impl/EdgePartitionBuilder.scala (83 additions, 12 deletions)
- graphx/src/main/scala/org/apache/spark/graphx/impl/EdgeTripletIterator.scala (0 additions, 84 deletions)
- graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala (51 additions, 34 deletions)
- graphx/src/main/scala/org/apache/spark/graphx/impl/RoutingTablePartition.scala (3 additions, 5 deletions)
- graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala (3 additions, 3 deletions)
- graphx/src/main/scala/org/apache/spark/graphx/lib/SVDPlusPlus.scala (23 additions, 23 deletions)
- graphx/src/main/scala/org/apache/spark/graphx/lib/TriangleCount.scala (10 additions, 9 deletions)
- graphx/src/test/scala/org/apache/spark/graphx/GraphSuite.scala (17 additions, 2 deletions)
- graphx/src/test/scala/org/apache/spark/graphx/impl/EdgePartitionSuite.scala (12 additions, 29 deletions)