[SPARK-11496][GRAPHX] Parallel implementation of personalized pagerank
(Updated version of [PR-9457](https://github.com/apache/spark/pull/9457), rebased on the latest Spark master and using mllib-local.)

This implements a parallel version of personalized PageRank, which runs the propagations for a list of source vertices in parallel.

I ran a few benchmarks on the full [DBpedia](http://dbpedia.org/) graph. When running personalized PageRank for only one source node, the existing implementation is twice as fast as the parallel one (because of the SparseVector overhead). For 10 source nodes, however, the parallel implementation is four times as fast, and the gap widens further as the number of source nodes increases.

Author: Yves Raimond <yraimond@netflix.com>

Closes #14998 from moustaki/parallel-ppr.
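The batching idea described above can be illustrated with a small, dependency-free Scala sketch. This is not the code from this commit: it uses plain dense arrays instead of GraphX and SparseVector, it drops rank mass at vertices with no outgoing edges, and the names `ParallelPprSketch` and `run` are hypothetical. It only shows how keeping one score per source at every vertex lets a single pass over the edges advance all personalized walks at once.

```scala
// Simplified illustration only; all names here are hypothetical.
object ParallelPprSketch {

  /** Runs `numIter` iterations of personalized PageRank for every source at once.
    *
    * @param n       number of vertices, identified as 0 until n
    * @param edges   directed edges as (src, dst) pairs
    * @param sources vertices to personalize on, one score column per source
    * @return ranks(v)(j): score of vertex v for the walk that restarts at sources(j)
    */
  def run(
      n: Int,
      edges: Seq[(Int, Int)],
      sources: Array[Int],
      numIter: Int,
      resetProb: Double = 0.15): Array[Array[Double]] = {
    // Out-degree of each vertex, used to split its rank evenly among its out-edges.
    val outDeg = Array.fill(n)(0)
    edges.foreach { case (src, _) => outDeg(src) += 1 }

    val k = sources.length
    // Each walk starts with all of its mass on its own source vertex.
    var ranks = Array.tabulate(n, k)((v, j) => if (v == sources(j)) 1.0 else 0.0)

    for (_ <- 0 until numIter) {
      // The restart mass goes back to the source of each column.
      val next = Array.tabulate(n, k)((v, j) => if (v == sources(j)) resetProb else 0.0)
      // One pass over the edges updates every source's column together.
      for ((src, dst) <- edges; j <- 0 until k) {
        next(dst)(j) += (1.0 - resetProb) * ranks(src)(j) / outDeg(src)
      }
      ranks = next
    }
    ranks
  }
}
```

Each column of the result matches what a separate single-source run would produce. With a single source the extra vector per vertex is pure overhead, which is in line with the single-source slowdown reported in the benchmarks above.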
Showing 4 changed files:
- graphx/pom.xml (5 additions, 0 deletions)
- graphx/src/main/scala/org/apache/spark/graphx/GraphOps.scala (11 additions, 1 deletion)
- graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala (81 additions, 0 deletions)
- graphx/src/test/scala/org/apache/spark/graphx/lib/PageRankSuite.scala (24 additions, 0 deletions)