Yves Raimond authored
(Updated version of [PR-9457](https://github.com/apache/spark/pull/9457), rebased on latest Spark master, and using mllib-local).

This implements a parallel version of personalized PageRank, which runs the propagations for all vertices in a list of sources in parallel.
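To illustrate the idea, here is a minimal single-machine sketch in plain Python. It is not the code from this patch: the function name `parallel_personalized_pagerank`, its parameters, and the dict-based rank vectors are assumptions made for illustration. Each vertex carries a sparse mapping from source index to rank (standing in for a sparse vector), so a single pass over the edges per iteration updates the ranks for every source at once.

```python
from collections import defaultdict


def parallel_personalized_pagerank(edges, sources, num_iter=20, reset_prob=0.15):
    """Hypothetical sketch: personalized PageRank for several sources at once.

    edges   -- list of (src, dst) pairs of a directed graph
    sources -- list of source vertices; component i of each rank vector
               belongs to sources[i]
    Returns {vertex: {source_index: rank}}; missing entries mean rank 0.
    """
    out_degree = defaultdict(int)
    vertices = set(sources)
    for src, dst in edges:
        out_degree[src] += 1
        vertices.update((src, dst))

    # Initially, all mass for source i sits on the source vertex itself.
    ranks = {v: {} for v in vertices}
    for i, s in enumerate(sources):
        ranks[s][i] = 1.0

    for _ in range(num_iter):
        # One pass over the edges propagates every source's mass together.
        incoming = {v: defaultdict(float) for v in vertices}
        for src, dst in edges:
            for i, r in ranks[src].items():
                incoming[dst][i] += r / out_degree[src]

        # Damp the propagated mass, then return reset_prob to each source.
        new_ranks = {
            v: {i: (1.0 - reset_prob) * r for i, r in incoming[v].items()}
            for v in vertices
        }
        for i, s in enumerate(sources):
            new_ranks[s][i] = new_ranks[s].get(i, 0.0) + reset_prob
        ranks = new_ranks

    return ranks


if __name__ == "__main__":
    graph = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
    result = parallel_personalized_pagerank(graph, sources=[0, 3], num_iter=10)
    for vertex in sorted(result):
        print(vertex, dict(result[vertex]))
```

Because a vertex only stores entries for sources whose mass has reached it, the per-vertex vectors stay sparse when the sources are far apart. The price is per-entry bookkeeping, which is consistent with the overhead reported below for the single-source case.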

I ran a few benchmarks on the full [DBpedia](http://dbpedia.org/) graph. When running personalized PageRank for a single source node, the existing implementation is twice as fast as the parallel one (because of the SparseVector overhead). For 10 source nodes, however, the parallel implementation is four times as fast, and the gap widens as the number of source nodes increases.

![image](https://cloud.githubusercontent.com/assets/2491/10927702/dd82e4fa-8256-11e5-89a8-4799b407f502.png)

Author: Yves Raimond <yraimond@netflix.com>

Closes #14998 from moustaki/parallel-ppr.
1fec3ce4