Commit 78062b85, authored by Andrew Ray, committed by Ankur Dave

[SPARK-18845][GRAPHX] PageRank has incorrect initialization value that leads to slow convergence

## What changes were proposed in this pull request?

Change the initial value in all PageRank implementations to be `1.0` instead of `resetProb` (default `0.15`) and use `outerJoinVertices` instead of `joinVertices` so that source vertices get updated in each iteration.
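As a rough illustration of why the join type matters, the sketch below models one rank update in plain Python rather than GraphX. The helper names `step_inner_join` and `step_outer_join` and the small example graph are hypothetical. The only assumption about the real API is that a `joinVertices`-style update leaves vertices without an incoming message at their old value, while an `outerJoinVertices`-style update visits every vertex and can treat a missing message sum as `0.0`.

```python
RESET_PROB = 0.15


def step_inner_join(ranks, msg_sums, reset_prob=RESET_PROB):
    # joinVertices-like: only vertices that received a message are recomputed;
    # all others silently keep their previous rank.
    return {
        v: reset_prob + (1.0 - reset_prob) * msg_sums[v] if v in msg_sums else old
        for v, old in ranks.items()
    }


def step_outer_join(ranks, msg_sums, reset_prob=RESET_PROB):
    # outerJoinVertices-like: every vertex is recomputed, and a missing
    # message sum is treated as 0.0.
    return {v: reset_prob + (1.0 - reset_prob) * msg_sums.get(v, 0.0) for v in ranks}


# Hypothetical graph: source 0 -> 1, plus the loop 1 -> 2 -> 3 -> 1.
# All ranks start at 1.0 and every vertex has out-degree 1, so after one round
# of messages vertex 1 has received 2.0, vertices 2 and 3 have received 1.0,
# and the source vertex 0 has received nothing.
ranks = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
msg_sums = {1: 2.0, 2: 1.0, 3: 1.0}

inner = step_inner_join(ranks, msg_sums)
outer = step_outer_join(ranks, msg_sums)
print("inner join, source rank:", inner[0])
print("outer join, source rank:", outer[0])
```

With an initial value of `1.0`, the inner-join variant would leave the source vertex stuck at its initial rank, which is why the two changes in this patch go together.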

The incorrect initial value seems to have been introduced a long time ago in https://github.com/apache/spark/commit/15a564598fe63003652b1e24527c432080b5976c#diff-b2bf3f97dcd2f19d61c921836159cda9L90

With the exception of graphs with sinks (which currently give incorrect results, see SPARK-18847), this gives faster convergence because the sum of ranks is already correct from the first iteration (the sum of ranks should equal the number of vertices).
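The sum-of-ranks argument can be checked with a small standalone sketch. This is not the GraphX code. It is a plain-Python power iteration of the unnormalized update `rank = resetProb + (1 - resetProb) * sum(incoming rank / out-degree)`, run on a hypothetical sink-free graph (a loop with one source) chosen only for illustration. The function names and the 200-iteration reference run are assumptions of the sketch.

```python
# Hypothetical example graph (not from the patch): source 0 feeds the loop 1 -> 2 -> 3 -> 1.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 1)]
NUM_VERTICES = 4
RESET_PROB = 0.15


def pagerank_history(initial_rank, num_iter):
    """Run num_iter synchronous updates; return the rank vector after each one."""
    out_degree = [0] * NUM_VERTICES
    for src, _ in EDGES:
        out_degree[src] += 1
    ranks = [initial_rank] * NUM_VERTICES
    history = [ranks]
    for _ in range(num_iter):
        incoming = [0.0] * NUM_VERTICES
        for src, dst in EDGES:
            incoming[dst] += ranks[src] / out_degree[src]
        # Every vertex is updated, including the source, which receives nothing.
        ranks = [RESET_PROB + (1.0 - RESET_PROB) * total for total in incoming]
        history.append(ranks)
    return history


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


# A long run serves as a stand-in for the converged ranks.
reference = pagerank_history(1.0, 200)[-1]
from_one = pagerank_history(1.0, 10)
from_reset = pagerank_history(RESET_PROB, 10)

for k in (0, 5, 10):
    print(
        f"iter {k}: sum(init=1.0)={sum(from_one[k]):.4f} "
        f"sum(init=resetProb)={sum(from_reset[k]):.4f} "
        f"err(init=1.0)={l1_distance(from_one[k], reference):.4f} "
        f"err(init=resetProb)={l1_distance(from_reset[k], reference):.4f}"
    )
```

In this sketch the run that starts at `1.0` keeps the rank sum at the number of vertices throughout, while the run that starts at `resetProb` has to grow its total rank toward that value and only approaches it geometrically.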

Convergence comparison benchmark for a small graph: http://imgur.com/a/HkkZf
Code for benchmark: https://gist.github.com/aray/a7de1f3801a810f8b1fa00c271a1fefd

## How was this patch tested?

Corrected the existing unit tests and added a test that verifies the results against igraph and NetworkX on a loop with a source.

Author: Andrew Ray <ray.andrew@gmail.com>

Closes #16271 from aray/pagerank-initial-value.
parent 172a52f5