Commit 5e35d39e authored by Ankur Dave

Add PageRank example and data

parent f096f4ea
......@@ -470,10 +470,40 @@ things to worry about.)
# Graph Algorithms
<a name="graph_algorithms"></a>
This section should describe the various algorithms and how they are used.
GraphX includes a set of graph algorithms to simplify analytics. The algorithms are contained in the `org.apache.spark.graphx.lib` package and can be accessed directly as methods on `Graph` via an implicit conversion to [`Algorithms`][Algorithms]. This section describes the algorithms and how they are used.
[Algorithms]: api/graphx/index.html#org.apache.spark.graphx.lib.Algorithms
## PageRank
PageRank measures the importance of each vertex in a graph, assuming an edge from *u* to *v* represents an endorsement of *v*'s importance by *u*. For example, if a Twitter user is followed by many others, the user will be ranked highly.
Spark includes an example social network dataset that we can run PageRank on. A set of users is given in `graphx/data/users.txt`, and a set of relationships between users is given in `graphx/data/followers.txt`. We can compute the PageRank of each user as follows:
{% highlight scala %}
// Load the implicit conversion to Algorithms
import org.apache.spark.graphx.lib._
// Load the datasets into a graph
val users = sc.textFile("graphx/data/users.txt").map { line =>
  val fields = line.split("\\s+")
  (fields(0).toLong, fields(1))
}
val followers = sc.textFile("graphx/data/followers.txt").map { line =>
  val fields = line.split("\\s+")
  Edge(fields(0).toLong, fields(1).toLong, 1)
}
val graph = Graph(users, followers)
// Run PageRank
val ranks = graph.pageRank(0.0001).vertices
// Join the ranks with the usernames
val ranksByUsername = users.leftOuterJoin(ranks).map {
  case (id, (username, rankOpt)) => (username, rankOpt.getOrElse(0.0))
}
// Print the result
println(ranksByUsername.collect().mkString("\n"))
{% endhighlight %}
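To build intuition for what `pageRank(0.0001)` computes, the following is a minimal plain-Scala sketch of a tolerance-based PageRank iteration, run on the follower pairs from `graphx/data/followers.txt`. It is not GraphX's implementation: the `PageRankSketch` object, its `pageRank` helper, the unnormalized update rule, and the default `resetProb` of 0.15 are all assumptions made for illustration.

```scala
// Illustrative sketch only: PageRankSketch and its members are hypothetical
// names, not part of GraphX.
object PageRankSketch {
  // Follower pairs (src follows dst), copied from graphx/data/followers.txt
  val edges: Seq[(Long, Long)] = Seq(
    (2L, 1L), (3L, 1L), (4L, 1L), (6L, 1L), (3L, 2L), (6L, 2L),
    (7L, 2L), (6L, 3L), (7L, 3L), (7L, 6L), (6L, 7L), (3L, 7L))

  // Iterate until no vertex's rank changes by more than `tol`.
  // Assumed update rule: rank(v) = resetProb + (1 - resetProb) * sum of
  // rank(u) / outDegree(u) over all edges u -> v.
  def pageRank(
      edges: Seq[(Long, Long)],
      tol: Double,
      resetProb: Double = 0.15): Map[Long, Double] = {
    val vertices = edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    val outDegree = edges.groupBy(_._1).map { case (src, es) => src -> es.size }

    var ranks = vertices.map(v => v -> 1.0).toMap
    var maxChange = Double.MaxValue
    while (maxChange > tol) {
      // Each vertex splits its current rank evenly across its outgoing edges
      val contribs = edges
        .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
        .groupBy(_._1)
        .map { case (dst, cs) => dst -> cs.map(_._2).sum }

      val updated = vertices.map { v =>
        v -> (resetProb + (1.0 - resetProb) * contribs.getOrElse(v, 0.0))
      }.toMap

      // Largest absolute change across all vertices in this round
      maxChange = vertices
        .map(v => math.abs(updated(v) - ranks(v)))
        .foldLeft(0.0)((a, b) => math.max(a, b))
      ranks = updated
    }
    ranks
  }
}
```

Under these assumptions, `PageRankSketch.pageRank(PageRankSketch.edges, 0.0001)` ranks user 1 highest, and user 4, whom nobody follows, ends up with exactly the reset probability. A smaller `tol` makes the loop run for more rounds before stopping.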
## Connected Components
## Shortest Path
......
2 1
3 1
4 1
6 1
3 2
6 2
7 2
6 3
7 3
7 6
6 7
3 7
1 BarackObama
2 ericschmidt
3 jeresig
4 justinbieber
6 matei_zaharia
7 odersky
......@@ -106,7 +106,7 @@ object PageRank extends Logging {
* @tparam ED the original edge attribute (not used)
*
* @param graph the graph on which to compute PageRank
* @param tol the tolerance allowed at convergence (smaller => more * accurate).
* @param tol the tolerance allowed at convergence (smaller => more accurate).
* @param resetProb the random reset probability (alpha)
*
* @return the graph with each vertex containing the PageRank and each edge
......