Commit 66c9d009, authored 11 years ago by Joseph E. Gonzalez
Tested and corrected all examples up to mask in the graphx-programming-guide.
Parent: 1efe78a1
docs/graphx-programming-guide.md: 20 additions, 17 deletions
@@ -80,6 +80,8 @@ To get started you first need to import Spark and GraphX into your project, as f
 {% highlight scala %}
 import org.apache.spark._
 import org.apache.spark.graphx._
+// To make some of the examples work we will also need RDD
+import org.apache.spark.rdd.RDD
 {% endhighlight %}
 If you are not using the Spark shell you will also need a Spark context.
@@ -105,13 +107,11 @@ be accomplished through inheritance. For example to model users and products as
 we might do the following:
 {% highlight scala %}
-case class VertexProperty
-case class UserProperty extends VertexProperty
-  (val name: String)
-case class ProductProperty extends VertexProperty
-  (val name: String, val price: Double)
+class VertexProperty()
+case class UserProperty(val name: String) extends VertexProperty
+case class ProductProperty(val name: String, val price: Double) extends VertexProperty
 // The graph might then have the type:
-val graph: Graph[VertexProperty, String]
+var graph: Graph[VertexProperty, String] = null
 {% endhighlight %}
 Like RDDs, property graphs are immutable, distributed, and fault-tolerant. Changes to the values or
@@ -165,13 +165,13 @@ code constructs a graph from a collection of RDDs:
 // Assume the SparkContext has already been constructed
 val sc: SparkContext
 // Create an RDD for the vertices
-val users: RDD[(VertexId, (String, String))] =
-  sc.parallelize(Array((3, ("rxin", "student")), (7, ("jgonzal", "postdoc")),
-                       (5, ("franklin", "prof")), (2, ("istoica", "prof"))))
+val users: RDD[(VertexID, (String, String))] =
+  sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
+                       (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
 // Create an RDD for edges
 val relationships: RDD[Edge[String]] =
-  sc.parallelize(Array(Edge(3, 7, "collab"), Edge(5, 3, "advisor"),
-                       Edge(2, 5, "colleague"), Edge(5, 7, "pi"))
+  sc.parallelize(Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
+                       Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
 // Define a default user in case there are relationship with missing user
 val defaultUser = ("John Doe", "Missing")
 // Build the initial Graph
@@ -200,7 +200,7 @@ graph.edges.filter(e => e.srcId > e.dstId).count
 > tuple. On the other hand, `graph.edges` returns an `EdgeRDD` containing `Edge[String]` objects.
 > We could have also used the case class type constructor as in the following:
 > {% highlight scala %}
-graph.edges.filter { case Edge(src, dst, prop) => src < dst }.count
+graph.edges.filter { case Edge(src, dst, prop) => src > dst }.count
 {% endhighlight %}
 In addition to the vertex and edge views of the property graph, GraphX also exposes a triplet view.
@@ -234,7 +234,9 @@ triplet view of a graph to render a collection of strings describing relationshi
 val graph: Graph[(String, String), String] // Constructed from above
 // Use the triplets view to create an RDD of facts.
 val facts: RDD[String] =
-  graph.triplets.map(et => et.srcAttr._1 + " is the " + et.attr + " of " et.dstAttr)
+  graph.triplets.map(triplet =>
+    triplet.srcAttr._1 + " is the " + triplet.attr + " of " + triplet.dstAttr._1)
+facts.collect.foreach(println(_))
 {% endhighlight %}
 # Graph Operators
@@ -294,11 +296,12 @@ unnecessary properties. For example, given a graph with the out-degrees as the
 {% highlight scala %}
 // Given a graph where the vertex property is the out-degree
-val inputGraph: Graph[Int, String]
+val inputGraph: Graph[Int, String] =
+  graph.outerJoinVertices(graph.outDegrees)((vid, _, degOpt) => degOpt.getOrElse(0))
 // Construct a graph where each edge contains the weight
 // and each vertex is the initial PageRank
 val outputGraph: Graph[Double, Double] =
-  inputGraph.mapTriplets(et => 1.0 / et.srcAttr).mapVertices(v => 1.0)
+  inputGraph.mapTriplets(triplet => 1.0 / triplet.srcAttr).mapVertices((id, _) => 1.0)
 {% endhighlight %}
 ## Structural Operators
@@ -338,7 +341,7 @@ val defaultUser = ("John Doe", "Missing")
 // Build the initial Graph
 val graph = Graph(users, relationships, defaultUser)
 // Remove missing vertices as well as the edges to connected to them
-val validGraph = graph.subgraph((id, attr) => attr._2 != "Missing")
+val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing")
 {% endhighlight %}
 > Note in the above example only the vertex predicate is provided. The `subgraph` operator defaults
@@ -356,7 +359,7 @@ the answer to the valid subgraph.
 // Run Connected Components
 val ccGraph = graph.connectedComponents() // No longer contains missing field
 // Remove missing vertices as well as the edges to connected to them
-val validGraph = graph.subgraph((id, attr) => attr._2 != "Missing")
+val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing")
 // Restrict the answer to the valid subgraph
 val validCCGraph = ccGraph.mask(validGraph)
 {% endhighlight %}
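
The `VertexProperty` hunk in this commit moves the constructor parameters in front of `extends` and turns the base type into a plain class. The following is a minimal, Spark-free sketch of that inheritance pattern, not code from the guide: the `VertexPropertyDemo` object and its `describe` helper are hypothetical names introduced here for illustration, and the redundant `val` modifiers are dropped because case class parameters are already public fields.

```scala
// Base type shared by every vertex property, declared as a plain class
// in the same way as the corrected guide.
class VertexProperty()

// Constructor parameters come before `extends`.
case class UserProperty(name: String) extends VertexProperty
case class ProductProperty(name: String, price: Double) extends VertexProperty

// Hypothetical demo object (not part of the guide).
object VertexPropertyDemo {
  // Pattern match on the concrete subtype of a vertex property.
  def describe(p: VertexProperty): String = p match {
    case UserProperty(name)           => s"user $name"
    case ProductProperty(name, price) => s"product $name at $price"
    case _                            => "unknown vertex"
  }

  def main(args: Array[String]): Unit = {
    // A mixed collection typed by the common base class, analogous to the
    // vertex type parameter in Graph[VertexProperty, String].
    val vertices: Seq[VertexProperty] =
      Seq(UserProperty("rxin"), ProductProperty("book", 12.5))
    vertices.map(describe).foreach(println)
  }
}
```

Because the base class is not sealed, the match needs the wildcard case to stay total.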