Commit 7bda9977 authored 11 years ago by Ankur Dave
Improve docs for PartitionStrategy
parent eb4b46f8
graphx/src/main/scala/org/apache/spark/graphx/PartitionStrategy.scala
31 additions, 27 deletions
package org.apache.spark.graphx
/**
* Represents the way edges are assigned to edge partitions based on their source and destination
* vertex IDs.
*/
sealed trait PartitionStrategy extends Serializable {
  def getPartition(src: VertexID, dst: VertexID, numParts: PartitionID): PartitionID
}
/**
- * This function implements a classic 2D-Partitioning of a sparse matrix.
+ * Assigns edges to partitions using a 2D partitioning of the sparse edge adjacency matrix,
+ * guaranteeing a `2 * sqrt(numParts)` bound on vertex replication.
*
* Suppose we have a graph with 11 vertices that we want to partition
* over 9 machines. We can use the following sparse matrix representation:
*
* <pre>
* __________________________________
* v0 | P0 * | P1 | P2 * |
* v1 | **** | * | |
...
...
@@ -27,28 +33,23 @@ sealed trait PartitionStrategy extends Serializable {
* v10 | * | ** | * * |
* v11 | * <-E | *** | ** |
* ----------------------------------
* </pre>
*
- * The edge denoted by E connects v11 with v1 and is assigned to
- * processor P6. To get the processor number we divide the matrix
- * into sqrt(numProc) by sqrt(numProc) blocks. Notice that edges
- * adjacent to v11 can only be in the first colum of
- * blocks (P0, P3, P6) or the last row of blocks (P6, P7, P8).
- * As a consequence we can guarantee that v11 will need to be
- * replicated to at most 2 * sqrt(numProc) machines.
- *
- * Notice that P0 has many edges and as a consequence this
- * partitioning would lead to poor work balance. To improve
- * balance we first multiply each vertex id by a large prime
- * to effectively shuffle the vertex locations.
- *
- * One of the limitations of this approach is that the number of
- * machines must either be a perfect square. We partially address
- * this limitation by computing the machine assignment to the next
- * largest perfect square and then mapping back down to the actual
- * number of machines. Unfortunately, this can also lead to work
- * imbalance and so it is suggested that a perfect square is used.
+ * The edge denoted by `E` connects `v11` with `v1` and is assigned to processor `P6`. To get the
+ * processor number we divide the matrix into `sqrt(numParts)` by `sqrt(numParts)` blocks. Notice
+ * that edges adjacent to `v11` can only be in the first column of blocks `(P0, P3, P6)` or the last
+ * row of blocks `(P6, P7, P8)`. As a consequence we can guarantee that `v11` will need to be
+ * replicated to at most `2 * sqrt(numParts)` machines.
+ *
+ * Notice that `P0` has many edges and as a consequence this partitioning would lead to poor work
+ * balance. To improve balance we first multiply each vertex id by a large prime to shuffle the
+ * vertex locations.
+ *
+ * One of the limitations of this approach is that the number of machines must either be a perfect
+ * square. We partially address this limitation by computing the machine assignment to the next
+ * largest perfect square and then mapping back down to the actual number of machines.
+ * Unfortunately, this can also lead to work imbalance and so it is suggested that a perfect square
+ * is used.
*/
case object EdgePartition2D extends PartitionStrategy {
  override def getPartition(src: VertexID, dst: VertexID, numParts: PartitionID): PartitionID = {
...
...
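The body of `EdgePartition2D.getPartition` is elided in this diff, so the following is only a minimal sketch of the scheme the doc comment describes, not the commit's implementation. It assumes the same mixing prime that the diff shows in `EdgePartition1D`, defines `VertexID` and `PartitionID` locally as assumed aliases, and uses `floorMod` (my choice, not taken from the source) to keep block indices non-negative if the multiplication overflows. The names `EdgePartition2DSketch`, `block`, and `partitionsTouching` are hypothetical.

```scala
object EdgePartition2DSketch {
  type VertexID = Long     // assumed alias for illustration
  type PartitionID = Int   // assumed alias for illustration

  // Constant shown in the diff for EdgePartition1D; assumed here for the 2D case too.
  val mixingPrime: VertexID = 1125899906842597L

  // Maps a vertex ID to one of `side` blocks along one matrix axis.
  // floorMod keeps the result in [0, side) even if the product overflows.
  private def block(id: VertexID, side: Int): Int =
    java.lang.Math.floorMod(id * mixingPrime, side.toLong).toInt

  def getPartition(src: VertexID, dst: VertexID, numParts: PartitionID): PartitionID = {
    // Round up to the next perfect square, then map back down to numParts.
    val side = math.ceil(math.sqrt(numParts.toDouble)).toInt
    val srcBlock = block(src, side)
    val dstBlock = block(dst, side)
    (srcBlock * side + dstBlock) % numParts
  }

  // All partitions holding at least one edge adjacent to `v`,
  // assuming every pair of vertices in `vertices` may be connected.
  def partitionsTouching(v: VertexID, vertices: Seq[VertexID],
      numParts: PartitionID): Set[PartitionID] =
    vertices.flatMap { u =>
      Seq(getPartition(v, u, numParts), getPartition(u, v, numParts))
    }.toSet
}
```

With `numParts = 9` the grid is 3 by 3, so a vertex used as a source is confined to one line of three blocks and as a destination to another line of three blocks, which is where the `2 * sqrt(numParts)` replication bound comes from.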
@@ -60,7 +61,10 @@ case object EdgePartition2D extends PartitionStrategy {
}
}
+/**
+ * Assigns edges to partitions using only the source vertex ID, colocating edges with the same
+ * source.
+ */
case object EdgePartition1D extends PartitionStrategy {
  override def getPartition(src: VertexID, dst: VertexID, numParts: PartitionID): PartitionID = {
    val mixingPrime: VertexID = 1125899906842597L
...
...
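The diff shows only the `mixingPrime` line of `EdgePartition1D.getPartition`; how the rest of the body uses it is elided. As a hedged sketch of what the new doc comment describes (partitioning by source vertex ID only), one possibility is shown below. The object name `EdgePartition1DSketch`, the local type aliases, and the use of `floorMod` are assumptions for illustration.

```scala
object EdgePartition1DSketch {
  type VertexID = Long     // assumed alias for illustration
  type PartitionID = Int   // assumed alias for illustration

  // Constant taken from the diff.
  val mixingPrime: VertexID = 1125899906842597L

  // Only `src` influences the result, so every edge leaving the same
  // source vertex lands in the same partition. `dst` is ignored on purpose.
  def getPartition(src: VertexID, dst: VertexID, numParts: PartitionID): PartitionID =
    java.lang.Math.floorMod(src * mixingPrime, numParts.toLong).toInt
}
```

Because the destination is ignored, a vertex's outgoing edges are colocated, while its incoming edges may be spread over as many partitions as it has distinct in-neighbours.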
@@ -70,8 +74,8 @@ case object EdgePartition1D extends PartitionStrategy {
 /**
- * Assign edges to an aribtrary machine corresponding to a
- * random vertex cut.
+ * Assigns edges to partitions by hashing the source and destination vertex IDs, resulting in a
+ * random vertex cut that colocates all same-direction edges between two vertices.
  */
case object RandomVertexCut extends PartitionStrategy {
  override def getPartition(src: VertexID, dst: VertexID, numParts: PartitionID): PartitionID = {
...
...
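The body of `RandomVertexCut.getPartition` is not visible in this diff. A minimal sketch of "hashing the source and destination vertex IDs" could look like the following, where the tuple hash, the `floorMod` call, the local type aliases, and the name `RandomVertexCutSketch` are all assumptions for illustration rather than the code in the commit.

```scala
object RandomVertexCutSketch {
  type VertexID = Long     // assumed alias for illustration
  type PartitionID = Int   // assumed alias for illustration

  // Hash the ordered pair (src, dst). The same ordered pair always maps to
  // the same partition, so parallel edges in the same direction are colocated.
  // floorMod avoids a negative result when the hash code is negative.
  def getPartition(src: VertexID, dst: VertexID, numParts: PartitionID): PartitionID =
    java.lang.Math.floorMod((src, dst).hashCode(), numParts)
}
```

Since the pair is ordered, the edges `(a, b)` and `(b, a)` are hashed independently and may or may not share a partition.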
@@ -81,9 +85,9 @@ case object RandomVertexCut extends PartitionStrategy {
 /**
- * Assign edges to an arbitrary machine corresponding to a random vertex cut. This
- * function ensures that edges of opposite direction between the same two vertices
- * will end up on the same partition.
+ * Assigns edges to partitions by hashing the source and destination vertex IDs in a canonical
+ * direction, resulting in a random vertex cut that colocates all edges between two vertices,
+ * regardless of direction.
  */
case object CanonicalRandomVertexCut extends PartitionStrategy {
  override def getPartition(src: VertexID, dst: VertexID, numParts: PartitionID): PartitionID = {
...
...
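The body of `CanonicalRandomVertexCut.getPartition` is also elided. One way to hash "in a canonical direction", sketched here under the assumption that the canonical order is simply smaller ID first, is to sort the two endpoints before hashing. The name `CanonicalRandomVertexCutSketch`, the local type aliases, and the choice of tuple hash are hypothetical.

```scala
object CanonicalRandomVertexCutSketch {
  type VertexID = Long     // assumed alias for illustration
  type PartitionID = Int   // assumed alias for illustration

  def getPartition(src: VertexID, dst: VertexID, numParts: PartitionID): PartitionID = {
    // Put the endpoints in a fixed order so (a, b) and (b, a) hash identically.
    val lower = math.min(src, dst)
    val higher = math.max(src, dst)
    java.lang.Math.floorMod((lower, higher).hashCode(), numParts)
  }
}
```

Under this sketch, `getPartition(a, b, n)` and `getPartition(b, a, n)` always agree, which is the colocation property the doc comment states.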