[SPARK-5608] Improve SEO of Spark documentation pages · 4d74f060

Matei Zaharia authored 10 years ago

- Add meta description tags on some of the most important doc pages
- Shorten the titles of some pages to have more relevant keywords; for
  example there's no reason to have "Spark SQL Programming Guide - Spark
  1.2.0 documentation", we can just say "Spark SQL - Spark 1.2.0
  documentation".

Author: Matei Zaharia <matei@databricks.com>

Closes #4381 from mateiz/docs-seo and squashes the following commits:

4940563 [Matei Zaharia] [SPARK-5608] Improve SEO of Spark documentation pages

bagel-programming-guide.md 6.61 KiB

layout: global
displayTitle: Bagel Programming Guide
title: Bagel

Bagel will soon be superseded by GraphX; we recommend that new users try GraphX instead.

Bagel is a Spark implementation of Google's Pregel graph processing framework. Bagel currently supports basic graph computation, combiners, and aggregators.

In the Pregel programming model, jobs run as a sequence of iterations called supersteps. In each superstep, each vertex in the graph runs a user-specified function that can update state associated with the vertex and send messages to other vertices for use in the next iteration.

This guide shows the programming model and features of Bagel by walking through an example implementation of PageRank on Bagel.

Linking with Bagel

To use Bagel in your program, add the following SBT or Maven dependency:

groupId = org.apache.spark
artifactId = spark-bagel_{{site.SCALA_BINARY_VERSION}}
version = {{site.SPARK_VERSION}}
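
For SBT, the coordinates above might translate into a `build.sbt` line like the one below. This is only a sketch: the version string `1.2.0` is an illustrative assumption, so substitute the Spark version you actually build against. The `%%` operator asks sbt to append the Scala binary version to the artifact name, which corresponds to the `_{{site.SCALA_BINARY_VERSION}}` suffix shown above.

```scala
// build.sbt (sketch): "1.2.0" is an assumed placeholder version, not one prescribed by this guide
libraryDependencies += "org.apache.spark" %% "spark-bagel" % "1.2.0"
```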

Programming Model

Bagel operates on a graph represented as a distributed dataset of (K, V) pairs, where keys are vertex IDs and values are vertices plus their associated state. In each superstep, Bagel runs a user-specified compute function on each vertex. The function takes as input the current vertex state and a list of messages sent to that vertex during the previous superstep, and returns the new vertex state and a list of outgoing messages.
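
To make the superstep loop concrete, here is a minimal plain-Scala sketch of the model just described. It deliberately does not use Spark or the Bagel API: `SuperstepSketch`, `Message`, `run`, and `propagateMax` are hypothetical names introduced for illustration, the graph is an in-memory `Map` rather than a distributed dataset, and the loop runs for a fixed number of supersteps instead of tracking which vertices are still active. Combiners and aggregators are left out.

```scala
// Plain-Scala sketch of the superstep loop. It does not use Spark or the
// Bagel API; every name here is hypothetical and chosen for illustration.
object SuperstepSketch {
  // A message addressed to the vertex with ID `targetId`.
  final case class Message[M](targetId: String, value: M)

  // Runs `compute` on every vertex for a fixed number of supersteps.
  // `compute` receives the vertex ID, its current state, and the messages
  // sent to it in the previous superstep; it returns the new state and the
  // outgoing messages.
  def run[V, M](vertices: Map[String, V], supersteps: Int)(
      compute: (String, V, List[M]) => (V, List[Message[M]])
  ): Map[String, V] = {
    var state = vertices
    var inbox = Map.empty[String, List[M]] // no messages before the first superstep
    for (_ <- 0 until supersteps) {
      val results = state.map { case (id, v) =>
        id -> compute(id, v, inbox.getOrElse(id, Nil))
      }
      state = results.map { case (id, (v, _)) => id -> v }
      // Group outgoing messages by target so they are delivered next superstep.
      inbox = results.values.flatMap(_._2).toList
        .groupBy(_.targetId)
        .map { case (id, msgs) => id -> msgs.map(_.value) }
    }
    state
  }

  // Example: each vertex keeps the largest value it has seen and forwards it
  // along its outgoing edges (a three-vertex cycle a -> b -> c -> a).
  def propagateMax(): Map[String, Int] = {
    val edges = Map("a" -> List("b"), "b" -> List("c"), "c" -> List("a"))
    val initial = Map("a" -> 3, "b" -> 1, "c" -> 2)
    run[Int, Int](initial, 3) { (id, value, msgs) =>
      val newValue = (value :: msgs).max
      (newValue, edges(id).map(target => Message(target, newValue)))
    }
  }

  def main(args: Array[String]): Unit =
    println(propagateMax().toList.sorted) // List((a,3), (b,3), (c,3))
}
```

In this toy run the largest initial value, 3, needs two message hops to reach every vertex on the cycle, so all three vertices hold it after the third superstep. Messages produced in one superstep become visible only in the next, which is the property the programming model relies on.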