    4d74f060
    [SPARK-5608] Improve SEO of Spark documentation pages · 4d74f060
    Matei Zaharia authored
    - Add meta description tags on some of the most important doc pages
    - Shorten the titles of some pages to have more relevant keywords; for
      example there's no reason to have "Spark SQL Programming Guide - Spark
      1.2.0 documentation", we can just say "Spark SQL - Spark 1.2.0
      documentation".
    
    Author: Matei Zaharia <matei@databricks.com>
    
    Closes #4381 from mateiz/docs-seo and squashes the following commits:
    
    4940563 [Matei Zaharia] [SPARK-5608] Improve SEO of Spark documentation pages
bagel-programming-guide.md 6.61 KiB
layout: global
displayTitle: Bagel Programming Guide
title: Bagel

Bagel will soon be superseded by GraphX; we recommend that new users try GraphX instead.

Bagel is a Spark implementation of Google's Pregel graph processing framework. Bagel currently supports basic graph computation, combiners, and aggregators.

In the Pregel programming model, jobs run as a sequence of iterations called supersteps. In each superstep, each vertex in the graph runs a user-specified function that can update state associated with the vertex and send messages to other vertices for use in the next iteration.

This guide shows the programming model and features of Bagel by walking through an example implementation of PageRank on Bagel.

Linking with Bagel

To use Bagel in your program, add the following SBT or Maven dependency:

groupId = org.apache.spark
artifactId = spark-bagel_{{site.SCALA_BINARY_VERSION}}
version = {{site.SPARK_VERSION}}
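
For SBT, the coordinates above might translate to a line like the one below. This is only a sketch: the version string is an assumed example standing in for {{site.SPARK_VERSION}}, and `%%` asks SBT to append the Scala binary version to the artifact name, which corresponds to the `_{{site.SCALA_BINARY_VERSION}}` suffix.

```scala
// build.sbt fragment (sketch). "1.2.0" is an assumed example version;
// substitute the Spark version you actually build against.
libraryDependencies += "org.apache.spark" %% "spark-bagel" % "1.2.0"
```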

Programming Model

Bagel operates on a graph represented as a distributed dataset of (K, V) pairs, where keys are vertex IDs and values are vertices plus their associated state. In each superstep, Bagel runs a user-specified compute function on each vertex. The function takes as input the current vertex state and a list of messages sent to that vertex during the previous superstep, and returns the new vertex state and a list of outgoing messages.
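
The superstep contract described above can be illustrated with a small, single-machine sketch in plain Scala. It deliberately does not use the Bagel API or Spark: `VertexState`, `Msg`, `Compute`, and `superstep` are hypothetical names introduced here for illustration, and an in-memory `Map` stands in for the distributed dataset of (K, V) pairs.

```scala
// Local sketch of Pregel-style supersteps. Not the Bagel API:
// all names below are hypothetical and chosen for illustration only.
object SuperstepSketch {
  // Vertex state: a numeric value plus the IDs of outgoing neighbours.
  final case class VertexState(value: Double, outEdges: List[String])
  // Message: the ID of the target vertex and a numeric payload.
  final case class Msg(targetId: String, payload: Double)

  // User-specified function: current state and the messages sent to this
  // vertex in the previous superstep go in; the new state and the outgoing
  // messages come out.
  type Compute = (VertexState, List[Msg]) => (VertexState, List[Msg])

  // Runs one superstep over the whole graph and returns the updated graph
  // together with the messages to deliver in the next superstep.
  def superstep(
      graph: Map[String, VertexState],
      inbox: List[Msg],
      compute: Compute): (Map[String, VertexState], List[Msg]) = {
    // Route each message to its target vertex.
    val byTarget = inbox.groupBy(_.targetId)
    // Apply the compute function to every vertex.
    val results = graph.map { case (id, state) =>
      id -> compute(state, byTarget.getOrElse(id, Nil))
    }
    val newGraph = results.map { case (id, (state, _)) => id -> state }
    val outbox = results.values.flatMap(_._2).toList
    (newGraph, outbox)
  }

  def main(args: Array[String]): Unit = {
    val graph = Map(
      "a" -> VertexState(1.0, List("b")),
      "b" -> VertexState(2.0, List("a")))
    // Example compute function: add incoming payloads to the vertex value,
    // then send the updated value to every neighbour.
    val compute: Compute = (v, msgs) => {
      val updated = v.copy(value = v.value + msgs.map(_.payload).sum)
      (updated, v.outEdges.map(target => Msg(target, updated.value)))
    }
    val (g1, m1) = superstep(graph, Nil, compute)
    val (g2, _) = superstep(g1, m1, compute)
    println(g2)
  }
}
```

Messages produced in one superstep only become visible to their targets in the next one, which is why the sketch threads the returned outbox into the following call as its inbox.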