Skip to content
Snippets Groups Projects
Commit b1ebe182 authored by Bryan Cutler's avatar Bryan Cutler Committed by Sean Owen
Browse files

[SPARK-16932][DOCS] Changed programming guide to not reference old accumulator API in Scala

## What changes were proposed in this pull request?

In the programming guide, the accumulator section mixes up both the old and new APIs causing it to be confusing.  This is not necessary for Scala, so all references to the old API are removed.  For Java, it is somewhat fixed up except for the example of a custom accumulator because I don't think an API exists yet.  Python has not currently implemented the new API.

## How was this patch tested?
built doc locally

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #14516 from BryanCutler/fixup-accumulator-programming-guide-SPARK-15702.
parent 7aaa5a01
No related branches found
No related tags found
No related merge requests found
......@@ -1348,17 +1348,17 @@ running stages (NOTE: this is not yet supported in Python).
<img src="img/spark-webui-accumulators.png" title="Accumulators in the Spark UI" alt="Accumulators in the Spark UI" />
</p>
An accumulator is created from an initial value `v` by calling `SparkContext.accumulator(v)`. Tasks
running on a cluster can then add to it using the `add` method or the `+=` operator (in Scala and Python).
However, they cannot read its value.
Only the driver program can read the accumulator's value, using its `value` method.
The code below shows an accumulator being used to add up the elements of an array:
<div class="codetabs">
<div data-lang="scala" markdown="1">
A numeric accumulator can be created by calling `SparkContext.longAccumulator()` or `SparkContext.doubleAccumulator()`
to accumulate values of type Long or Double, respectively. Tasks running on a cluster can then add to it using
the `add` method. However, they cannot read its value. Only the driver program can read the accumulator's value,
using its `value` method.
The code below shows an accumulator being used to add up the elements of an array:
{% highlight scala %}
scala> val accum = sc.longAccumulator("My Accumulator")
accum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 0, name: Some(My Accumulator), value: 0)
......@@ -1395,14 +1395,21 @@ val myVectorAcc = new VectorAccumulatorV2
sc.register(myVectorAcc, "MyVectorAcc1")
{% endhighlight %}
Note that, when programmers define their own type of AccumulatorV2, the resulting type can be same or not same with the elements added.
Note that, when programmers define their own type of AccumulatorV2, the resulting type can be different than that of the elements added.
</div>
<div data-lang="java" markdown="1">
A numeric accumulator can be created by calling `SparkContext.longAccumulator()` or `SparkContext.doubleAccumulator()`
to accumulate values of type Long or Double, respectively. Tasks running on a cluster can then add to it using
the `add` method. However, they cannot read its value. Only the driver program can read the accumulator's value,
using its `value` method.
The code below shows an accumulator being used to add up the elements of an array:
{% highlight java %}
LongAccumulator accum = sc.sc().longAccumulator();
LongAccumulator accum = jsc.sc().longAccumulator();
sc.parallelize(Arrays.asList(1, 2, 3, 4)).foreach(x -> accum.add(x));
// ...
......@@ -1412,8 +1419,8 @@ accum.value();
// returns 10
{% endhighlight %}
While this code used the built-in support for accumulators of type Integer, programmers can also
create their own types by subclassing [AccumulatorParam](api/java/index.html?org/apache/spark/AccumulatorParam.html).
Programmers can also create their own types by subclassing
[AccumulatorParam](api/java/index.html?org/apache/spark/AccumulatorParam.html).
The AccumulatorParam interface has two methods: `zero` for providing a "zero value" for your data
type, and `addInPlace` for adding two values together. For example, supposing we had a `Vector` class
representing mathematical vectors, we could write:
......@@ -1440,6 +1447,12 @@ a list by collecting together elements).
<div data-lang="python" markdown="1">
An accumulator is created from an initial value `v` by calling `SparkContext.accumulator(v)`. Tasks
running on a cluster can then add to it using the `add` method or the `+=` operator. However, they cannot read its value.
Only the driver program can read the accumulator's value, using its `value` method.
The code below shows an accumulator being used to add up the elements of an array:
{% highlight python %}
>>> accum = sc.accumulator(0)
Accumulator<id=0, value=0>
......@@ -1485,15 +1498,15 @@ Accumulators do not change the lazy evaluation model of Spark. If they are being
<div data-lang="scala" markdown="1">
{% highlight scala %}
val accum = sc.accumulator(0)
data.map { x => accum += x; x }
val accum = sc.longAccumulator
data.map { x => accum.add(x); x }
// Here, accum is still 0 because no actions have caused the map operation to be computed.
{% endhighlight %}
</div>
<div data-lang="java" markdown="1">
{% highlight java %}
LongAccumulator accum = sc.sc().longAccumulator();
LongAccumulator accum = jsc.sc().longAccumulator();
data.map(x -> { accum.add(x); return f(x); });
// Here, accum is still 0 because no actions have caused the `map` to be computed.
{% endhighlight %}
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment