Skip to content
Snippets Groups Projects
Commit b6f2017a authored by Wil Selwood's avatar Wil Selwood Committed by Sean Owen
Browse files

[MINOR] document edge case of updateFunc usage

## What changes were proposed in this pull request?

Include documentation of the fact that the updateFunc is sometimes called with no new values. This is documented in the main documentation here: https://spark.apache.org/docs/latest/streaming-programming-guide.html#updatestatebykey-operation however from the docs included with the code it is not clear that this is the case.

## How was this patch tested?

PR only changes comments. Confirmed code still builds.

Author: Wil Selwood <wil.selwood@sa.catapult.org.uk>

Closes #18088 from wselwood/note-edge-case-in-docs.
parent d9ad7890
No related branches found
No related tags found
No related merge requests found
......@@ -389,6 +389,7 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
/**
* Return a new "state" DStream where the state for each key is updated by applying
* the given function on the previous state of the key and the new values of each key.
* In every batch the updateFunc will be called for each state even if there are no new values.
* Hash partitioning is used to generate the RDDs with Spark's default number of partitions.
* @param updateFunc State update function. If `this` function returns None, then
* corresponding state key-value pair will be eliminated.
......@@ -403,6 +404,7 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
/**
* Return a new "state" DStream where the state for each key is updated by applying
* the given function on the previous state of the key and the new values of each key.
* In every batch the updateFunc will be called for each state even if there are no new values.
* Hash partitioning is used to generate the RDDs with `numPartitions` partitions.
* @param updateFunc State update function. If `this` function returns None, then
* corresponding state key-value pair will be eliminated.
......@@ -419,6 +421,7 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
/**
* Return a new "state" DStream where the state for each key is updated by applying
* the given function on the previous state of the key and the new values of the key.
* In every batch the updateFunc will be called for each state even if there are no new values.
* [[org.apache.spark.Partitioner]] is used to control the partitioning of each RDD.
* @param updateFunc State update function. If `this` function returns None, then
* corresponding state key-value pair will be eliminated.
......@@ -440,6 +443,7 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
/**
* Return a new "state" DStream where the state for each key is updated by applying
* the given function on the previous state of the key and the new values of each key.
* In every batch the updateFunc will be called for each state even if there are no new values.
* [[org.apache.spark.Partitioner]] is used to control the partitioning of each RDD.
* @param updateFunc State update function. Note, that this function may generate a different
* tuple with a different key than the input key. Therefore keys may be removed
......@@ -464,6 +468,7 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
/**
* Return a new "state" DStream where the state for each key is updated by applying
* the given function on the previous state of the key and the new values of the key.
* In every batch the updateFunc will be called for each state even if there are no new values.
* org.apache.spark.Partitioner is used to control the partitioning of each RDD.
* @param updateFunc State update function. If `this` function returns None, then
* corresponding state key-value pair will be eliminated.
......@@ -487,6 +492,7 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
/**
* Return a new "state" DStream where the state for each key is updated by applying
* the given function on the previous state of the key and the new values of each key.
* In every batch the updateFunc will be called for each state even if there are no new values.
* org.apache.spark.Partitioner is used to control the partitioning of each RDD.
* @param updateFunc State update function. Note, that this function may generate a different
* tuple with a different key than the input key. Therefore keys may be removed
......@@ -513,6 +519,7 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
/**
* Return a new "state" DStream where the state for each key is updated by applying
* the given function on the previous state of the key and the new values of the key.
* In every batch the updateFunc will be called for each state even if there are no new values.
* org.apache.spark.Partitioner is used to control the partitioning of each RDD.
* @param updateFunc State update function. If `this` function returns None, then
* corresponding state key-value pair will be eliminated.
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment