Commit 706d6c15 authored by CodingCat, committed by Shixiong Zhu

[SPARK-19499][SS] Add more notes in the comments of Sink.addBatch()


## What changes were proposed in this pull request?

The `addBatch` method in the `Sink` trait is supposed to be synchronous so that it fits the fault-tolerance design in `StreamExecution` (unlike the `compute()` method in DStream).

This patch adds more notes to the comments of this method to remind developers of this requirement.
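A minimal sketch of what "synchronous" means for `addBatch`, assuming a hypothetical sink whose client only offers an asynchronous write: the method blocks on the write's `Future`, so it returns only after the data has been stored. `ExternalStore`, `BlockingSink`, and `asyncWrite` are illustrative names, and `Seq[String]` stands in for `DataFrame`; none of them are Spark APIs.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical external store standing in for a database or message queue.
final class ExternalStore {
  private val rows = ArrayBuffer.empty[String]
  def append(batch: Seq[String]): Unit = rows.synchronized { rows ++= batch; () }
  def size: Int = rows.synchronized { rows.size }
}

// Hypothetical sink whose client only offers an asynchronous write.
final class BlockingSink(store: ExternalStore)(implicit ec: ExecutionContext) {
  // Assumed async client call: it returns before the write has finished.
  private def asyncWrite(rows: Seq[String]): Future[Unit] = Future(store.append(rows))

  // Blocks until the write completes, so the caller can only treat the batch
  // as finished once the data is actually stored. Await.result rethrows the
  // write's exception if it failed, surfacing the failure to the caller.
  def addBatch(batchId: Long, data: Seq[String]): Unit =
    Await.result(asyncWrite(data), 30.seconds)
}
```

Returning `asyncWrite(data)` without awaiting it would let `addBatch` return while the write is still in flight, which is what the synchronous requirement rules out.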

## How was this patch tested?

existing tests

Author: CodingCat <zhunansjtu@gmail.com>

Closes #16840 from CodingCat/SPARK-19499.

(cherry picked from commit d4cd9757)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
parent e642a07d
```diff
@@ -31,8 +31,11 @@ trait Sink {
    * this method is called more than once with the same batchId (which will happen in the case of
    * failures), then `data` should only be added once.
    *
-   * Note: You cannot apply any operators on `data` except consuming it (e.g., `collect/foreach`).
+   * Note 1: You cannot apply any operators on `data` except consuming it (e.g., `collect/foreach`).
    * Otherwise, you may get a wrong result.
+   *
+   * Note 2: The method is supposed to be executed synchronously, i.e. the method should only return
+   * after data is consumed by sink successfully.
    */
   def addBatch(batchId: Long, data: DataFrame): Unit
 }
```
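The replay rule at the top of the comment (a repeated `batchId` must add its data only once) can be modeled without Spark. This is a hypothetical sketch, not Spark's implementation: `ReplaySafeSink` is an illustrative name, `Seq[String]` stands in for `DataFrame`, and the last committed batch id lives in memory, whereas a real sink would need to persist it durably (ideally atomically with the data) for the check to survive a restart. The sink only consumes `data`, in line with Note 1, and returns only after the rows are stored, in line with Note 2.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sink that adds the data for each batchId at most once.
final class ReplaySafeSink {
  // Hypothetical in-memory state; a real sink must persist this durably.
  private var lastCommittedBatchId: Long = -1L
  private val stored = ArrayBuffer.empty[String]

  def addBatch(batchId: Long, data: Seq[String]): Unit = {
    if (batchId > lastCommittedBatchId) {
      // Consume the batch eagerly (the analogue of collect/foreach)...
      stored ++= data
      // ...and record the id only after the rows have been stored.
      lastCommittedBatchId = batchId
    }
    // Otherwise this is a replay of an already-added batch: do nothing.
  }

  def rows: Seq[String] = stored.toVector
}
```

Recording the batch id after the write, not before, means a failure between the two steps causes a harmless replay rather than a silently skipped batch.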