Commit b2970d97 authored by Lee Dongjin, committed by Sean Owen

[MINOR][DOCS] Fix spacings in Structured Streaming Programming Guide


## What changes were proposed in this pull request?

1. Added the missing space between two sentences: `... on static data.The Spark SQL engine will ...` -> `... on static data. The Spark SQL engine will ...`
2. Added the missing colon in the Output Model section.

## How was this patch tested?

None.

Author: Lee Dongjin <dongjin@apache.org>

Closes #17564 from dongjinleekr/feature/fix-programming-guide.

(cherry picked from commit b9384382)
Signed-off-by: Sean Owen <sowen@cloudera.com>
parent 46e212d2
@@ -8,7 +8,7 @@ title: Structured Streaming Programming Guide
{:toc}
# Overview
-Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. You can express your streaming computation the same way you would express a batch computation on static data.The Spark SQL engine will take care of running it incrementally and continuously and updating the final result as streaming data continues to arrive. You can use the [Dataset/DataFrame API](sql-programming-guide.html) in Scala, Java or Python to express streaming aggregations, event-time windows, stream-to-batch joins, etc. The computation is executed on the same optimized Spark SQL engine. Finally, the system ensures end-to-end exactly-once fault-tolerance guarantees through checkpointing and Write Ahead Logs. In short, *Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing without the user having to reason about streaming.*
+Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. You can express your streaming computation the same way you would express a batch computation on static data. The Spark SQL engine will take care of running it incrementally and continuously and updating the final result as streaming data continues to arrive. You can use the [Dataset/DataFrame API](sql-programming-guide.html) in Scala, Java or Python to express streaming aggregations, event-time windows, stream-to-batch joins, etc. The computation is executed on the same optimized Spark SQL engine. Finally, the system ensures end-to-end exactly-once fault-tolerance guarantees through checkpointing and Write Ahead Logs. In short, *Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing without the user having to reason about streaming.*
**Structured Streaming is still ALPHA in Spark 2.1** and the APIs are still experimental. In this guide, we are going to walk you through the programming model and the APIs. First, let's start with a simple example - a streaming word count.
@@ -368,7 +368,7 @@ A query on the input will generate the "Result Table". Every trigger interval (s
![Model](img/structured-streaming-model.png)
-The "Output" is defined as what gets written out to the external storage. The output can be defined in different modes
+The "Output" is defined as what gets written out to the external storage. The output can be defined in a different mode:
- *Complete Mode* - The entire updated Result Table will be written to the external storage. It is up to the storage connector to decide how to handle writing of the entire table.