Commit e6fb6ced authored by Kousuke Saruta, committed by Sean Owen

[STREAMING] [DOC] Remove duplicated description about WAL

I noticed there is a duplicated description about WAL.

```
To ensure zero-data loss, you have to additionally enable Write Ahead Logs in Spark Streaming. To ensure zero data loss, enable the Write Ahead Logs (introduced in Spark 1.2).
```

Let's remove the duplication.

I haven't filed a JIRA issue for this because it's minor.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #6719 from sarutak/remove-multiple-description and squashes the following commits:

cc9bb21 [Kousuke Saruta] Removed duplicated description about WAL
parent 1b499993
@@ -7,7 +7,7 @@ title: Spark Streaming + Kafka Integration Guide
## Approach 1: Receiver-based Approach
This approach uses a Receiver to receive the data. The Receiver is implemented using the Kafka high-level consumer API. As with all receivers, the data received from Kafka through a Receiver is stored in Spark executors, and then jobs launched by Spark Streaming process the data.
-However, under default configuration, this approach can lose data under failures (see [receiver reliability](streaming-programming-guide.html#receiver-reliability). To ensure zero-data loss, you have to additionally enable Write Ahead Logs in Spark Streaming. To ensure zero data loss, enable the Write Ahead Logs (introduced in Spark 1.2). This synchronously saves all the received Kafka data into write ahead logs on a distributed file system (e.g HDFS), so that all the data can be recovered on failure. See [Deploying section](streaming-programming-guide.html#deploying-applications) in the streaming programming guide for more details on Write Ahead Logs.
+However, under default configuration, this approach can lose data under failures (see [receiver reliability](streaming-programming-guide.html#receiver-reliability). To ensure zero-data loss, you have to additionally enable Write Ahead Logs in Spark Streaming (introduced in Spark 1.2). This synchronously saves all the received Kafka data into write ahead logs on a distributed file system (e.g HDFS), so that all the data can be recovered on failure. See [Deploying section](streaming-programming-guide.html#deploying-applications) in the streaming programming guide for more details on Write Ahead Logs.
Next, we discuss how to use this approach in your streaming application.
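
For context, enabling the Write Ahead Log described in this section amounts to setting one configuration flag and providing a checkpoint directory on a fault-tolerant file system. Below is a minimal Scala sketch (not part of this commit) against the Spark 1.2-era receiver-based Kafka API; the ZooKeeper quorum, consumer group, topic name, and checkpoint path are placeholder values.

```
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaWALExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("KafkaWALExample")
      // Synchronously write all received data to the write ahead log.
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    val ssc = new StreamingContext(conf, Seconds(10))
    // The WAL lives under the checkpoint directory, which must be on a
    // fault-tolerant file system such as HDFS (path is a placeholder).
    ssc.checkpoint("hdfs:///checkpoints/kafka-wal-example")

    val stream = KafkaUtils.createStream(
      ssc,
      "zk1:2181",                  // ZooKeeper quorum (placeholder)
      "example-consumer-group",    // consumer group id (placeholder)
      Map("my-topic" -> 1),        // topic -> number of receiver threads
      // With the WAL enabled the data is already stored durably, so a
      // non-replicated, serialized storage level avoids a redundant copy.
      StorageLevel.MEMORY_AND_DISK_SER
    )

    stream.map(_._2).print()       // the stream yields (key, message) pairs
    ssc.start()
    ssc.awaitTermination()
  }
}
```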