Commit 52b24a60 authored by Tathagata Das's avatar Tathagata Das


[SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation about rate limiting and backpressure

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #8656 from tdas/SPARK-10492 and squashes the following commits:

986cdd6 [Tathagata Das] Added information on backpressure
parent e6f8d368
@@ -1433,6 +1433,19 @@ Apart from these, the following properties are also available, and may be useful
#### Spark Streaming
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td><code>spark.streaming.backpressure.enabled</code></td>
<td>false</td>
<td>
    Enables or disables Spark Streaming's internal backpressure mechanism (since 1.5).
    This allows Spark Streaming to control the receiving rate based on the
    current batch scheduling delays and processing times, so that the system receives
    data only as fast as it can process it. Internally, this dynamically sets the
    maximum receiving rate of the receivers. This rate is upper bounded by the values of
    <code>spark.streaming.receiver.maxRate</code> and <code>spark.streaming.kafka.maxRatePerPartition</code>
    if they are set (see below).
</td>
</tr>
<tr>
<td><code>spark.streaming.blockInterval</code></td>
<td>200ms</td>
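To make the interplay of these two properties concrete, here is a minimal Scala sketch of how they might be set on a `SparkConf`, assuming Spark 1.5+; the application name, batch interval, and rate value are illustrative, not prescriptive:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Let Spark Streaming pick the receiving rate dynamically (Spark 1.5+).
val conf = new SparkConf()
  .setAppName("BackpressureSketch") // illustrative name
  .set("spark.streaming.backpressure.enabled", "true")
  // Optional: cap the dynamically chosen rate at 10,000 records/sec per receiver.
  .set("spark.streaming.receiver.maxRate", "10000")

// The backpressure mechanism adjusts the rate from one batch to the next.
val ssc = new StreamingContext(conf, Seconds(1))
```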
@@ -1807,7 +1807,7 @@ To run a Spark Streaming application, you need to have the following.
+ *Mesos* - [Marathon](https://github.com/mesosphere/marathon) has been used to achieve this
with Mesos.
-- *[Since Spark 1.2] Configuring write ahead logs* - Since Spark 1.2,
+- *Configuring write ahead logs* - Since Spark 1.2,
we have introduced _write ahead logs_ for achieving strong
fault-tolerance guarantees. If enabled, all the data received from a receiver gets written into
a write ahead log in the configured checkpoint directory (sketched below). This prevents data loss on driver
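As a rough illustration of the two pieces this involves, the write ahead log flag and the checkpoint directory, here is a minimal Scala sketch; the HDFS path and application name are hypothetical:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("WriteAheadLogSketch") // illustrative name
  // Write all received data to the write ahead log (Spark 1.2+).
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(1))
// The log lives under the checkpoint directory, so one must be configured.
ssc.checkpoint("hdfs://namenode:8020/spark/checkpoints") // hypothetical path
```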
@@ -1822,6 +1822,17 @@ To run a Spark Streaming application, you need to have the following.
stored in a replicated storage system. This can be done by setting the storage level for the
input stream to `StorageLevel.MEMORY_AND_DISK_SER`.
- *Setting the max receiving rate* - If the cluster resources are not large enough for the streaming
application to process data as fast as it is being received, the receivers can be rate limited
by setting a maximum rate limit in terms of records / sec.
See the [configuration parameters](configuration.html#spark-streaming)
`spark.streaming.receiver.maxRate` for receivers and `spark.streaming.kafka.maxRatePerPartition`
for the Direct Kafka approach. In Spark 1.5, we have introduced a feature called *backpressure* that
eliminates the need to set this rate limit, as Spark Streaming automatically figures out the
rate limits and dynamically adjusts them if the processing conditions change. This backpressure
can be enabled by setting the [configuration parameter](configuration.html#spark-streaming)
`spark.streaming.backpressure.enabled` to `true`, as sketched below.
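A minimal sketch contrasting the static caps with backpressure; the application name and rate values are placeholders, and a receiver-based stream and a direct Kafka stream would each respect its own cap:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("RateLimitSketch") // illustrative name
  // Static cap for receiver-based streams: at most 5,000 records/sec per receiver.
  .set("spark.streaming.receiver.maxRate", "5000")
  // Static cap for the direct Kafka approach: at most 2,000 records/sec per Kafka partition.
  .set("spark.streaming.kafka.maxRatePerPartition", "2000")
  // Alternatively (Spark 1.5+), let backpressure derive the rates automatically;
  // the two caps above then act only as upper bounds.
  .set("spark.streaming.backpressure.enabled", "true")
```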
### Upgrading Application Code
{:.no_toc}