Commit 2b53447f authored by Aaron Kimball, committed by Reynold Xin

SPARK-1173. Improve scala streaming docs.

Clarify imports to add implicit conversions to DStream and
fix other small typos in the streaming intro documentation.

Tested by inspecting the output via a local jekyll server and copy-pasting the Scala commands into a Spark shell.

Author: Aaron Kimball <aaron@magnify.io>

Closes #64 from kimballa/spark-1173-streaming-docs and squashes the following commits:

6fbff0e [Aaron Kimball] SPARK-1173. Improve scala streaming docs.
parent 55a4f11b
@@ -58,11 +58,21 @@ do is as follows.
 <div class="codetabs">
 <div data-lang="scala" markdown="1" >
+First, we import the names of the Spark Streaming classes, and some implicit
+conversions from StreamingContext into our environment, to add useful methods to
+other classes we need (like DStream).
-First, we create a
-[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) object,
-which is the main entry point for all streaming
-functionality. Besides Spark's configuration, we specify that any DStream will be processed
+[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) is the
+main entry point for all streaming functionality.
+{% highlight scala %}
+import org.apache.spark.streaming._
+import org.apache.spark.streaming.StreamingContext._
+{% endhighlight %}
+Then we create a
+[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) object.
+Besides Spark's configuration, we specify that any DStream will be processed
 in 1 second batches.
 {% highlight scala %}
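The hunk cuts off at the opening tag of that code block. For context, a minimal sketch of the step the new paragraph describes, assuming the Spark 0.9-era API; the master URL and application name below are illustrative, not part of this commit:

{% highlight scala %}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

// Create a local StreamingContext with two worker threads and a
// 1-second batch interval ("local[2]" and the app name are assumptions).
val ssc = new StreamingContext("local[2]", "NetworkWordCount", Seconds(1))
{% endhighlight %}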
@@ -98,7 +108,7 @@ val pairs = words.map(word => (word, 1))
 val wordCounts = pairs.reduceByKey(_ + _)
 // Print a few of the counts to the console
-wordCount.print()
+wordCounts.print()
 {% endhighlight %}
 The `words` DStream is further mapped (one-to-one transformation) to a DStream of `(word,
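This hunk fixes a typo: the val defined two lines earlier is `wordCounts`, so `wordCount.print()` would not compile. A minimal sketch of the full pipeline these context lines belong to, assuming the guide's socket source on port 9999 and the Spark 0.9-era API:

{% highlight scala %}
// Read lines from a socket source and count words in each 1-second batch.
val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)

// Print a few of the counts to the console (the call this commit fixes).
wordCounts.print()

ssc.start()             // start receiving and processing data
ssc.awaitTermination()  // block until the streaming computation stops
{% endhighlight %}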
@@ -262,6 +272,24 @@ Time: 1357008430000 ms
 </td>
 </table>
+If you plan to run the Scala code for Spark Streaming-based use cases in the Spark
+shell, you should start the shell with the SparkConfiguration pre-configured to
+discard old batches periodically:
+{% highlight bash %}
+$ SPARK_JAVA_OPTS=-Dspark.cleaner.ttl=10000 bin/spark-shell
+{% endhighlight %}
+... and create your StreamingContext by wrapping the existing interactive shell
+SparkContext object, `sc`:
+{% highlight scala %}
+val ssc = new StreamingContext(sc, Seconds(1))
+{% endhighlight %}
+When working with the shell, you may also need to send a `^D` to your netcat session
+to force the pipeline to print the word counts to the console at the sink.
 ***************************************************************************************************
 # Basics
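The added text mentions sending `^D` to a netcat session. For readers trying this end to end, a sketch of the companion terminal, assuming the guide's usual port 9999:

{% highlight bash %}
# In a separate terminal, run a netcat server for the socket source to read from:
$ nc -lk 9999
hello world hello spark
# Press Ctrl-D (^D) to end the input; with the shell-based setup above, this
# can be needed to force the pipeline to print the counts at the sink.
{% endhighlight %}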