Commit dfd9723d authored by Sandeep Singh, committed by Sean Owen

[MINOR][DOCS] Fix type Information in Quick Start and Programming Guide

Author: Sandeep Singh <sandeep@techaddict.me>

Closes #12841 from techaddict/improve_docs_1.
parent f10ae4b1
@@ -328,7 +328,7 @@ Text file RDDs can be created using `SparkContext`'s `textFile` method. This met
{% highlight scala %}
scala> val distFile = sc.textFile("data.txt")
-distFile: RDD[String] = MappedRDD@1d4cee08
+distFile: org.apache.spark.rdd.RDD[String] = data.txt MapPartitionsRDD[10] at textFile at <console>:26
{% endhighlight %}
Once created, `distFile` can be acted on by dataset operations. For example, we can add up the sizes of all the lines using the `map` and `reduce` operations as follows: `distFile.map(s => s.length).reduce((a, b) => a + b)`.
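The `map`/`reduce` computation described in the line above can be sketched with ordinary Scala collections, which use the same method names. This is a local, single-machine analogy only. It assumes an in-memory `Seq[String]` stands in for the lines of `data.txt`, it does not use Spark, and the object name `LineLengths` is hypothetical.

```scala
// Hypothetical local analogy of distFile.map(s => s.length).reduce((a, b) => a + b),
// using an in-memory Seq[String] instead of a Spark RDD.
object LineLengths {
  // Map each line to its length, then add the lengths together pairwise.
  // Note: Seq.reduce throws an exception on an empty Seq.
  def totalLength(lines: Seq[String]): Int =
    lines.map(s => s.length).reduce((a, b) => a + b)

  def main(args: Array[String]): Unit = {
    val lines = Seq("first line", "second", "")
    println(totalLength(lines)) // prints 16
  }
}
```

In Spark, the lines are distributed across partitions, and `reduce` is an action that returns a plain value to the driver. Here, everything runs eagerly in one JVM.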
@@ -33,7 +33,7 @@ Spark's primary abstraction is a distributed collection of items called a Resili
{% highlight scala %}
scala> val textFile = sc.textFile("README.md")
-textFile: spark.RDD[String] = spark.MappedRDD@2ee9b6e3
+textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:25
{% endhighlight %}
RDDs have _[actions](programming-guide.html#actions)_, which return values, and _[transformations](programming-guide.html#transformations)_, which return pointers to new RDDs. Let's start with a few actions:
......@@ -50,7 +50,7 @@ Now let's use a transformation. We will use the [`filter`](programming-guide.htm
{% highlight scala %}
scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
-linesWithSpark: spark.RDD[String] = spark.FilteredRDD@7dd4af09
+linesWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:27
{% endhighlight %}
We can chain together transformations and actions:
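Chaining a transformation with an action can be illustrated locally with Scala collections. The sketch below applies a `filter` step and then counts the result. It assumes an in-memory `Seq[String]` in place of the `README.md` RDD, and `size` plays the role of Spark's `count()`. The name `SparkLines` is hypothetical. Unlike a Spark transformation, this `filter` runs eagerly.

```scala
// Hypothetical local analogy of filtering lines that mention "Spark" and counting them.
object SparkLines {
  // The filter step corresponds to the transformation; size corresponds to the action.
  // String.contains is case-sensitive, so "spark" does not match "Spark".
  def countLinesWithSpark(lines: Seq[String]): Int =
    lines.filter(line => line.contains("Spark")).size

  def main(args: Array[String]): Unit = {
    val lines = Seq("# Apache Spark", "Spark is fast", "unrelated line")
    println(countLinesWithSpark(lines)) // prints 2
  }
}
```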
@@ -123,7 +123,7 @@ One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can i
{% highlight scala %}
scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
-wordCounts: spark.RDD[(String, Int)] = spark.ShuffledAggregatedRDD@71f027b8
+wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[8] at reduceByKey at <console>:28
{% endhighlight %}
Here, we combined the [`flatMap`](programming-guide.html#transformations), [`map`](programming-guide.html#transformations), and [`reduceByKey`](programming-guide.html#transformations) transformations to compute the per-word counts in the file as an RDD of (String, Int) pairs. To collect the word counts in our shell, we can use the [`collect`](programming-guide.html#actions) action:
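The `flatMap`/`map`/`reduceByKey` word-count pipeline described above can be approximated locally to make the per-word semantics concrete. This is a hedged sketch, not Spark code. It assumes an in-memory `Seq[String]`, the name `WordCount` is hypothetical, and because `Seq` has no `reduceByKey`, it substitutes `groupBy` followed by a per-key `reduce`. The result is a `Map[String, Int]` rather than an RDD of (String, Int) pairs.

```scala
// Hypothetical local analogy of
//   textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
object WordCount {
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))  // split every line into words
      .map(word => (word, 1))            // pair each word with an initial count of 1
      .groupBy(_._1)                     // group the pairs by word (stand-in for the shuffle)
      .map { case (word, pairs) =>       // sum the counts for each word
        (word, pairs.map(_._2).reduce((a, b) => a + b))
      }

  def main(args: Array[String]): Unit = {
    val counts = wordCounts(Seq("to be or", "not to be"))
    println(counts("to")) // prints 2
  }
}
```

Spark's `reduceByKey` combines values within each partition before shuffling. The local `groupBy` version instead materializes every pair for a key first, which is fine for small in-memory data.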
@@ -181,7 +181,7 @@ Spark also supports pulling data sets into a cluster-wide in-memory cache. This
{% highlight scala %}
scala> linesWithSpark.cache()
-res7: spark.RDD[String] = spark.FilteredRDD@17e51082
+res7: linesWithSpark.type = MapPartitionsRDD[2] at filter at <console>:27
scala> linesWithSpark.count()
res8: Long = 19