Commit 9d8e838d authored by Stephen Hopper, committed by Sean Owen

[DOC] Added R to the list of languages with "high-level API" support in the main README.

Author: Stephen Hopper <shopper@shopper-osx.local>

Closes #8646 from enragedginger/master.
parent 5ffe752b
 # Apache Spark
 Spark is a fast and general cluster computing system for Big Data. It provides
-high-level APIs in Scala, Java, and Python, and an optimized engine that
+high-level APIs in Scala, Java, Python, and R, and an optimized engine that
 supports general computation graphs for data analysis. It also supports a
 rich set of higher-level tools including Spark SQL for SQL and DataFrames,
 MLlib for machine learning, GraphX for graph processing,
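
For context, the "high-level API" that this hunk now also credits to R looks like the following in the Scala shell (a minimal sketch, not part of the patch; it assumes `sc` is the shell's preconfigured `SparkContext` and a local `README.md` as input):

{% highlight scala %}
// Load a text file as an RDD and run a distributed filter-and-count.
val lines = sc.textFile("README.md")
val sparkLines = lines.filter(line => line.contains("Spark"))
println(sparkLines.count())
{% endhighlight %}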
@@ -94,5 +94,5 @@ distribution.
 ## Configuration
-Please refer to the [Configuration guide](http://spark.apache.org/docs/latest/configuration.html)
+Please refer to the [Configuration Guide](http://spark.apache.org/docs/latest/configuration.html)
 in the online documentation for an overview on how to configure Spark.
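
As a brief illustration of what that guide covers (a hedged sketch, not part of this patch), Spark properties can be set programmatically on a `SparkConf` before creating the context, or supplied at launch time via `spark-submit --conf`:

{% highlight scala %}
import org.apache.spark.{SparkConf, SparkContext}

// Properties set in code take precedence over spark-submit flags and
// spark-defaults.conf; the key below is just one example property.
val conf = new SparkConf()
  .setAppName("ConfigurationExample")
  .set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)
{% endhighlight %}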
@@ -126,7 +126,7 @@ scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (w
 wordCounts: spark.RDD[(String, Int)] = spark.ShuffledAggregatedRDD@71f027b8
 {% endhighlight %}
-Here, we combined the [`flatMap`](programming-guide.html#transformations), [`map`](programming-guide.html#transformations) and [`reduceByKey`](programming-guide.html#transformations) transformations to compute the per-word counts in the file as an RDD of (String, Int) pairs. To collect the word counts in our shell, we can use the [`collect`](programming-guide.html#actions) action:
+Here, we combined the [`flatMap`](programming-guide.html#transformations), [`map`](programming-guide.html#transformations), and [`reduceByKey`](programming-guide.html#transformations) transformations to compute the per-word counts in the file as an RDD of (String, Int) pairs. To collect the word counts in our shell, we can use the [`collect`](programming-guide.html#actions) action:
 {% highlight scala %}
 scala> wordCounts.collect()
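
// A sketch of the full pipeline behind `wordCounts`, assuming the guide's
// `textFile` RDD was loaded from README.md (illustration only, not part of the patch):
val textFile = sc.textFile("README.md")
val wordCounts = textFile
  .flatMap(line => line.split(" "))   // one output element per word
  .map(word => (word, 1))             // pair each word with an initial count of 1
  .reduceByKey((a, b) => a + b)       // sum the counts for each distinct word
wordCounts.collect()                  // brings the Array[(String, Int)] back to the driver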
@@ -163,7 +163,7 @@ One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can i
 >>> wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
 {% endhighlight %}
-Here, we combined the [`flatMap`](programming-guide.html#transformations), [`map`](programming-guide.html#transformations) and [`reduceByKey`](programming-guide.html#transformations) transformations to compute the per-word counts in the file as an RDD of (string, int) pairs. To collect the word counts in our shell, we can use the [`collect`](programming-guide.html#actions) action:
+Here, we combined the [`flatMap`](programming-guide.html#transformations), [`map`](programming-guide.html#transformations), and [`reduceByKey`](programming-guide.html#transformations) transformations to compute the per-word counts in the file as an RDD of (string, int) pairs. To collect the word counts in our shell, we can use the [`collect`](programming-guide.html#actions) action:
 {% highlight python %}
 >>> wordCounts.collect()
@@ -217,13 +217,13 @@ a cluster, as described in the [programming guide](programming-guide.html#initia
 </div>
 # Self-Contained Applications
-Now say we wanted to write a self-contained application using the Spark API. We will walk through a
-simple application in both Scala (with SBT), Java (with Maven), and Python.
+Suppose we wish to write a self-contained application using the Spark API. We will walk through a
+simple application in Scala (with sbt), Java (with Maven), and Python.
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
-We'll create a very simple Spark application in Scala. So simple, in fact, that it's
+We'll create a very simple Spark application in Scala--so simple, in fact, that it's
 named `SimpleApp.scala`:
 {% highlight scala %}
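// A minimal sketch of such a self-contained application (assumptions: the Spark 1.x
// SparkContext API and an input file at YOUR_SPARK_HOME/README.md); see the guide
// for the authoritative listing:
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // Count lines containing the letters "a" and "b", then print both totals.
    val logData = sc.textFile("YOUR_SPARK_HOME/README.md", 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}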
@@ -259,7 +259,7 @@ object which contains information about our
 application.
 Our application depends on the Spark API, so we'll also include an sbt configuration file,
-`simple.sbt` which explains that Spark is a dependency. This file also adds a repository that
+`simple.sbt`, which explains that Spark is a dependency. This file also adds a repository that
 Spark depends on:
 {% highlight scala %}
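// A sketch of what simple.sbt typically contains; the version numbers and the
// extra resolver below are illustrative assumptions, not taken from this patch.
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.0"

// The repository mentioned above is declared as an sbt resolver:
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"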
@@ -302,7 +302,7 @@ Lines with a: 46, Lines with b: 23
 </div>
 <div data-lang="java" markdown="1">
-This example will use Maven to compile an application jar, but any similar build system will work.
+This example will use Maven to compile an application JAR, but any similar build system will work.
 We'll create a very simple Spark application, `SimpleApp.java`:
@@ -374,7 +374,7 @@ $ find .
 Now, we can package the application using Maven and execute it with `./bin/spark-submit`.
 {% highlight bash %}
-# Package a jar containing your application
+# Package a JAR containing your application
 $ mvn package
 ...
 [INFO] Building jar: {..}/{..}/target/simple-project-1.0.jar