Commit 1ac7bf89 authored by Matei Zaharia

Merge pull request #265 from pwendell/docs-version-config

Making Spark version configurable in docs and updating Bagel doc
parents ee2fcb2c 5788a929
@@ -3,6 +3,6 @@ markdown: kramdown
# These allow the documentation to be updated with new releases
# of Spark, Scala, and Mesos.
SPARK_VERSION: 0.6.0
SPARK_VERSION: 0.6.0-SNAPSHOT
SCALA_VERSION: 2.9.2
MESOS_VERSION: 0.9.0-incubating
@@ -14,8 +14,9 @@ This guide shows the programming model and features of Bagel by walking through
To write a Bagel application, you will need to add Spark, its dependencies, and Bagel to your CLASSPATH:
1. Run `sbt/sbt update` to fetch Spark's dependencies, if you haven't already done so.
2. Run `sbt/sbt assembly` to build Spark and its dependencies into one JAR (`core/target/scala_2.8.1/Spark Core-assembly-0.3-SNAPSHOT.jar`) and Bagel into a second JAR (`bagel/target/scala_2.8.1/Bagel-assembly-0.3-SNAPSHOT.jar`).
3. Add these two JARs to your CLASSPATH.
2. Run `sbt/sbt assembly` to build Spark and its dependencies into one JAR (`core/target/spark-core-assembly-{{site.SPARK_VERSION}}.jar`).
3. Run `sbt/sbt package` to build the Bagel JAR (`bagel/target/scala_{{site.SCALA_VERSION}}/spark-bagel_{{site.SCALA_VERSION}}-{{site.SPARK_VERSION}}.jar`).
4. Add these two JARs to your CLASSPATH (see the sketch below).
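For example, a Bagel job can also pass these same two JARs to its `SparkContext` so that they are shipped to cluster workers. The sketch below is illustrative only: the master URL, job name, and Spark home are placeholders, and the JAR paths are simply the two artifacts built in steps 2 and 3.

{% highlight scala %}
import spark.SparkContext

// Placeholder master URL, job name, and Spark home; adjust for your own setup.
val sc = new SparkContext("local", "MyBagelJob", "/path/to/spark",
  Seq("core/target/spark-core-assembly-{{site.SPARK_VERSION}}.jar",
      "bagel/target/scala_{{site.SCALA_VERSION}}/spark-bagel_{{site.SCALA_VERSION}}-{{site.SPARK_VERSION}}.jar"))
{% endhighlight %}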
## Programming Model
......
@@ -101,13 +101,9 @@ res9: Long = 15
It may seem silly to use Spark to explore and cache a 30-line text file. The interesting part is that these same functions can be used on very large data sets, even when they are striped across tens or hundreds of nodes. You can also do this interactively by connecting `spark-shell` to a cluster, as described in the [programming guide](scala-programming-guide.html#initializing-spark).
# A Standalone Job in Scala
Now say we wanted to write a standalone job using the Spark API. We will walk through a simple job in both Scala (with sbt) and Java (with Maven). If you are using other build systems, please reference the Spark assembly JAR in the developer guide. The first step is to publish Spark to our local Ivy/Maven repositories. From the Spark directory:
Now say we wanted to write a standalone job using the Spark API. We will walk through a simple job in both Scala (with sbt) and Java (with Maven). If you are using other build systems, consider using the Spark assembly JAR described in the developer guide.
{% highlight bash %}
$ sbt/sbt publish-local
{% endhighlight %}
Next, we'll create a very simple Spark job in Scala. So simple, in fact, that it's named `SimpleJob.scala`:
We'll create a very simple Spark job in Scala. So simple, in fact, that it's named `SimpleJob.scala`:
{% highlight scala %}
/*** SimpleJob.scala ***/
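// Illustrative sketch of SimpleJob.scala (not necessarily the exact file from the
// Spark tree): it counts the lines containing "a" and "b" in a local text file.
// The input path, Spark home, and JAR name below are placeholders for your setup.
import spark.SparkContext
import SparkContext._

object SimpleJob {
  def main(args: Array[String]) {
    val logFile = "/var/log/syslog" // should be some text file on your machine
    val sc = new SparkContext("local", "Simple Job", "$YOUR_SPARK_HOME",
      List("target/scala-{{site.SCALA_VERSION}}/simple-project_{{site.SCALA_VERSION}}-1.0.jar"))
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}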
@@ -159,12 +155,9 @@ Lines with a: 8422, Lines with b: 1836
This example only runs the job locally; for a tutorial on running jobs across several machines, see the [Standalone Mode](spark-standalone.html) documentation, and consider using a distributed input source, such as HDFS.
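For instance, assuming the HDFS client libraries are on your CLASSPATH, pointing the job above at a distributed file typically just means swapping the input URL in its `sc.textFile` call; the namenode host, port, and path below are hypothetical:

{% highlight scala %}
// Hypothetical HDFS location; substitute your own namenode address and file path.
val logData = sc.textFile("hdfs://namenode:9000/data/logs/access.log").cache()
{% endhighlight %}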
# A Standalone Job in Java
Now say we wanted to write a standalone job using the Java API. We will walk through doing this with Maven. If you are using other build systems, please reference the Spark assembly JAR in the developer guide. The first step is to publish Spark to our local Ivy/Maven repositories. From the Spark directory:
Now say we wanted to write a standalone job using the Java API. We will walk through doing this with Maven. If you are using other build systems, consider using the Spark assembly JAR described in the developer guide.
{% highlight bash %}
$ sbt/sbt publish-local
{% endhighlight %}
Next, we'll create a very simple Spark job, `SimpleJob.java`:
We'll create a very simple Spark job, `SimpleJob.java`:
{% highlight java %}
/*** SimpleJob.java ***/
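// Illustrative sketch of SimpleJob.java (not necessarily the exact file from the
// Spark tree): the Java-API equivalent of the Scala job above. The input path,
// Spark home, and JAR name below are placeholders for your setup.
import spark.api.java.JavaRDD;
import spark.api.java.JavaSparkContext;
import spark.api.java.function.Function;

public class SimpleJob {
  public static void main(String[] args) {
    String logFile = "/var/log/syslog"; // should be some text file on your machine
    JavaSparkContext sc = new JavaSparkContext("local", "Simple Job",
      "$YOUR_SPARK_HOME", new String[] {"target/simple-project-1.0.jar"});
    JavaRDD<String> logData = sc.textFile(logFile).cache();

    long numAs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("a"); }
    }).count();

    long numBs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("b"); }
    }).count();

    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
  }
}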
......
@@ -19,7 +19,7 @@ branch of Spark, called `yarn`, which you can do as follows:
- In order to distribute Spark within the cluster, it must be packaged into a single JAR file. This can be done by running `sbt/sbt assembly`.
- Your application code must be packaged into a separate JAR file.
If you want to test out the YARN deployment mode, you can use the current Spark examples. A `spark-examples_{{site.SCALA_VERSION}}-{{site.SPARK_VERSION}}-SNAPSHOT.jar` file can be generated by running `sbt/sbt package`. NOTE: since the documentation you're reading is for Spark version {{site.SPARK_VERSION}}, we are assuming here that you have downloaded Spark {{site.SPARK_VERSION}} or checked it out of source control. If you are using a different version of Spark, the version numbers in the jar generated by the sbt package command will obviously be different.
If you want to test out the YARN deployment mode, you can use the current Spark examples. A `spark-examples_{{site.SCALA_VERSION}}-{{site.SPARK_VERSION}}.jar` file can be generated by running `sbt/sbt package`. NOTE: since the documentation you're reading is for Spark version {{site.SPARK_VERSION}}, we are assuming here that you have downloaded Spark {{site.SPARK_VERSION}} or checked it out of source control. If you are using a different version of Spark, the version numbers in the JAR generated by the sbt package command will be different.
# Launching Spark on YARN
@@ -35,8 +35,8 @@ The command to launch the YARN Client is as follows:
For example:
SPARK_JAR=./core/target/spark-core-assembly-{{site.SPARK_VERSION}}-SNAPSHOT.jar ./run spark.deploy.yarn.Client \
--jar examples/target/scala-{{site.SCALA_VERSION}}/spark-examples_{{site.SCALA_VERSION}}-{{site.SPARK_VERSION}}-SNAPSHOT.jar \
SPARK_JAR=./core/target/spark-core-assembly-{{site.SPARK_VERSION}}.jar ./run spark.deploy.yarn.Client \
--jar examples/target/scala-{{site.SCALA_VERSION}}/spark-examples_{{site.SCALA_VERSION}}-{{site.SPARK_VERSION}}.jar \
--class spark.examples.SparkPi \
--args standalone \
--num-workers 3 \
......
@@ -17,7 +17,13 @@ This guide shows each of these features and walks through some samples. It assum
# Linking with Spark
To write a Spark application, you will need to add both Spark and its dependencies to your CLASSPATH. The easiest way to do this is to run `sbt/sbt assembly` to build both Spark and its dependencies into one JAR (`core/target/spark-core-assembly-0.6.0.jar`), then add this to your CLASSPATH. Alternatively, you can publish Spark to the Maven cache on your machine using `sbt/sbt publish-local`. It will be an artifact called `spark-core` under the organization `org.spark-project`.
To write a Spark application, you will need to add both Spark and its dependencies to your CLASSPATH. If you use sbt or Maven, Spark is available through Maven Central at:
groupId = org.spark-project
artifactId = spark-core_{{site.SCALA_VERSION}}
version = {{site.SPARK_VERSION}}
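With sbt, for example, this dependency can be declared roughly as follows (`%%` appends the Scala version to the artifact ID):

{% highlight scala %}
// In your sbt build definition; resolves spark-core_{{site.SCALA_VERSION}} from Maven Central.
libraryDependencies += "org.spark-project" %% "spark-core" % "{{site.SPARK_VERSION}}"
{% endhighlight %}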
For other build systems or environments, you can run `sbt/sbt assembly` to build both Spark and its dependencies into one JAR (`core/target/spark-core-assembly-0.6.0.jar`), then add this to your CLASSPATH.
In addition, you'll need to import some Spark classes and implicit conversions. Add the following lines at the top of your program:
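For the Scala API in this version, those lines are typically:

{% highlight scala %}
// SparkContext itself, plus the implicit conversions defined on its companion object
// (for example, the ones that add extra operations to RDDs of key-value pairs).
import spark.SparkContext
import SparkContext._
{% endhighlight %}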
......