Commit f1246cc7 authored by Matei Zaharia

Various enhancements to the programming guide and HTML/CSS

parent 051785c7
pygments: true
markdown: kramdown
\ No newline at end of file
......@@ -21,6 +21,7 @@
<link rel="stylesheet" href="{{HOME_PATH}}css/main.css">
<script src="{{HOME_PATH}}js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>
<link rel="stylesheet" href="{{HOME_PATH}}css/pygments-default.css">
</head>
<body>
......@@ -30,7 +31,7 @@
<!-- This code is taken from http://twitter.github.com/bootstrap/examples/hero.html -->
<div class="navbar navbar-fixed-top">
<div class="navbar navbar-fixed-top" id="topbar">
<div class="navbar-inner">
<div class="container">
<a class="brand" href="{{HOME_PATH}}index.html"></a>
......@@ -109,12 +110,28 @@
</div> <!-- /container -->
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.8.0/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="js/vendor/jquery-1.8.0.min.js"><\/script>')</script>
<script src="js/vendor/jquery-1.8.0.min.js"></script>
<script src="js/vendor/bootstrap.min.js"></script>
<script src="js/main.js"></script>
<!-- A script to fix internal hash links because we have an overlapping top bar.
Based on https://github.com/twitter/bootstrap/issues/193#issuecomment-2281510 -->
<script>
$(function() {
function maybeScrollToHash() {
if (window.location.hash && $(window.location.hash).length) {
var newTop = $(window.location.hash).offset().top - $('#topbar').height() - 5;
$(window).scrollTop(newTop);
}
}
$(window).bind('hashchange', function() {
maybeScrollToHash();
});
// Scroll now too in case we had opened the page on a hash, but wait 1 ms because some browsers
// will try to do *their* initial scroll after running the onReady handler.
setTimeout(function() { maybeScrollToHash(); }, 1)
})
</script>
</body>
</html>
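Note on the hash-link fix above: the scroll target is the anchor's page offset minus the height of `#topbar`, minus a 5px gap. A minimal sketch of that arithmetic as a pure function, with no DOM or jQuery involved, might look like the following. The name `scrollTopForAnchor` and its parameters are hypothetical and not part of the commit, and the 50px bar height in the usage line is only an assumed value.

```javascript
// Hypothetical helper (not in the commit): the same subtraction that
// maybeScrollToHash() performs, separated from the DOM so it can be checked.
function scrollTopForAnchor(anchorTop, barHeight, gap) {
  // Default to the 5px gap used by the script in this commit.
  var padding = (gap === undefined) ? 5 : gap;
  return anchorTop - barHeight - padding;
}

// Assumed numbers: an anchor 400px down the page under a 50px-tall bar.
console.log(scrollTopForAnchor(400, 50)); // 345
```

In the page itself, `anchorTop` would come from `$(window.location.hash).offset().top` and `barHeight` from `$('#topbar').height()`, as in the script above.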
......@@ -11,7 +11,7 @@
}
.navbar-inner {
margin-top: 2px;
padding-top: 2px;
height: 50px;
}
......
docs/img/spark-logo-77x50px-hd.png

3.45 KiB

......@@ -7,7 +7,11 @@ title: Spark Overview
TODO(andyk): Rewrite to make the Java API a first class part of the story.
{% endcomment %}
Spark is a MapReduce-like cluster computing framework designed for low-latency iterative jobs and interactive use from an interpreter. It provides clean, language-integrated APIs in Scala and Java, with a rich array of parallel operators. Spark can run on top of the [Apache Mesos](http://incubator.apache.org/mesos/) cluster manager, Hadoop YARN, Amazon EC2, or without an independent resource manager ("standalone mode").
Spark is a MapReduce-like cluster computing framework designed for low-latency iterative jobs and interactive use from an
interpreter. It provides clean, language-integrated APIs in Scala and Java, with a rich array of parallel operators. Spark can
run on top of the [Apache Mesos](http://incubator.apache.org/mesos/) cluster manager,
[Hadoop YARN](http://hadoop.apache.org/docs/r2.0.1-alpha/hadoop-yarn/hadoop-yarn-site/YARN.html),
Amazon EC2, or without an independent resource manager ("standalone mode").
# Downloading
......@@ -33,7 +37,7 @@ For example, `./run spark.examples.SparkPi` will run a sample program that estim
examples prints usage help if no params are given.
Note that all of the sample programs take a `<master>` parameter specifying the cluster URL
to connect to. This can be a [URL for a distributed cluster]({{HOME_PATH}}scala-programming-guide.html#master_urls),
to connect to. This can be a [URL for a distributed cluster]({{HOME_PATH}}scala-programming-guide.html#master-urls),
or `local` to run locally with one thread, or `local[N]` to run locally with N threads. You should start by using
`local` for testing.
......
......@@ -6,7 +6,7 @@ title: Java Programming Guide
The Spark Java API
([spark.api.java]({{HOME_PATH}}api/core/index.html#spark.api.java.package)) defines
[`JavaSparkContext`]({{HOME_PATH}}api/core/index.html#spark.api.java.JavaSparkContext) and
[`JavaRDD`]({{HOME_PATH}}api/core/index.html#spark.api.java.JavaRDD) clases,
[`JavaRDD`]({{HOME_PATH}}api/core/index.html#spark.api.java.JavaRDD) classes,
which support
the same methods as their Scala counterparts but take Java functions and return
Java data and collection types.
......@@ -117,7 +117,7 @@ JavaRDD<String> words = lines.flatMap(new Split());
Continuing with the word count example, we map each word to a `(word, 1)` pair:
{% highlight java %}
import scala.Tuple2;
import scala.Tuple2;
JavaPairRDD<String, Integer> ones = words.map(
new PairFunction<String, String, Integer>() {
public Tuple2<String, Integer> call(String s) {
......
......@@ -3,16 +3,16 @@ layout: global
title: Launching Spark on YARN
---
Spark allows you to launch jobs on an existing [YARN](http://hadoop.apache.org/common/docs/r0.23.1/hadoop-yarn/hadoop-yarn-site/YARN.html) cluster.
Spark allows you to launch jobs on an existing [YARN](http://hadoop.apache.org/docs/r2.0.1-alpha/hadoop-yarn/hadoop-yarn-site/YARN.html) cluster.
## Preparations
# Preparations
- In order to distribute Spark within the cluster it must be packaged into a single JAR file. This can be done by running `sbt/sbt assembly`
- Your application code must be packaged into a separate jar file.
If you want to test out the YARN deployment mode, you can use the current spark examples. A `spark-examples_2.9.1-0.6.0-SNAPSHOT.jar` file can be generated by running `sbt/sbt package`.
## Launching Spark on YARN
# Launching Spark on YARN
The command to launch the YARN Client is as follows:
......@@ -36,7 +36,7 @@ For example:
The above starts a YARN Client programs which periodically polls the Application Master for status updates and displays them in the console. The client will exit once your application has finished running.
## Important Notes
# Important Notes
- When your application instantiates a Spark context it must use a special "standalone" master url. This starts the scheduler without forcing it to connect to a cluster. A good way to handle this is to pass "standalone" as an argument to your program, as shown in the example above.
- YARN does not support requesting container resources based on the number of cores. Thus the numbers of cores given via command line arguments cannot be guaranteed.
This diff is collapsed.