Commit f1246cc7 authored by Matei Zaharia

Various enhancements to the programming guide and HTML/CSS

parent 051785c7
 pygments: true
+markdown: kramdown
\ No newline at end of file
@@ -21,6 +21,7 @@
     <link rel="stylesheet" href="{{HOME_PATH}}css/main.css">
     <script src="{{HOME_PATH}}js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>
+    <link rel="stylesheet" href="{{HOME_PATH}}css/pygments-default.css">
 </head>
 <body>
@@ -30,7 +31,7 @@
 <!-- This code is taken from http://twitter.github.com/bootstrap/examples/hero.html -->
-<div class="navbar navbar-fixed-top">
+<div class="navbar navbar-fixed-top" id="topbar">
   <div class="navbar-inner">
     <div class="container">
       <a class="brand" href="{{HOME_PATH}}index.html"></a>
@@ -109,12 +110,28 @@
 </div> <!-- /container -->
-<script src="//ajax.googleapis.com/ajax/libs/jquery/1.8.0/jquery.min.js"></script>
-<script>window.jQuery || document.write('<script src="js/vendor/jquery-1.8.0.min.js"><\/script>')</script>
+<script src="js/vendor/jquery-1.8.0.min.js"></script>
 <script src="js/vendor/bootstrap.min.js"></script>
 <script src="js/main.js"></script>
+<!-- A script to fix internal hash links because we have an overlapping top bar.
+     Based on https://github.com/twitter/bootstrap/issues/193#issuecomment-2281510 -->
+<script>
+  $(function() {
+    function maybeScrollToHash() {
+      if (window.location.hash && $(window.location.hash).length) {
+        var newTop = $(window.location.hash).offset().top - $('#topbar').height() - 5;
+        $(window).scrollTop(newTop);
+      }
+    }
+    $(window).bind('hashchange', function() {
+      maybeScrollToHash();
+    });
+    // Scroll now too in case we had opened the page on a hash, but wait 1 ms because some browsers
+    // will try to do *their* initial scroll after running the onReady handler.
+    setTimeout(function() { maybeScrollToHash(); }, 1);
+  });
+</script>
 </body>
 </html>
@@ -11,7 +11,7 @@
 }

 .navbar-inner {
-  margin-top: 2px;
+  padding-top: 2px;
   height: 50px;
 }
...
docs/img/spark-logo-77x50px-hd.png (image, 3.45 KiB)
@@ -7,7 +7,11 @@ title: Spark Overview
 TODO(andyk): Rewrite to make the Java API a first class part of the story.
 {% endcomment %}

-Spark is a MapReduce-like cluster computing framework designed for low-latency iterative jobs and interactive use from an interpreter. It provides clean, language-integrated APIs in Scala and Java, with a rich array of parallel operators. Spark can run on top of the [Apache Mesos](http://incubator.apache.org/mesos/) cluster manager, Hadoop YARN, Amazon EC2, or without an independent resource manager ("standalone mode").
+Spark is a MapReduce-like cluster computing framework designed for low-latency iterative jobs and interactive use from an
+interpreter. It provides clean, language-integrated APIs in Scala and Java, with a rich array of parallel operators. Spark can
+run on top of the [Apache Mesos](http://incubator.apache.org/mesos/) cluster manager,
+[Hadoop YARN](http://hadoop.apache.org/docs/r2.0.1-alpha/hadoop-yarn/hadoop-yarn-site/YARN.html),
+Amazon EC2, or without an independent resource manager ("standalone mode").
 # Downloading
@@ -33,7 +37,7 @@ For example, `./run spark.examples.SparkPi` will run a sample program that estim
 examples prints usage help if no params are given.

 Note that all of the sample programs take a `<master>` parameter specifying the cluster URL
-to connect to. This can be a [URL for a distributed cluster]({{HOME_PATH}}scala-programming-guide.html#master_urls),
+to connect to. This can be a [URL for a distributed cluster]({{HOME_PATH}}scala-programming-guide.html#master-urls),
 or `local` to run locally with one thread, or `local[N]` to run locally with N threads. You should start by using
 `local` for testing.
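As a quick illustration of these master values (an editorial example, not part of the diff): combining them with the sample-program note above, `./run spark.examples.SparkPi local[2]` would run the Pi example locally with two threads.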
......
@@ -6,7 +6,7 @@ title: Java Programming Guide
 The Spark Java API
 ([spark.api.java]({{HOME_PATH}}api/core/index.html#spark.api.java.package)) defines
 [`JavaSparkContext`]({{HOME_PATH}}api/core/index.html#spark.api.java.JavaSparkContext) and
-[`JavaRDD`]({{HOME_PATH}}api/core/index.html#spark.api.java.JavaRDD) clases,
+[`JavaRDD`]({{HOME_PATH}}api/core/index.html#spark.api.java.JavaRDD) classes,
 which support
 the same methods as their Scala counterparts but take Java functions and return
 Java data and collection types.
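As a brief sketch of these classes (an editorial example, not part of the commit: the input path and thread count are made up, and it assumes the 0.6-era `JavaSparkContext(master, jobName)` constructor and `textFile` method):

{% highlight java %}
import spark.api.java.JavaRDD;
import spark.api.java.JavaSparkContext;

// Start a context running locally with two threads, then load a file as an RDD of lines.
JavaSparkContext sc = new JavaSparkContext("local[2]", "Example");
JavaRDD<String> lines = sc.textFile("data.txt");  // hypothetical input file
{% endhighlight %}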
@@ -117,7 +117,7 @@ JavaRDD<String> words = lines.flatMap(new Split());
 Continuing with the word count example, we map each word to a `(word, 1)` pair:

 {% highlight java %}
 import scala.Tuple2;
 JavaPairRDD<String, Integer> ones = words.map(
   new PairFunction<String, String, Integer>() {
     public Tuple2<String, Integer> call(String s) {
...
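For readers following the truncated snippet above: the example returns a `Tuple2` from `call` and then sums the counts per word. The sketch below is an editorial completion, not part of this diff (it assumes the `PairFunction`, `Function2`, and `reduceByKey` signatures of the 0.6-era Java API, with `words` being the `JavaRDD<String>` from the hunk header):

{% highlight java %}
import scala.Tuple2;
import spark.api.java.JavaPairRDD;
import spark.api.java.function.Function2;
import spark.api.java.function.PairFunction;

// Map each word to a (word, 1) pair, completing the snippet shown in the diff.
JavaPairRDD<String, Integer> ones = words.map(
  new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String s) {
      return new Tuple2<String, Integer>(s, 1);
    }
  });

// Then sum the counts for each word.
JavaPairRDD<String, Integer> counts = ones.reduceByKey(
  new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer i1, Integer i2) {
      return i1 + i2;
    }
  });
{% endhighlight %}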
@@ -3,16 +3,16 @@ layout: global
 title: Launching Spark on YARN
 ---

-Spark allows you to launch jobs on an existing [YARN](http://hadoop.apache.org/common/docs/r0.23.1/hadoop-yarn/hadoop-yarn-site/YARN.html) cluster.
+Spark allows you to launch jobs on an existing [YARN](http://hadoop.apache.org/docs/r2.0.1-alpha/hadoop-yarn/hadoop-yarn-site/YARN.html) cluster.

-## Preparations
+# Preparations

 - In order to distribute Spark within the cluster it must be packaged into a single JAR file. This can be done by running `sbt/sbt assembly`
 - Your application code must be packaged into a separate jar file.

 If you want to test out the YARN deployment mode, you can use the current spark examples. A `spark-examples_2.9.1-0.6.0-SNAPSHOT.jar` file can be generated by running `sbt/sbt package`.

-## Launching Spark on YARN
+# Launching Spark on YARN

 The command to launch the YARN Client is as follows:
@@ -36,7 +36,7 @@ For example:
 The above starts a YARN client program which periodically polls the Application Master for status updates and displays them in the console. The client will exit once your application has finished running.

-## Important Notes
+# Important Notes

 - When your application instantiates a Spark context it must use a special "standalone" master URL. This starts the scheduler without forcing it to connect to a cluster. A good way to handle this is to pass "standalone" as an argument to your program, as shown in the example above (and sketched after this list).
 - YARN does not support requesting container resources based on the number of cores, so the number of cores given via command line arguments cannot be guaranteed.
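To illustrate the first note above, here is a minimal sketch of the "standalone"-master pattern. It is an editorial example rather than code from this commit: the class name and the convention of taking the master as the first argument are hypothetical, using the `JavaSparkContext` constructor documented elsewhere in these docs.

{% highlight java %}
import spark.api.java.JavaSparkContext;

public class MyYarnApp {  // hypothetical application class
  public static void main(String[] args) {
    // On YARN, the caller passes "standalone" as args[0], so the scheduler
    // starts up without trying to connect to a separate cluster manager.
    JavaSparkContext sc = new JavaSparkContext(args[0], "MyYarnApp");
    // ... define and run RDD operations on sc ...
  }
}
{% endhighlight %}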