From 88181bee9e385361d0a079fbffa78124981799e0 Mon Sep 17 00:00:00 2001
From: Matei Zaharia <matei@eecs.berkeley.edu>
Date: Wed, 12 Sep 2012 19:47:31 -0700
Subject: [PATCH] Small tweaks to generated doc pages

---
 docs/_layouts/global.html     |   2 +
 docs/contributing-to-spark.md |   3 +-
 docs/css/main.css             |  12 +++
 docs/ec2-scripts.md           | 146 ----------------------------------
 docs/index.md                 |   3 +-
 docs/running-on-mesos.md      |   1 -
 6 files changed, 16 insertions(+), 151 deletions(-)
 delete mode 100644 docs/ec2-scripts.md

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index 402adca72c..8bfd0e7284 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -64,6 +64,8 @@
     </div>

     <div class="container">
+      <h1 class="title">{{ page.title }}</h1>
+
       {{ content }}
       <!-- Main hero unit for a primary marketing message or call to action -->
       <!--<div class="hero-unit">
diff --git a/docs/contributing-to-spark.md b/docs/contributing-to-spark.md
index 3585bda2d3..a99ab41531 100644
--- a/docs/contributing-to-spark.md
+++ b/docs/contributing-to-spark.md
@@ -1,8 +1,7 @@
 ---
 layout: global
-title: How to Contribute to Spark
+title: Contributing to Spark
 ---
-# Contributing to Spark

 The Spark team welcomes contributions in the form of GitHub pull requests. Here are a few tips to get your contribution in:

diff --git a/docs/css/main.css b/docs/css/main.css
index 8432d0f911..cf56399376 100755
--- a/docs/css/main.css
+++ b/docs/css/main.css
@@ -15,10 +15,22 @@ body {
   line-height: 1.6; /* Inspired by Github's wiki style */
 }

+.title {
+  font-size: 32px;
+}
+
 h1 {
   font-size: 28px;
 }

+h2 {
+  font-size: 24px;
+}
+
+h3 {
+  font-size: 21px;
+}
+
 code {
   color: #333;
 }
diff --git a/docs/ec2-scripts.md b/docs/ec2-scripts.md
deleted file mode 100644
index 73578c8457..0000000000
--- a/docs/ec2-scripts.md
+++ /dev/null
@@ -1,146 +0,0 @@
----
-layout: global
-title: Using the Spark EC2 Scripts
----
-The `spark-ec2` script located in Spark's `ec2` directory allows you
-to launch, manage and shut down Spark clusters on Amazon EC2. It builds
-on the [Mesos EC2 script](https://github.com/mesos/mesos/wiki/EC2-Scripts)
-in Apache Mesos.
-
-`spark-ec2` is designed to manage multiple named clusters. You can
-launch a new cluster (telling the script its size and giving it a name),
-shut down an existing cluster, or log into a cluster. Each cluster is
-identified by placing its machines into EC2 security groups whose names
-are derived from the name of the cluster. For example, a cluster named
-`test` will contain a master node in a security group called
-`test-master`, and a number of slave nodes in a security group called
-`test-slaves`. The `spark-ec2` script will create these security groups
-for you based on the cluster name you request. You can also use them to
-identify machines belonging to each cluster in the EC2 Console or
-ElasticFox.
-
-This guide describes how to get set up to run clusters, how to launch
-clusters, how to run jobs on them, and how to shut them down.
-
-Before You Start
-================
-
-- Create an Amazon EC2 key pair for yourself. This can be done by
-  logging into your Amazon Web Services account through the [AWS
-  console](http://aws.amazon.com/console/), clicking Key Pairs on the
-  left sidebar, and creating and downloading a key. Make sure that you
-  set the permissions for the private key file to `600` (i.e. only you
-  can read and write it) so that `ssh` will work.
-- Whenever you want to use the `spark-ec2` script, set the environment
-  variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` to your
-  Amazon EC2 access key ID and secret access key. These can be
-  obtained from the [AWS homepage](http://aws.amazon.com/) by clicking
-  Account \> Security Credentials \> Access Credentials.
-
-Launching a Cluster
-===================
-
-- Go into the `ec2` directory in the release of Spark you downloaded.
-- Run
-  `./spark-ec2 -k <keypair> -i <key-file> -s <num-slaves> launch <cluster-name>`,
-  where `<keypair>` is the name of your EC2 key pair (that you gave it
-  when you created it), `<key-file>` is the private key file for your
-  key pair, `<num-slaves>` is the number of slave nodes to launch (try
-  1 at first), and `<cluster-name>` is the name to give to your
-  cluster.
-- After everything launches, check that Mesos is up and sees all the
-  slaves by going to the Mesos Web UI link printed at the end of the
-  script (`http://<master-hostname>:8080`).
-
-You can also run `./spark-ec2 --help` to see more usage options. The
-following options are worth pointing out:
-
-- `--instance-type=<INSTANCE_TYPE>` can be used to specify an EC2
-instance type to use. For now, the script only supports 64-bit instance
-types, and the default type is `m1.large` (which has 2 cores and 7.5 GB
-RAM). Refer to the Amazon pages about [EC2 instance
-types](http://aws.amazon.com/ec2/instance-types) and [EC2
-pricing](http://aws.amazon.com/ec2/#pricing) for information about other
-instance types.
-- `--zone=<EC2_ZONE>` can be used to specify an EC2 availability zone
-to launch instances in. Sometimes, you will get an error because there
-is not enough capacity in one zone, and you should try to launch in
-another. This happens mostly with the `m1.large` instance types;
-extra-large (both `m1.xlarge` and `c1.xlarge`) instances tend to be more
-available.
-- `--ebs-vol-size=GB` will attach an EBS volume with a given amount
-  of space to each node so that you can have a persistent HDFS cluster
-  on your nodes across cluster restarts (see below).
-- If one of your launches fails due to e.g. not having the right
-permissions on your private key file, you can run `launch` with the
-`--resume` option to restart the setup process on an existing cluster.
-
-Running Jobs
-============
-
-- Go into the `ec2` directory in the release of Spark you downloaded.
-- Run `./spark-ec2 -k <keypair> -i <key-file> login <cluster-name>` to
-  SSH into the cluster, where `<keypair>` and `<key-file>` are as
-  above. (This is just for convenience; you could also use
-  the EC2 console.)
-- To deploy code or data within your cluster, you can log in and use the
-  provided script `~/mesos-ec2/copy-dir`, which,
-  given a directory path, RSYNCs it to the same location on all the slaves.
-- If your job needs to access large datasets, the fastest way to do
-  that is to load them from Amazon S3 or an Amazon EBS device into an
-  instance of the Hadoop Distributed File System (HDFS) on your nodes.
-  The `spark-ec2` script already sets up an HDFS instance for you. It's
-  installed in `/root/ephemeral-hdfs`, and can be accessed using the
-  `bin/hadoop` script in that directory. Note that the data in this
-  HDFS goes away when you stop and restart a machine.
-- There is also a *persistent HDFS* instance in
-  `/root/persistent-hdfs` that will keep data across cluster restarts.
-  Typically each node has relatively little space of persistent data
-  (about 3 GB), but you can use the `--ebs-vol-size` option to
-  `spark-ec2` to attach a persistent EBS volume to each node for
-  storing the persistent HDFS.
-- Finally, if you get errors while running your jobs, look at the slave's logs
-  for that job using the Mesos web UI (`http://<master-hostname>:8080`).
-
-Terminating a Cluster
-=====================
-
-***Note that there is no way to recover data on EC2 nodes after shutting
-them down! Make sure you have copied everything important off the nodes
-before stopping them.***
-
-- Go into the `ec2` directory in the release of Spark you downloaded.
-- Run `./spark-ec2 destroy <cluster-name>`.
-
-Pausing and Restarting Clusters
-===============================
-
-The `spark-ec2` script also supports pausing a cluster. In this case,
-the VMs are stopped but not terminated, so they
-***lose all data on ephemeral disks*** but keep the data in their
-root partitions and their `persistent-hdfs`. Stopped machines will not
-cost you any EC2 cycles, but ***will*** continue to cost money for EBS
-storage.
-
-- To stop one of your clusters, go into the `ec2` directory and run
-`./spark-ec2 stop <cluster-name>`.
-- To restart it later, run
-`./spark-ec2 -i <key-file> start <cluster-name>`.
-- To ultimately destroy the cluster and stop consuming EBS space, run
-`./spark-ec2 destroy <cluster-name>` as described in the previous
-section.
-
-Limitations
-===========
-
-- `spark-ec2` currently only launches machines in the US-East region of EC2.
-  It should not be hard to make it launch VMs in other zones, but you will need
-  to create your own AMIs in them.
-- Support for "cluster compute" nodes is limited -- there's no way to specify a
-  locality group. However, you can launch slave nodes in your
-  `<clusterName>-slaves` group manually and then use `spark-ec2 launch
-  --resume` to start a cluster with them.
-- Support for spot instances is limited.
-
-If you have a patch or suggestion for one of these limitations, feel free to
-[contribute]({{HOME_PATH}}contributing-to-spark.html) it!
diff --git a/docs/index.md b/docs/index.md
index 48ab151e41..a3ad2d11ce 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,8 +1,7 @@
 ---
 layout: global
-title: Spark - Fast Cluster Computing
+title: Spark Overview
 ---
-# Spark Overview

 Spark is a MapReduce-like cluster computing framework designed to support low-latency iterative jobs and interactive use from an interpreter. It is written in [Scala](http://www.scala-lang.org), a high-level language for the JVM, and exposes a clean language-integrated syntax that makes it easy to write parallel jobs. Spark runs on top of the [Apache Mesos](http://incubator.apache.org/mesos/) cluster manager.

diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md
index 9807228121..947de13855 100644
--- a/docs/running-on-mesos.md
+++ b/docs/running-on-mesos.md
@@ -2,7 +2,6 @@
 layout: global
 title: Running Spark on Mesos
 ---
-# Running Spark on Mesos

 To run on a cluster, Spark uses the [Apache Mesos](http://incubator.apache.org/mesos/) resource manager. Follow the steps below to install Mesos and Spark:

--
GitLab
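
For reference, the cluster lifecycle that the deleted docs/ec2-scripts.md page walks through reduces to a handful of `spark-ec2` invocations. The following is a minimal sketch, not part of the patch itself, assuming the flags behave as documented in that page; the key pair name, key file path, cluster name, and sizes are placeholders.

#!/bin/sh
# Hypothetical walkthrough of the spark-ec2 workflow described above.
# All values below are placeholders -- substitute your own key pair,
# key file, cluster name, and sizes.
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"

cd ec2   # the ec2/ directory inside the Spark release

# Launch a cluster named "test" with one slave and a 20 GB EBS volume
# per node for the persistent HDFS.
./spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 1 --ebs-vol-size=20 launch test

# SSH into the master once the launch finishes.
./spark-ec2 -k my-keypair -i ~/my-keypair.pem login test

# Pause the cluster (VMs stop; ephemeral-hdfs data is lost, EBS persists),
# then bring it back later.
./spark-ec2 stop test
./spark-ec2 -i ~/my-keypair.pem start test

# Terminate the cluster for good once everything important is copied off.
./spark-ec2 destroy test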