- `--spark-version=VERSION` will pre-load the cluster with the
specified version of Spark. VERSION can be a version number
(e.g. "0.7.2") or a specific git hash. By default, a recent
version will be used.
- If one of your launches fails due to e.g. not having the right
permissions on your private key file, you can run `launch` with the
`--resume` option to restart the setup process on an existing cluster
(see the example after this list).
- You can use the `--ebs-vol-size` option to `spark-ec2` to attach a persistent
EBS volume to each node for storing the persistent HDFS.
- Finally, if you get errors while running your jobs, look at the slave's logs
for that job inside of the scheduler work directory (/root/spark/work). You can
also view the status of the cluster using the web UI: `http://<master-hostname>:8080`.
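
For concreteness, here is a minimal sketch of how these launch options fit
together on the command line. The key pair name, identity file, slave count,
and cluster name are placeholders; the flags are the ones described above:

```
# Launch a small cluster pre-loaded with a specific Spark version.
# "my-keypair", "~/my-keypair.pem" and "my-spark-cluster" are placeholders.
./spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 2 \
  --spark-version=0.7.2 launch my-spark-cluster

# If setup fails partway (e.g. wrong permissions on the key file), fix the
# problem and resume setup on the same machines instead of launching again.
chmod 600 ~/my-keypair.pem
./spark-ec2 -k my-keypair -i ~/my-keypair.pem launch --resume my-spark-cluster

# While jobs run, per-job logs live under /root/spark/work on each slave, and
# the cluster status page is at http://<master-hostname>:8080.
```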
# Configuration
...
# Limitations
- `spark-ec2` currently only launches machines in the US-East region of EC2.
It should not be hard to make it launch VMs in other zones, but you will need
to create your own AMIs in them.
- Support for "cluster compute" nodes is limited -- there's no way to specify a
- Support for "cluster compute" nodes is limited -- there's no way to specify a
locality group. However, you can launch slave nodes in your
locality group. However, you can launch slave nodes in your
`<clusterName>-slaves` group manually and then use `spark-ec2 launch
`<clusterName>-slaves` group manually and then use `spark-ec2 launch
--resume` to start a cluster with them.
--resume` to start a cluster with them.
- Support for spot instances is limited.
If you have a patch or suggestion for one of these limitations, feel free to
[contribute](contributing-to-spark.html) it!
# Using a Newer Spark Version
The Spark EC2 machine images may not come with the latest version of Spark. To use a newer version, run `git pull` in `/root/spark` to pull in the latest version of Spark from `git`, and build it using `sbt/sbt compile`. You will also need to copy it to all the other nodes in the cluster using `~/spark-ec2/copy-dir /root/spark`.
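
As a rough sketch of that update sequence on the master node (assuming the standard `/root/spark` checkout described above):

```
# On the master: pull and rebuild Spark in place.
cd /root/spark
git pull              # fetch the latest Spark source
sbt/sbt compile       # rebuild Spark

# Copy the updated build to every other node in the cluster.
~/spark-ec2/copy-dir /root/spark
```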
# Accessing Data in S3
Spark's file interface allows it to process data in Amazon S3 using the same URI formats that are supported for Hadoop. You can specify a path in S3 as input through a URI of the form `s3n://<bucket>/path`. You will also need to set your Amazon security credentials, either by setting the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` before your program runs, or through `SparkContext.hadoopConfiguration`. Full instructions on S3 access using the Hadoop input libraries can be found on the [Hadoop S3 page](http://wiki.apache.org/hadoop/AmazonS3).
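
For example, one minimal way to supply the credentials is to export them in the shell before starting your program; the key values, bucket, and path below are placeholders:

```
# Placeholder credentials: substitute your own AWS keys.
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>

# Any Spark program started from this shell can then read S3 input through a
# URI such as:
#   sc.textFile("s3n://<bucket>/path/to/data")
```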