diff --git a/README.md b/README.md
index c3695ac800af465d03450146955ccac140b08af6..68ffda2ecbeae7b41b20aee4f91c5fccd9650985 100644
--- a/README.md
+++ b/README.md
@@ -20,7 +20,7 @@ which is packaged with it. To build Spark and its example programs, run:
 To run Spark, you will need to have Scala's bin in your `PATH`, or you
 will need to set the `SCALA_HOME` environment variable to point to where
 you've installed Scala. Scala must be accessible through one of these
-methods on Mesos slave nodes as well as on the master.
+methods on your cluster's worker nodes as well as its master.
 
 To run one of the examples, use `./run <class> <params>`. For example:
 
@@ -31,11 +31,11 @@ will run the Logistic Regression example locally on 2 CPUs.
 Each of the example programs prints usage help if no params are given.
 
 All of the Spark samples take a `<host>` parameter that is the Mesos master
-to connect to. This can be a Mesos URL, or "local" to run locally with one
-thread, or "local[N]" to run locally with N threads.
+to connect to. This can be a Mesos or Spark cluster URL, or "local" to run
+locally with one thread, or "local[N]" to run locally with N threads.
 
 
-## A Note About Hadoop
+## A Note About Hadoop Versions
 
 Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
 storage systems. Because the HDFS API has changed in different versions of
@@ -46,27 +46,23 @@ of `project/SparkBuild.scala`, then rebuilding Spark.
 
 ## Configuration
 
-Spark can be configured through two files: `conf/java-opts` and
-`conf/spark-env.sh`.
+Please refer to the "Configuration" guide in the online documentation for a
+full overview on how to configure Spark. At the minimum, you will need to
+create a `conf/spark-env.sh` script (copy `conf/spark-env.sh.template`) and
+set the following two variables:
 
-In `java-opts`, you can add flags to be passed to the JVM when running Spark.
+- `SCALA_HOME`: Location where Scala is installed.
 
-In `spark-env.sh`, you can set any environment variables you wish to be available
-when running Spark programs, such as `PATH`, `SCALA_HOME`, etc. There are also
-several Spark-specific variables you can set:
+- `MESOS_NATIVE_LIBRARY`: Your Mesos library (only needed if you want to run
+  on Mesos). For example, this might be `/usr/local/lib/libmesos.so` on Linux.
 
-- `SPARK_CLASSPATH`: Extra entries to be added to the classpath, separated by ":".
 
-- `SPARK_MEM`: Memory for Spark to use, in the format used by java's `-Xmx`
-  option (for example, `-Xmx200m` means 200 MB, `-Xmx1g` means 1 GB, etc).
+## Contributing to Spark
 
-- `SPARK_LIBRARY_PATH`: Extra entries to add to `java.library.path` for locating
-  shared libraries.
-
-- `SPARK_JAVA_OPTS`: Extra options to pass to JVM.
-
-- `MESOS_NATIVE_LIBRARY`: Your Mesos library, if you want to run on a Mesos
-  cluster. For example, this might be `/usr/local/lib/libmesos.so` on Linux.
-
-Note that `spark-env.sh` must be a shell script (it must be executable and start
-with a `#!` header to specify the shell to use).
+Contributions via GitHub pull requests are gladly accepted from their original
+author. Along with any pull requests, please state that the contribution is
+your original work and that you license the work to the project under the
+project's open source license. Whether or not you state this explicitly, by
+submitting any copyrighted material via pull request, email, or other means
+you agree to license the material under the project's open source license and
+warrant that you have the legal authority to do so.
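
As a usage sketch of the `<host>` parameter described in the second hunk: the README's own example runs the Logistic Regression class (assumed here to be `spark.examples.SparkLR`), and the cluster master host and port below are hypothetical placeholders, not values from the patch:

    # Run locally with a single thread
    ./run spark.examples.SparkLR local

    # Run locally with 2 threads
    ./run spark.examples.SparkLR local[2]

    # Run against a Spark cluster (hypothetical master host and port)
    ./run spark.examples.SparkLR spark://master.example.com:7077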
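
For the new "Configuration" section, a minimal sketch of the `conf/spark-env.sh` script it asks you to create (copied from `conf/spark-env.sh.template`), setting only the two variables the section names. The Scala path is a placeholder for wherever Scala is installed on your machines; the `libmesos.so` path is the Linux example from the text:

    #!/usr/bin/env bash
    # Location where Scala is installed (placeholder path).
    export SCALA_HOME=/usr/local/scala
    # Only needed if you want to run on Mesos; this is the Linux
    # example path given in the README.
    export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so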