diff --git a/apache-spark/README.md b/apache-spark/README.md
index 733c1201ce5ffcdb898b523e3c0629d95cd77a4d..0676314d0299552298edaa00350b174dfc9923de 100644
--- a/apache-spark/README.md
+++ b/apache-spark/README.md
@@ -1,28 +1,28 @@
-## Prerequisites:
+# Prerequisites:
 
-# Java 7+ installed on master & slave machines
-# Scala 2+ installed on master & slave machines
-# PySpark installed on master machine (`sudo pip install pyspark`)
-# apache-spark project & streaming files cloned on each slave machine
+### Java 7+ installed on master & slave machines
+### Scala 2+ installed on master & slave machines
+### PySpark installed on master machine (`sudo pip install pyspark`)
+### apache-spark project & streaming files cloned on each slave machine
 
-## Start up Spark cluster
+# Start up Spark cluster
 
 ```
 ./sbin/start-all.sh
 ```
 
-# This starts up VMs 2-10 as worker machines, cluster summary can be seen on http://MASTER-IP:8080/
+### This starts up VMs 2-10 as worker machines, cluster summary can be seen on http://MASTER-IP:8080/
 
-## Submit a word count job to the cluster
+# Submit a word count job to the cluster
 
 ```
 bin/spark-submit --master spark://MASTER-IP:7077 --deploy-mode client ~/apache-spark/python/wordcount.py ~/apache-spark/python/higgs-activity_time.txt
 ```
 
-# Job summary can be seen on http://MASTER-IP:4040/
+### Job summary can be seen on http://MASTER-IP:4040/
 
-# Notes:
+## Notes:
 
-# 1. spark-env.sh file within conf/ should be modified as per job & master/slave requirements
-# 2. /etc/hosts file on master & slave should have IP addresses resolved to the same hostnames within the slaves file in conf/
-# 3. Project includes setup scripts/examples/driver code from Apache Spark
+### 1. `spark-env.sh` file within `conf/` should be modified as per job & master/slave requirements
+### 2. `/etc/hosts` file on master & slave should have IP addresses resolved to the same hostnames within the slaves file in `conf/`
+### 3. Project includes setup scripts/examples/driver code from Apache Spark (https://github.com/apache/spark)