From 95d2823373de17e73fb4410d737886b3485085e7 Mon Sep 17 00:00:00 2001
From: Khanna <avkhann2@fa18-cs425-g05-01.cs.illinois.edu>
Date: Sun, 2 Dec 2018 23:52:17 -0600
Subject: [PATCH] update readme

---
 apache-spark/README.md | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/apache-spark/README.md b/apache-spark/README.md
index 733c120..0676314 100644
--- a/apache-spark/README.md
+++ b/apache-spark/README.md
@@ -1,28 +1,28 @@
-## Prerequisites:
+# Prerequisites:
 
-# Java 7+ installed on master & slave machines
-# Scala 2+ installed on master & slave machines
-# PySpark installed on master machine (`sudo pip install pyspark`)
-# apache-spark project & streaming files cloned on each slave machine
+### Java 7+ installed on master & slave machines
+### Scala 2+ installed on master & slave machines
+### PySpark installed on master machine (`sudo pip install pyspark`)
+### apache-spark project & streaming files cloned on each slave machine
 
-## Start up Spark cluster
+# Start up Spark cluster
 
 ```
 ./sbin/start-all.sh
 ```
 
-# This starts up VMs 2-10 as worker machines, cluster summary can be seen on http://MASTER-IP:8080/
+### This starts up VMs 2-10 as worker machines, cluster summary can be seen on http://MASTER-IP:8080/
 
-## Submit a word count job to the cluster
+# Submit a word count job to the cluster
 
 ```
 bin/spark-submit --master spark://MASTER-IP:7077 --deploy-mode client ~/apache-spark/python/wordcount.py ~/apache-spark/python/higgs-activity_time.txt
 ```
 
-# Job summary can be seen on http://MASTER-IP:4040/
+### Job summary can be seen on http://MASTER-IP:4040/
 
-# Notes:
+## Notes:
 
-# 1. spark-env.sh file within conf/ should be modified as per job & master/slave requirements
-# 2. /etc/hosts file on master & slave should have IP addresses resolved to the same hostnames within the slaves file in conf/
-# 3. Project includes setup scripts/examples/driver code from Apache Spark
+### 1. `spark-env.sh` file within `conf/` should be modified as per job & master/slave requirements
+### 2. `/etc/hosts` file on master & slave should have IP addresses resolved to the same hostnames within the slaves file in `conf/`
+### 3. Project includes setup scripts/examples/driver code from Apache Spark (https://github.com/apache/spark)
-- 
GitLab
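
Note 2 in the README asks that every hostname listed in the slaves file under `conf/` also resolve through `/etc/hosts` on each machine. A minimal sketch of a check that could be run before `./sbin/start-all.sh` is given here. It assumes the slaves file lists one hostname per line (with optional `#` comments) and that `/etc/hosts` uses the usual `IP name [alias ...]` layout. The function names are hypothetical helpers for illustration and are not part of this project or of Spark.

```python
def parse_hosts(text):
    """Map every hostname or alias in /etc/hosts-style text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first entry for a name
    return mapping


def parse_slaves(text):
    """Return the worker hostnames listed in slaves-file text, in order."""
    workers = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            workers.append(line)
    return workers


def unresolved_workers(hosts_text, slaves_text):
    """List workers from the slaves file that have no entry in the hosts text."""
    known = parse_hosts(hosts_text)
    return [w for w in parse_slaves(slaves_text) if w not in known]


def check_files(hosts_path, slaves_path):
    """Read both files from disk and return the unresolved worker hostnames."""
    with open(hosts_path) as hosts_file, open(slaves_path) as slaves_file:
        return unresolved_workers(hosts_file.read(), slaves_file.read())
```

Calling `check_files("/etc/hosts", "conf/slaves")` from the Spark directory on each machine would return an empty list when every worker name has a hosts entry. If the slaves file lists raw IP addresses instead of hostnames, this check does not apply as written.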