---
layout: global
title: Launching Spark on YARN
---
Experimental support for running over a YARN (Hadoop NextGen) cluster was added to Spark in version 0.6.0 and merged into master as part of the 0.7 effort. To build Spark core with YARN support, use the `hadoop2-yarn` profile, e.g. `mvn -Phadoop2-yarn clean install`.
# Building a Spark core consolidated JAR
Running Spark jobs on a YARN cluster requires a consolidated Spark core JAR that bundles all the required dependencies. This JAR can be built with either sbt or Maven.
- Building the Spark assembled JAR via sbt. This is a manual process of enabling it in `project/SparkBuild.scala`: comment out the `HADOOP_VERSION`, `HADOOP_MAJOR_VERSION`, and `HADOOP_YARN` variable declarations that appear before the line 'For Hadoop 2 YARN support', then uncomment the subsequent three declarations of the same variables, which enable Hadoop YARN support (see the sketch at the end of this section).
Assemble the JAR, e.g.:

    ./sbt/sbt clean assembly
The assembled JAR will typically be something like `./streaming/target/spark-streaming-<version>.jar`.
- Building the Spark assembled JAR via Maven. Use the `hadoop2-yarn` profile and execute the `package` target.
For example:

    $ mvn -Phadoop2-yarn clean package -DskipTests=true
This builds the shaded (consolidated) JAR, typically something like `./repl-bin/target/spark-repl-bin-<version>-shaded-hadoop2-yarn.jar`.
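
For reference, here is a rough sketch of the `project/SparkBuild.scala` edit described in the sbt instructions above. The variable names come from those instructions; the version strings are illustrative assumptions and may differ in your checkout.

    // Sketch of project/SparkBuild.scala after enabling YARN support.
    // The version strings below are assumptions; check your own tree.

    // Default Hadoop 1 settings: comment these three declarations out.
    //val HADOOP_VERSION = "1.0.4"
    //val HADOOP_MAJOR_VERSION = "1"
    //val HADOOP_YARN = false

    // For Hadoop 2 YARN support
    // Uncomment the three declarations that follow this line.
    val HADOOP_VERSION = "2.0.2-alpha"
    val HADOOP_MAJOR_VERSION = "2"
    val HADOOP_YARN = true

With these declarations active, `./sbt/sbt clean assembly` builds against the YARN-enabled Hadoop 2 artifacts.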