-
vundela authored
Currently log4j.properties file is not uploaded to executor's which is leading them to use the default values. This fix will make sure that file is always uploaded to distributed cache so that executor will use the latest settings. If user specifies log configurations through --files then executors will be picking configs from --files instead of $SPARK_CONF_DIR/log4j.properties Author: vundela <vsr@cloudera.com> Author: Srinivasa Reddy Vundela <vsr@cloudera.com> Closes #9118 from vundela/master.
vundela authoredCurrently log4j.properties file is not uploaded to executor's which is leading them to use the default values. This fix will make sure that file is always uploaded to distributed cache so that executor will use the latest settings. If user specifies log configurations through --files then executors will be picking configs from --files instead of $SPARK_CONF_DIR/log4j.properties Author: vundela <vsr@cloudera.com> Author: Srinivasa Reddy Vundela <vsr@cloudera.com> Closes #9118 from vundela/master.
layout: global
title: Running Spark on YARN
Support for running on YARN (Hadoop NextGen) was added to Spark in version 0.6.0, and improved in subsequent releases.
Launching Spark on YARN
Ensure that HADOOP_CONF_DIR
or YARN_CONF_DIR
points to the directory which contains the (client side) configuration files for the Hadoop cluster.
These configs are used to write to HDFS and connect to the YARN ResourceManager. The
configuration contained in this directory will be distributed to the YARN cluster so that all
containers used by the application use the same configuration. If the configuration references
Java system properties or environment variables not managed by YARN, they should also be set in the
Spark application's configuration (driver, executors, and the AM when running in client mode).
There are two deploy modes that can be used to launch Spark applications on YARN. In cluster
mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client
mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
Unlike Spark standalone and Mesos modes, in which the master's address is specified in the --master
parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the --master
parameter is yarn
.
To launch a Spark application in cluster
mode:
$ ./bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]
For example:
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
lib/spark-examples*.jar \
10
The above starts a YARN client program which starts the default Application Master. Then SparkPi will be run as a child thread of Application Master. The client will periodically poll the Application Master for status updates and display them in the console. The client will exit once your application has finished running. Refer to the "Debugging your Application" section below for how to see driver and executor logs.
To launch a Spark application in client
mode, do the same, but replace cluster
with client
. The following shows how you can run spark-shell
in client
mode:
$ ./bin/spark-shell --master yarn --deploy-mode client
Adding Other JARs
In cluster
mode, the driver runs on a different machine than the client, so SparkContext.addJar
won't work out of the box with files that are local to the client. To make files on the client available to SparkContext.addJar
, include them with the --jars
option in the launch command.
$ ./bin/spark-submit --class my.main.Class \
--master yarn \
--deploy-mode cluster \
--jars my-other-jar.jar,my-other-other-jar.jar
my-main-jar.jar
app_arg1 app_arg2
Preparations
Running Spark on YARN requires a binary distribution of Spark which is built with YARN support. Binary distributions can be downloaded from the downloads page of the project website. To build Spark yourself, refer to Building Spark.
Configuration
Most of the configs are the same for Spark on YARN as for other deployment modes. See the configuration page for more information on those. These are configs that are specific to Spark on YARN.