---
layout: global
displayTitle: Spark Configuration
title: Configuration
---

* This will become a table of contents (this text will be scraped).
{:toc}
Spark provides three locations to configure the system:
* [Spark properties](#spark-properties) control most application parameters and can be set by using
  a [SparkConf](api/scala/index.html#org.apache.spark.SparkConf) object, or through Java
  system properties (see the sketch after this list).
* [Environment variables](#environment-variables) can be used to set per-machine settings, such as
  the IP address, through the `conf/spark-env.sh` script on each node.
* [Logging](#configuring-logging) can be configured through `log4j.properties`.
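As a minimal sketch of the Java system property route (the property names are real Spark
settings; the snippet itself is illustrative and not from the original docs):

{% highlight scala %}
import org.apache.spark.{SparkConf, SparkContext}

// Ordinarily these would be passed to the JVM on the command line, e.g.
//   java -Dspark.master=local[2] -Dspark.app.name=CountingSheep ...
// They are set programmatically here only for illustration.
sys.props("spark.master") = "local[2]"
sys.props("spark.app.name") = "CountingSheep"

// By default, SparkConf picks up any JVM system property prefixed with "spark."
val conf = new SparkConf()
val sc = new SparkContext(conf)
println(sc.master) // local[2]
sc.stop()
{% endhighlight %}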
# Spark Properties
Spark properties control most application settings and are configured separately for each
application. These properties can be set directly on a
[SparkConf](api/scala/index.html#org.apache.spark.SparkConf) passed to your
`SparkContext`. `SparkConf` allows you to configure some of the common properties
(e.g. master URL and application name), as well as arbitrary key-value pairs through the
`set()` method. For example, we could initialize an application with two threads as follows:
Note that we run with `local[2]`, meaning two threads, which represents "minimal" parallelism
and can help detect bugs that only exist when we run in a distributed context.
{% highlight scala %}
val conf = new SparkConf()
             .setMaster("local[2]")
             .setAppName("CountingSheep")
val sc = new SparkContext(conf)
{% endhighlight %}
Note that we can have more than 1 thread in local mode, and in cases like Spark Streaming, we may
actually require more than one thread to prevent any sort of starvation issues.
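The `set()` method mentioned above takes arbitrary key-value pairs. As a minimal sketch (not
from the original docs; `spark.executor.memory` is a real Spark property, used here purely for
illustration):

{% highlight scala %}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("CountingSheep")
  .set("spark.executor.memory", "1g") // arbitrary key-value pair

// get() with a default value reads a property back safely
println(conf.get("spark.executor.memory", "512m")) // prints "1g"
{% endhighlight %}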
Properties that specify a time duration should be configured with a unit of time.
The following format is accepted:

    25ms (milliseconds)
    5s (seconds)
    10m or 10min (minutes)
    3h (hours)
    5d (days)
    1y (years)
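For example (a minimal sketch; `spark.network.timeout` is a real duration-valued property,
shown here only to illustrate the format):

{% highlight scala %}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Duration values must carry an explicit unit; a bare "120" would be
  // ambiguous, so "120s" (or equivalently "2m") is used instead.
  .set("spark.network.timeout", "120s")
{% endhighlight %}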