From c6f4e704214097f17d2d6abfbfef4bb208e4339f Mon Sep 17 00:00:00 2001
From: Sandy Ryza <sandy@cloudera.com>
Date: Mon, 10 Nov 2014 12:40:41 -0800
Subject: [PATCH] SPARK-4230. Doc for spark.default.parallelism is incorrect

Author: Sandy Ryza <sandy@cloudera.com>

Closes #3107 from sryza/sandy-spark-4230 and squashes the following commits:

37a1d19 [Sandy Ryza] Clear up a couple things
34d53de [Sandy Ryza] SPARK-4230. Doc for spark.default.parallelism is incorrect
---
 docs/configuration.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/docs/configuration.md b/docs/configuration.md
index 0f9eb81f6e..f0b396e21f 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -562,6 +562,9 @@ Apart from these, the following properties are also available, and may be useful
 <tr>
   <td><code>spark.default.parallelism</code></td>
   <td>
+    For distributed shuffle operations like <code>reduceByKey</code> and <code>join</code>, the
+    largest number of partitions in a parent RDD. For operations like <code>parallelize</code>
+    with no parent RDDs, it depends on the cluster manager:
     <ul>
       <li>Local mode: number of cores on the local machine</li>
       <li>Mesos fine grained mode: 8</li>
@@ -569,8 +572,8 @@ Apart from these, the following properties are also available, and may be useful
     </ul>
   </td>
   <td>
-    Default number of tasks to use across the cluster for distributed shuffle operations
-    (<code>groupByKey</code>, <code>reduceByKey</code>, etc) when not set by user.
+    Default number of partitions in RDDs returned by transformations like <code>join</code>,
+    <code>reduceByKey</code>, and <code>parallelize</code> when not set by user.
   </td>
 </tr>
 <tr>
--
GitLab
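As context for the property this patch documents (not part of the diff itself), a minimal sketch of supplying `spark.default.parallelism` at submit time; the class name, jar name, and the value `16` are hypothetical placeholders, not taken from the patch:

```shell
# Hypothetical application -- only the --conf flag is the point here.
# With this set, transformations such as reduceByKey and parallelize
# default to 16 partitions when no partition count is passed explicitly.
spark-submit \
  --class com.example.MyApp \
  --master local[4] \
  --conf spark.default.parallelism=16 \
  my-app.jar
```

The same value can instead be set programmatically on a `SparkConf` before the `SparkContext` is created; per the doc text above, it only applies when the user does not pass a partition count to the operation itself.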