Commit 4be53d03 authored by Sean Owen, committed by Andrew Or

[SPARK-5836] [DOCS] [STREAMING] Clarify what may cause long-running Spark apps to preserve shuffle files

Clarify what may cause long-running Spark apps to preserve shuffle files

Author: Sean Owen <sowen@cloudera.com>

Closes #6901 from srowen/SPARK-5836 and squashes the following commits:

a9faef0 [Sean Owen] Clarify what may cause long-running Spark apps to preserve shuffle files
parent 68a2dca2
@@ -1144,9 +1144,11 @@ generate these on the reduce side. When data does not fit in memory Spark will s
 to disk, incurring the additional overhead of disk I/O and increased garbage collection.
 Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files
-are not cleaned up from Spark's temporary storage until Spark is stopped, which means that
-long-running Spark jobs may consume available disk space. This is done so the shuffle doesn't need
-to be re-computed if the lineage is re-computed. The temporary storage directory is specified by the
+are preserved until the corresponding RDDs are no longer used and are garbage collected.
+This is done so the shuffle files don't need to be re-created if the lineage is re-computed.
+Garbage collection may happen only after a long period of time, if the application retains references
+to these RDDs or if GC does not kick in frequently. This means that long-running Spark jobs may
+consume a large amount of disk space. The temporary storage directory is specified by the
 `spark.local.dir` configuration parameter when configuring the Spark context.
 Shuffle behavior can be tuned by adjusting a variety of configuration parameters. See the
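The added wording ties shuffle-file cleanup to JVM garbage collection of the RDDs that produced the shuffle: files linger as long as the application holds references to those RDDs. A minimal Scala sketch of the pattern the doc describes (the app name and local-dir path are hypothetical; setting `spark.local.dir` via `SparkConf` applies in local/standalone setups, while cluster managers may override it):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shuffle-cleanup-sketch")               // hypothetical app name
  .set("spark.local.dir", "/mnt/disk1/spark-tmp")     // hypothetical temp dir for shuffle files

val sc = new SparkContext(conf)

// reduceByKey triggers a shuffle; its intermediate files land under spark.local.dir
var pairs = sc.parallelize(1 to 1000).map(i => (i % 10, i))
val counts = pairs.reduceByKey(_ + _).collect()

// Drop the reference so the shuffle's RDD becomes unreachable. Spark's
// reference-tracking cleaner can then remove the shuffle files once the
// JVM actually garbage-collects the RDD object, which may be much later.
pairs = null
```

The point of the doc change is the last step: long-running applications that keep RDD references alive (or whose driver JVM rarely runs GC) will see shuffle files accumulate under `spark.local.dir` until those references are released and collected.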