Skip to content
Snippets Groups Projects
Commit 7aacb7bf authored by Li Zhihui's avatar Li Zhihui Committed by Andrew Or
Browse files

[SPARK-2713] Executors of same application in same host should only download files & jars once

If Spark lunched multiple executors in one host for one application, every executor would download it dependent files and jars (if not using local: url) independently. It maybe result in huge latency. In my case, it result in 20 seconds latency to download dependent jars(size about 17M) when I lunched 32 executors in every host(total 4 hosts).

This patch will cache downloaded files and jars for executors to reduce network throughput and download latency. In my case, the latency was reduced from 20 seconds to less than 1 second.

Author: Li Zhihui <zhihui.li@intel.com>
Author: li-zhihui <zhihui.li@intel.com>

Closes #1616 from li-zhihui/cachefiles and squashes the following commits:

36940df [Li Zhihui] Close cache for local mode
935fed6 [Li Zhihui] Clean code.
f9330d4 [Li Zhihui] Clean code again
7050d46 [Li Zhihui] Clean code
074a422 [Li Zhihui] Fix: deal with spark.files.overwrite
03ed3a8 [li-zhihui] rename cache file name as XXXXXXXXX_cache
2766055 [li-zhihui] Use url.hashCode + timestamp as cachedFileName
76a7b66 [Li Zhihui] Clean code & use applcation work directory as cache directory
3510eb0 [Li Zhihui] Keep fetchFile private
2ffd742 [Li Zhihui] add comment for FileLock
e0ebd48 [Li Zhihui] Try and finally lock.release
7fb7c0b [Li Zhihui] Release lock before copy files
6b997bf [Li Zhihui] Executors of same application in same host should only download files & jars once
parent 6a40a768
No related branches found
No related tags found
No related merge requests found
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment