Commit e7f4d253 authored by Tathagata Das, committed by Reynold Xin

[SPARK-4612] Reduce task latency and increase scheduling throughput by making configuration initialization lazy

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L337 creates a Hadoop configuration object for every task that is launched, even when there is no new dependent file or JAR to fetch. Creating this object is heavyweight and should be avoided when nothing needs updating. This PR makes the creation lazy, so the object is built only if a dependency actually has to be fetched. A quick local run of the spark-perf scheduling throughput tests, in local standalone scheduler mode, gives the following numbers:
1 job with 10000 tasks: before 7.8395 seconds, after 2.6415 seconds = roughly 3x increase in task scheduling throughput (7.8395 / 2.6415 ≈ 2.97)
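
The effect of the change can be illustrated with a minimal, self-contained sketch. This is not the Executor code: `ExpensiveConf`, `newExpensiveConf`, `fetch`, and the `constructions` counter are hypothetical stand-ins introduced only to show that a local `lazy val` is evaluated on first access and skipped entirely when no dependency is new.

```scala
import scala.collection.mutable.HashMap

// Hypothetical stand-in for an expensive-to-build configuration object.
final class ExpensiveConf

object LazyConfSketch {
  // Counts how many times the expensive object was actually constructed.
  var constructions = 0

  // Hypothetical factory; stands in for a heavyweight constructor call.
  def newExpensiveConf(): ExpensiveConf = {
    constructions += 1
    new ExpensiveConf
  }

  // Files already fetched, keyed by name, with the timestamp of the fetched version.
  val currentFiles = HashMap[String, Long]()

  // Hypothetical fetch; a real one would use the configuration to download the file.
  def fetch(name: String, conf: ExpensiveConf): Unit = ()

  def updateDependencies(newFiles: HashMap[String, Long]): Unit = {
    // Local lazy val: the right-hand side runs on first access only,
    // and never runs if the loop body below is not entered.
    lazy val conf = newExpensiveConf()
    synchronized {
      for ((name, timestamp) <- newFiles if currentFiles.getOrElse(name, -1L) < timestamp) {
        fetch(name, conf)
        currentFiles(name) = timestamp
      }
    }
  }

  def main(args: Array[String]): Unit = {
    updateDependencies(HashMap("a.jar" -> 1L, "b.jar" -> 1L)) // two new files, one construction
    updateDependencies(HashMap("a.jar" -> 1L))                // nothing new, no construction
    println(constructions) // prints 1
  }
}
```

With a plain `val`, the same two calls would construct the object twice, once per call, regardless of whether anything had to be fetched.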

pwendell JoshRosen

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #3463 from tdas/lazy-config and squashes the following commits:

c791c1e [Tathagata Das] Reduce task latency by making configuration initialization lazy
parent 346bc17a
@@ -334,7 +334,7 @@ private[spark] class Executor(
    * SparkContext. Also adds any new JARs we fetched to the class loader.
    */
   private def updateDependencies(newFiles: HashMap[String, Long], newJars: HashMap[String, Long]) {
-    val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)
+    lazy val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)
     synchronized {
       // Fetch missing dependencies
       for ((name, timestamp) <- newFiles if currentFiles.getOrElse(name, -1L) < timestamp) {