SPARK-1767: Prefer HDFS-cached replicas when scheduling data-local tasks
This change reorders the replicas returned by HadoopRDD#getPreferredLocations so that replicas cached by HDFS are at the start of the list. This requires Hadoop 2.5 or higher; previous versions of Hadoop do not expose the information needed to determine whether a replica is cached.

Author: Colin Patrick Mccabe <cmccabe@cloudera.com>

Closes #1486 from cmccabe/SPARK-1767 and squashes the following commits:

338d4f8 [Colin Patrick Mccabe] SPARK-1767: Prefer HDFS-cached replicas when scheduling data-local tasks
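The reordering described in the commit message can be illustrated with a minimal sketch. This is not the code from the commit: `Replica`, `ReplicaOrdering`, and `preferCached` are hypothetical names chosen for illustration, and the sketch assumes each replica already carries a flag saying whether HDFS has it cached (the information that, per the message, only Hadoop 2.5 or higher exposes).

```scala
// Hypothetical illustration only; none of these names come from the Spark source.
object ReplicaOrdering {
  // A replica location plus a flag for whether HDFS has cached it (assumed input).
  final case class Replica(host: String, cachedByHdfs: Boolean)

  // Return host names with HDFS-cached replicas first.
  // `partition` is stable, so the original relative order is kept within each group.
  def preferCached(replicas: Seq[Replica]): Seq[String] = {
    val (cached, uncached) = replicas.partition(_.cachedByHdfs)
    (cached ++ uncached).map(_.host)
  }
}
```

With replicas on hosts `a` (not cached), `b` (cached), and `c` (not cached), this sketch would yield `b, a, c`, so a scheduler that walks the list in order would try the cached replica first.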
Changed files:
- core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala (56 additions, 4 deletions)
- core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala (15 additions, 3 deletions)
- core/src/main/scala/org/apache/spark/rdd/RDD.scala (1 addition, 1 deletion)
- core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala (1 addition, 1 deletion)
- core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala (43 additions, 5 deletions)
- core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala (22 additions, 3 deletions)
- core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala (22 additions, 0 deletions)
- project/MimaExcludes.scala (2 additions, 0 deletions)