[SPARK-10008] Ensure shuffle locality doesn't take precedence over narrow deps
The shuffle locality patch made the DAGScheduler aware of shuffle data, but for RDDs that have both narrow and shuffle dependencies, it can cause them to place tasks based on the shuffle dependency instead of the narrow one. This case is common in iterative join-based algorithms like PageRank and ALS, where one RDD is hash-partitioned and one isn't.

Author: Matei Zaharia <matei@databricks.com>

Closes #8220 from mateiz/shuffle-loc-fix.
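The ordering described in the message can be illustrated with a small toy model. This is only a sketch under stated assumptions: `LocalitySketch`, `NarrowDep`, `ShuffleDep`, and `preferredLocs` are hypothetical names invented for illustration, not Spark classes and not the code changed by this patch. It shows a task with both kinds of dependency being placed by its narrow dependency, with shuffle locality used only as a fallback.

```scala
// Toy model only: these types are hypothetical and are not Spark's classes.
object LocalitySketch {
  sealed trait Dep
  // Narrow dependency: the task reads a parent partition that lives on known hosts.
  final case class NarrowDep(parentHosts: Seq[String]) extends Dep
  // Shuffle dependency: hosts that hold map outputs for this reduce partition.
  final case class ShuffleDep(mapOutputHosts: Seq[String]) extends Dep

  // Use the first narrow dependency that yields hosts; consult shuffle
  // dependencies only when no narrow dependency gives a location.
  def preferredLocs(deps: Seq[Dep]): Seq[String] = {
    val fromNarrow = deps.collectFirst {
      case NarrowDep(hosts) if hosts.nonEmpty => hosts
    }
    val fromShuffle = deps.collectFirst {
      case ShuffleDep(hosts) if hosts.nonEmpty => hosts
    }
    fromNarrow.orElse(fromShuffle).getOrElse(Seq.empty)
  }

  def main(args: Array[String]): Unit = {
    // A join where one input is already hash-partitioned (narrow)
    // and the other must be shuffled. The order of deps does not matter.
    val joinDeps = Seq(ShuffleDep(Seq("hostB")), NarrowDep(Seq("hostA")))
    println(preferredLocs(joinDeps).mkString(","))

    // With only a shuffle dependency, shuffle locality is still used.
    println(preferredLocs(Seq(ShuffleDep(Seq("hostB")))).mkString(","))
  }
}
```

In the join case the sketch places the task on `hostA`, next to the hash-partitioned input, which is the behavior the message says iterative algorithms like PageRank and ALS depend on.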
Showing
- core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala (20 additions, 17 deletions)
- core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala (24 additions, 2 deletions)