[SPARK-695] In DAGScheduler's getPreferredLocs, track set of visited partitions.
getPreferredLocs traverses a dependency graph of partitions using depth first search. Given a complex dependency graph, the old implementation may explore a set of paths in the graph that is exponential in the number of nodes. By maintaining a set of visited nodes the new implementation avoids revisiting nodes, preventing exponential blowup.

Some comment and whitespace cleanups are also included.

Author: Aaron Staple <aaron.staple@gmail.com>

Closes #1362 from staple/SPARK-695 and squashes the following commits:

ecea0f3 [Aaron Staple] address review comments
751c661 [Aaron Staple] [SPARK-695] Add a unit test.
5adf326 [Aaron Staple] Replace getPreferredLocsInternal's HashMap argument with a simpler HashSet.
58e37d0 [Aaron Staple] Replace comment documenting NarrowDependency.
6751ced [Aaron Staple] Revert "Remove unused variable."
04c7097 [Aaron Staple] Fix indentation.
0030884 [Aaron Staple] Remove unused variable.
33f67c6 [Aaron Staple] Clarify comment.
4e42b46 [Aaron Staple] Remove apparently incorrect comment describing NarrowDependency.
65c2d3d [Aaron Staple] [SPARK-695] In DAGScheduler's getPreferredLocs, track set of visited partitions.
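
The effect described in the commit message can be illustrated with a small standalone sketch. This is not the DAGScheduler code: `Node`, `naive`, `withVisited`, and `layered` are hypothetical names invented for the illustration, and integer ids stand in for (RDD, partition) pairs. It assumes a layered graph in which every node depends on both nodes of the next layer and no node has a preferred location, which is the case where a plain depth-first search keeps searching along every path.

```scala
import scala.collection.mutable

object VisitedDfsSketch {
  // Hypothetical stand-in for an (RDD, partition) pair; not a Spark type.
  final case class Node(id: Int, parents: Seq[Int], locs: Seq[String])

  // Plain DFS: a node is re-explored once for every path that reaches it.
  def naive(graph: Map[Int, Node], id: Int, calls: Array[Long]): Seq[String] = {
    calls(0) += 1
    val node = graph(id)
    if (node.locs.nonEmpty) node.locs
    else node.parents.iterator
      .map(p => naive(graph, p, calls))
      .find(_.nonEmpty)
      .getOrElse(Nil)
  }

  // DFS with a visited set: each node is explored at most once.
  def withVisited(graph: Map[Int, Node], id: Int,
                  visited: mutable.HashSet[Int], calls: Array[Long]): Seq[String] = {
    // add returns false when the id was already present.
    if (!visited.add(id)) Nil
    else {
      calls(0) += 1
      val node = graph(id)
      if (node.locs.nonEmpty) node.locs
      else node.parents.iterator
        .map(p => withVisited(graph, p, visited, calls))
        .find(_.nonEmpty)
        .getOrElse(Nil)
    }
  }

  // Layer k holds nodes 2k and 2k+1; each depends on both nodes of layer k+1.
  // No node carries a location, so the search never terminates early.
  def layered(depth: Int): Map[Int, Node] =
    (0 until depth).flatMap { k =>
      val parents = if (k < depth - 1) Seq(2 * (k + 1), 2 * (k + 1) + 1) else Nil
      Seq(Node(2 * k, parents, Nil), Node(2 * k + 1, parents, Nil))
    }.map(n => n.id -> n).toMap

  def main(args: Array[String]): Unit = {
    val graph = layered(10)
    val naiveCalls = Array(0L)
    val visitedCalls = Array(0L)
    naive(graph, 0, naiveCalls)
    withVisited(graph, 0, new mutable.HashSet[Int], visitedCalls)
    println(s"naive explored ${naiveCalls(0)} nodes, visited-set explored ${visitedCalls(0)}")
  }
}
```

In this sketch the plain search explores a number of nodes that grows as a power of two in the depth, while the visited-set variant grows linearly with the number of reachable nodes.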
Showing
- core/src/main/scala/org/apache/spark/Dependency.scala (2 additions, 2 deletions)
- core/src/main/scala/org/apache/spark/SparkContext.scala (1 addition, 1 deletion)
- core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala (2 additions, 2 deletions)
- core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala (17 additions, 1 deletion)
- core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala (15 additions, 1 deletion)