[SPARK-1538] Fix SparkUI incorrectly hiding persisted RDDs
**Bug**: After the command `sc.parallelize(1 to 1000).persist.map(_ + 1).count()` is run, the persisted RDD is missing from the storage tab of the SparkUI.

**Cause**: The command creates two RDDs in one stage, a `ParallelCollectionRDD` and a `MappedRDD`. However, the existing StageInfo keeps only the RDDInfo of the last RDD associated with the stage (`MappedRDD`), so all information about the first RDD (`ParallelCollectionRDD`) is discarded. In this case, we persist the first RDD, but the StorageTab doesn't know about it because it is not encoded in the StageInfo.

**Fix**: Record information of all RDDs in StageInfo, instead of just the last RDD (i.e. `stage.rdd`). Since stage boundaries are marked by shuffle dependencies, the solution is to traverse the last RDD's dependency tree, visiting only ancestor RDDs related through a sequence of narrow dependencies.

---

This PR also moves RDDInfo to its own file, includes a few style fixes, and adds a unit test for constructing StageInfos.

Author: Andrew Or <andrewor14@gmail.com>

Closes #469 from andrewor14/storage-ui-fix and squashes the following commits:

07fc7f0 [Andrew Or] Add back comment that was accidentally removed (minor)
5d799fe [Andrew Or] Add comment to justify testing of getNarrowAncestors with cycles
9d0e2b8 [Andrew Or] Hide details of getNarrowAncestors from outsiders
d2bac8a [Andrew Or] Deal with cycles in RDD dependency graph + add extensive tests
2acb177 [Andrew Or] Move getNarrowAncestors to RDD.scala
bfe83f0 [Andrew Or] Backtrace RDD dependency tree to find all RDDs that belong to a Stage
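
The traversal described under **Fix** can be illustrated with a small, self-contained sketch. This is not the code from this PR: `Node`, `Dep`, `Narrow`, `Shuffle`, and `NarrowAncestors` are hypothetical stand-ins for Spark's RDD and dependency classes, chosen so the example runs without Spark on the classpath. It only shows the idea of walking back from a stage's last RDD, following narrow dependencies, stopping at shuffle dependencies, and tolerating cycles in the graph.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's dependency types (not Spark's API).
sealed trait Dep { def parent: Node }
final case class Narrow(parent: Node) extends Dep   // stays within the stage
final case class Shuffle(parent: Node) extends Dep  // marks a stage boundary

// Hypothetical stand-in for an RDD; `deps` is mutable so tests can build cycles.
final class Node(val name: String, var deps: List[Dep] = Nil)

object NarrowAncestors {
  // Collect every ancestor reachable from `last` through narrow dependencies only.
  def narrowAncestors(last: Node): Set[Node] = {
    val visited = mutable.HashSet.empty[Node]

    def visit(node: Node): Unit = node.deps.foreach {
      case Narrow(parent) if !visited.contains(parent) =>
        visited += parent   // record before recursing so cycles terminate
        visit(parent)
      case _ => ()          // shuffle dependency or already-visited parent
    }

    visit(last)
    // A cycle can lead back to the starting node; it is not its own ancestor.
    visited.filterNot(_ eq last).toSet
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Mirrors the scenario in the bug report: two RDDs in one stage.
    val parallel = new Node("ParallelCollectionRDD")
    val mapped   = new Node("MappedRDD", List(Narrow(parallel)))
    val names = NarrowAncestors.narrowAncestors(mapped).map(_.name).toList.sorted
    println(names.mkString(", "))  // prints: ParallelCollectionRDD
  }
}
```

With the ancestors in hand, a stage's info can cover the last RDD plus everything returned by the traversal, which is why the persisted first RDD becomes visible again in this toy model.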
Showing
- core/src/main/scala/org/apache/spark/TaskContext.scala 1 addition, 1 deletion
- core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala 1 addition, 1 deletion
- core/src/main/scala/org/apache/spark/rdd/RDD.scala 25 additions, 0 deletions
- core/src/main/scala/org/apache/spark/scheduler/JobLogger.scala 1 addition, 1 deletion
- core/src/main/scala/org/apache/spark/scheduler/StageInfo.scala 12 additions, 7 deletions
- core/src/main/scala/org/apache/spark/storage/RDDInfo.scala 55 additions, 0 deletions
- core/src/main/scala/org/apache/spark/storage/StorageUtils.scala 7 additions, 37 deletions
- core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala 3 additions, 3 deletions
- core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala 2 additions, 2 deletions
- core/src/main/scala/org/apache/spark/util/JsonProtocol.scala 12 additions, 10 deletions
- core/src/test/scala/org/apache/spark/CacheManagerSuite.scala 3 additions, 3 deletions
- core/src/test/scala/org/apache/spark/PipedRDDSuite.scala 1 addition, 1 deletion
- core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala 143 additions, 5 deletions
- core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala 46 additions, 8 deletions
- core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala 6 additions, 2 deletions