[SPARK-2321] Stable pull-based progress / status API
This pull request is a first step towards the implementation of a stable, pull-based progress / status API for Spark (see [SPARK-2321](https://issues.apache.org/jira/browse/SPARK-2321)). For now, I'd like to discuss the basic implementation, API names, and overall interface design. Once we arrive at a good design, I'll go back and add additional methods to expose more information via this API.

#### Design goals:

- Pull-based API.
- Usable from Java / Scala / Python (eventually, likely with a wrapper).
- Can be extended to expose more information without introducing binary incompatibilities.
- Returns immutable objects.
- Doesn't leak any implementation details, preserving our freedom to change the implementation.

#### Implementation:

- Add public methods (`getJobInfo`, `getStageInfo`) to SparkContext to allow status / progress information to be retrieved.
- Add public interfaces (`SparkJobInfo`, `SparkStageInfo`) for our API return values. These interfaces consist entirely of Java-style getter methods. The interfaces are currently implemented in Java. I decided to explicitly separate the interface from its implementation (`SparkJobInfoImpl`, `SparkStageInfoImpl`) in order to prevent users from constructing these responses themselves.
- Allow an existing JobProgressListener to be used when constructing a live SparkUI. This allows us to re-use these listeners in the implementation of this status API. There are a few reasons why this listener re-use makes sense:
  - The status API and web UI are guaranteed to show consistent information.
  - These listeners are already well-tested.
  - The same garbage-collection / information retention configurations can apply to both this API and the web UI.
- Extend JobProgressListener to maintain `jobId -> Job` and `stageId -> Stage` mappings.

The progress API methods are implemented in a separate trait that's mixed into SparkContext. This helps to keep SparkContext.scala from becoming larger and more difficult to read.
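To make the interface / implementation split and the pull-based lookup concrete, here is a minimal, Spark-free sketch in plain Java. It is not the code from this pull request: `StatusApiSketch`, `JobInfo`, `JobInfoImpl`, `ProgressListener`, `StatusApi`, and the `Status` values are all hypothetical stand-ins, and the method names are illustrative only. It only assumes the design described above: a listener maintains a `jobId -> job` mapping, and callers pull immutable, read-only views that they cannot construct themselves.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class StatusApiSketch {

    /** Hypothetical status enum; the real JobExecutionStatus may differ. */
    public enum Status { RUNNING, SUCCEEDED, FAILED }

    /** Public read-only view of a job. Accessors can be added later without breaking callers. */
    public interface JobInfo {
        int jobId();
        int[] stageIds();
        Status status();
    }

    /** Private implementation: code outside this sketch cannot construct one. */
    private static final class JobInfoImpl implements JobInfo {
        private final int jobId;
        private final int[] stageIds;
        private final Status status;

        JobInfoImpl(int jobId, int[] stageIds, Status status) {
            this.jobId = jobId;
            this.stageIds = stageIds.clone(); // defensive copy keeps the object immutable
            this.status = status;
        }

        @Override public int jobId() { return jobId; }
        @Override public int[] stageIds() { return stageIds.clone(); }
        @Override public Status status() { return status; }
    }

    /** Stand-in for a listener that maintains the jobId -> job mapping. */
    public static final class ProgressListener {
        private final Map<Integer, JobInfo> jobs = new ConcurrentHashMap<>();

        public void onJobStart(int jobId, int... stageIds) {
            jobs.put(jobId, new JobInfoImpl(jobId, stageIds, Status.RUNNING));
        }

        public void onJobEnd(int jobId, boolean succeeded) {
            // Replace the entry with a new immutable object instead of mutating the old one.
            jobs.computeIfPresent(jobId, (id, old) -> new JobInfoImpl(
                    id, old.stageIds(), succeeded ? Status.SUCCEEDED : Status.FAILED));
        }

        Optional<JobInfo> lookup(int jobId) {
            return Optional.ofNullable(jobs.get(jobId));
        }
    }

    /** Pull-based facade: callers ask for status; nothing is pushed to them. */
    public static final class StatusApi {
        private final ProgressListener listener;

        public StatusApi(ProgressListener listener) { this.listener = listener; }

        /** Empty if the job is unknown (for example, never started or already evicted). */
        public Optional<JobInfo> getJobInfo(int jobId) { return listener.lookup(jobId); }
    }

    public static void main(String[] args) {
        ProgressListener listener = new ProgressListener();
        StatusApi api = new StatusApi(listener);

        listener.onJobStart(0, 0, 1);               // job 0 with stages 0 and 1
        JobInfo running = api.getJobInfo(0).get();  // snapshot taken while the job runs
        listener.onJobEnd(0, true);
        JobInfo done = api.getJobInfo(0).get();

        System.out.println(running.status() + " -> " + done.status());
        System.out.println(Arrays.toString(done.stageIds()));
    }
}
```

Running `main` prints `RUNNING -> SUCCEEDED` followed by `[0, 1]`. The snapshot taken while the job was running still reports `RUNNING` afterwards, because the listener swaps in a new object rather than mutating the one already handed out. Because the facade reads from the same listener state a UI would read from, both views would stay consistent, which is the motivation for the listener re-use described above.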
Author: Josh Rosen <joshrosen@databricks.com>
Author: Josh Rosen <joshrosen@apache.org>

Closes #2696 from JoshRosen/progress-reporting-api and squashes the following commits:

- e6aa78d [Josh Rosen] Add tests.
- b585c16 [Josh Rosen] Accept SparkListenerBus instead of more specific subclasses.
- c96402d [Josh Rosen] Address review comments.
- 2707f98 [Josh Rosen] Expose current stage attempt id
- c28ba76 [Josh Rosen] Update demo code:
- 646ff1d [Josh Rosen] Document spark.ui.retainedJobs.
- 7f47d6d [Josh Rosen] Clean up SparkUI constructors, per Andrew's feedback.
- b77b3d8 [Josh Rosen] Merge remote-tracking branch 'origin/master' into progress-reporting-api
- 787444c [Josh Rosen] Move status API methods into trait that can be mixed into SparkContext.
- f9a9a00 [Josh Rosen] More review comments:
- 3dc79af [Josh Rosen] Remove creation of unused listeners in SparkContext.
- 249ca16 [Josh Rosen] Address several review comments:
- da5648e [Josh Rosen] Add example of basic progress reporting in Java.
- 7319ffd [Josh Rosen] Add getJobIdsForGroup() and num*Tasks() methods.
- cc568e5 [Josh Rosen] Add note explaining that interfaces should not be implemented outside of Spark.
- 6e840d4 [Josh Rosen] Remove getter-style names and "consistent snapshot" semantics:
- 08cbec9 [Josh Rosen] Begin to sketch the interfaces for a stable, public status API.
- ac2d13a [Josh Rosen] Add jobId->stage, stageId->stage mappings in JobProgressListener
- 24de263 [Josh Rosen] Create UI listeners in SparkContext instead of in Tabs:
Showing
- core/src/main/java/org/apache/spark/JobExecutionStatus.java (25 additions, 0 deletions)
- core/src/main/java/org/apache/spark/SparkJobInfo.java (30 additions, 0 deletions)
- core/src/main/java/org/apache/spark/SparkStageInfo.java (34 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/SparkContext.scala (9 additions, 67 deletions)
- core/src/main/scala/org/apache/spark/SparkStatusAPI.scala (142 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/StatusAPIImpl.scala (34 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala (19 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala (1 addition, 1 deletion)
- core/src/main/scala/org/apache/spark/deploy/master/Master.scala (2 additions, 2 deletions)
- core/src/main/scala/org/apache/spark/ui/SparkUI.scala (71 additions, 37 deletions)
- core/src/main/scala/org/apache/spark/ui/env/EnvironmentTab.scala (1 addition, 3 deletions)
- core/src/main/scala/org/apache/spark/ui/exec/ExecutorsTab.scala (1 addition, 2 deletions)
- core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala (43 additions, 6 deletions)
- core/src/main/scala/org/apache/spark/ui/jobs/JobProgressPage.scala (4 additions, 5 deletions)
- core/src/main/scala/org/apache/spark/ui/jobs/JobProgressTab.scala (4 additions, 6 deletions)
- core/src/main/scala/org/apache/spark/ui/jobs/PoolPage.scala (1 addition, 2 deletions)
- core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala (8 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala (1 addition, 2 deletions)
- core/src/test/scala/org/apache/spark/StatusAPISuite.scala (78 additions, 0 deletions)
- docs/configuration.md (10 additions, 1 deletion)