    [SPARK-6943] [SPARK-6944] DAG visualization on SparkUI · fc8b5819
    Andrew Or authored
    This patch adds the functionality to display the RDD DAG on the SparkUI.
    
    This DAG describes the relationships between
    - an RDD and its dependencies,
    - an RDD and its operation scopes, and
    - an RDD's operation scopes and the stage / job hierarchy
    
    An operation scope here refers to the existing public APIs that created the RDDs (e.g. `textFile`, `treeAggregate`). In the future, we can expand this to include higher-level operations like SQL queries.
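
To make the idea of an operation scope concrete, here is a minimal sketch in Python, not the code in this patch. It assumes a scope is just a name with an optional parent, that a closure-based wrapper records the active scope on every node created inside it, and that scopes become clusters in a dot graph. The names `OperationScope`, `Node`, `Graph`, `with_scope`, `new_node`, and `to_dot` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class OperationScope:
    """Hypothetical scope: the name of the public API call that created a node."""
    name: str
    parent: Optional["OperationScope"] = None


@dataclass
class Node:
    """Hypothetical stand-in for an RDD: an ID, its scope, and its dependencies."""
    node_id: int
    scope: Optional[OperationScope]
    deps: List[int] = field(default_factory=list)


class Graph:
    def __init__(self) -> None:
        self.nodes: List[Node] = []
        self._current: Optional[OperationScope] = None

    def with_scope(self, name: str, body: Callable[[], T]) -> T:
        """Run body() with `name` as the active scope, then restore the old one."""
        previous = self._current
        self._current = OperationScope(name, parent=previous)
        try:
            return body()
        finally:
            # Restore even if body() raises, so scopes never leak.
            self._current = previous

    def new_node(self, deps: Optional[List[int]] = None) -> Node:
        """Create a node tagged with whichever scope is active right now."""
        node = Node(len(self.nodes), self._current, list(deps or []))
        self.nodes.append(node)
        return node


def to_dot(graph: Graph) -> str:
    """Render scopes as dot clusters and dependencies as edges."""
    clusters: dict = {}
    unscoped: List[Node] = []
    for node in graph.nodes:
        if node.scope is None:
            unscoped.append(node)
        else:
            clusters.setdefault(node.scope.name, []).append(node)

    lines = ["digraph G {"]
    for index, (name, members) in enumerate(clusters.items()):
        lines.append(f"  subgraph cluster_{index} {{")
        lines.append(f'    label="{name}";')
        for node in members:
            lines.append(f"    {node.node_id};")
        lines.append("  }")
    for node in unscoped:
        lines.append(f"  {node.node_id};")
    for node in graph.nodes:
        for dep in node.deps:
            lines.append(f"  {dep} -> {node.node_id};")
    lines.append("}")
    return "\n".join(lines)


# Usage: two scoped operations, the second depending on the first.
g = Graph()
source = g.with_scope("textFile", lambda: g.new_node())
mapped = g.with_scope("map", lambda: g.new_node(deps=[source.node_id]))
dot = to_dot(g)
```

Passing the body as a closure means the wrapper, not the caller, is responsible for restoring the previous scope, which is one plausible reason to prefer closures over annotations for this kind of bookkeeping.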
    
    *Note: This blatantly stole a few lines of HTML and JavaScript from #5547 (thanks shroffpradyumn!)*
    
    Here's what the job page looks like:
    <img src="https://issues.apache.org/jira/secure/attachment/12730286/job-page.png" width="700px"/>
    and the stage page:
    <img src="https://issues.apache.org/jira/secure/attachment/12730287/stage-page.png" width="300px"/>
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes #5729 from andrewor14/viz2 and squashes the following commits:
    
    666c03b [Andrew Or] Round corners of RDD boxes on stage page (minor)
    01ba336 [Andrew Or] Change RDD cache color to red (minor)
    6f9574a [Andrew Or] Add tests for RDDOperationScope
    1c310e4 [Andrew Or] Wrap a few more RDD functions in an operation scope
    3ffe566 [Andrew Or] Restore "null" as default for RDD name
    5fdd89d [Andrew Or] children -> child (minor)
    0d07a84 [Andrew Or] Fix python style
    afb98e2 [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
    0d7aa32 [Andrew Or] Fix python tests
    3459ab2 [Andrew Or] Fix tests
    832443c [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
    429e9e1 [Andrew Or] Display cached RDDs on the viz
    b1f0fd1 [Andrew Or] Rename OperatorScope -> RDDOperationScope
    31aae06 [Andrew Or] Extract visualization logic from listener
    83f9c58 [Andrew Or] Implement a programmatic representation of operator scopes
    5a7faf4 [Andrew Or] Rename references to viz scopes to viz clusters
    ee33d52 [Andrew Or] Separate HTML generating code from listener
    f9830a2 [Andrew Or] Refactor + clean up + document JS visualization code
    b80cc52 [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
    0706992 [Andrew Or] Add link from jobs to stages
    deb48a0 [Andrew Or] Translate stage boxes taking into account the width
    5c7ce16 [Andrew Or] Connect RDDs across stages + update style
    ab91416 [Andrew Or] Introduce visualization to the Job Page
    5f07e9c [Andrew Or] Remove more return statements from scopes
    5e388ea [Andrew Or] Fix line too long
    43de96e [Andrew Or] Add parent IDs to StageInfo
    6e2cfea [Andrew Or] Remove all return statements in `withScope`
    d19c4da [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
    7ef957c [Andrew Or] Fix scala style
    4310271 [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
    aa868a9 [Andrew Or] Ensure that HadoopRDD is actually serializable
    c3bfcae [Andrew Or] Re-implement scopes using closures instead of annotations
    52187fc [Andrew Or] Rat excludes
    09d361e [Andrew Or] Add ID to node label (minor)
    71281fa [Andrew Or] Embed the viz in the UI in a toggleable manner
    8dd5af2 [Andrew Or] Fill in documentation + miscellaneous minor changes
    fe7816f [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz
    205f838 [Andrew Or] Reimplement rendering with dagre-d3 instead of viz.js
    5e22946 [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz
    6a7cdca [Andrew Or] Move RDD scope util methods and logic to its own file
    494d5c2 [Andrew Or] Revert a few unintended style changes
    9fac6f3 [Andrew Or] Re-implement scopes through annotations instead
    f22f337 [Andrew Or] First working implementation of visualization with vis.js
    2184348 [Andrew Or] Translate RDD information to dot file
    5143523 [Andrew Or] Expose the necessary information in RDDInfo
    a9ed4f9 [Andrew Or] Add a few missing scopes to certain RDD methods
    6b3403b [Andrew Or] Scope all RDD methods