    [SPARK-1276] Add a HistoryServer to render persisted UI · 79820fe8
    Andrew Or authored
    The new feature of event logging, introduced in #42, allows users to persist the details of their Spark application to storage, and later replay these events to reconstruct an after-the-fact SparkUI.
    Currently, however, a persisted UI can only be rendered through the standalone Master. This greatly limits the use of this new feature, as many people also run Spark on YARN or Mesos.
    
    This PR introduces a new entity called the HistoryServer, which, given a log directory, keeps track of all completed applications independently of a Spark Master. Unlike the Master, the HistoryServer need not be running while the application is still running. It is relatively lightweight in that it only maintains static information about applications and performs no scheduling.
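
As a rough illustration of "keeping track of completed applications given a log directory", here is a minimal Python sketch. It is not the HistoryServer code. It assumes one subdirectory per application under the log directory and uses the marker name APPLICATION_COMPLETION as worded later in this description; the layout, the function names, and the exact marker name are all assumptions.

```python
import os
import tempfile

# Marker name as worded in this description; the real file name may differ (assumption).
COMPLETION_MARKER = "APPLICATION_COMPLETION"


def completed_applications(log_dir):
    """Return sorted names of application subdirectories that contain the marker file.

    Assumes (hypothetically) one subdirectory per application under log_dir.
    """
    apps = []
    for name in sorted(os.listdir(log_dir)):
        app_dir = os.path.join(log_dir, name)
        marker = os.path.join(app_dir, COMPLETION_MARKER)
        if os.path.isdir(app_dir) and os.path.exists(marker):
            apps.append(name)
    return apps


def demo():
    """Build a throwaway log directory with one completed and one unfinished app."""
    with tempfile.TemporaryDirectory() as root:
        for app, done in [("app-1", True), ("app-2", False)]:
            os.makedirs(os.path.join(root, app))
            if done:
                # Create an empty marker file to signal completion.
                open(os.path.join(root, app, COMPLETION_MARKER), "w").close()
        return completed_applications(root)
```

Only static information is read here and nothing is scheduled, which is the sense in which such a server can stay lightweight.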
    
    To quickly test it out, generate event logs with ```spark.eventLog.enabled=true``` and run ```sbin/start-history-server.sh <log-dir-path>```. Your HistoryServer awaits on port 18080.
    
    Comments and feedback are most welcome.
    
    ---
    
    A few other changes introduced in this PR include refactoring the WebUI interface, which is beginning to have a lot of duplicate code now that we have added more functionality to it. Two new SparkListenerEvents have been introduced (SparkListenerApplicationStart/End) to keep track of application name and start/finish times. This PR also clarifies the semantics of the ReplayListenerBus introduced in #42.
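
To make the role of the two new events concrete, the following Python sketch replays a line-delimited JSON event log through a listener that records only the application name and start/finish times. The event class names come from this description; the JSON field names ("Event", "App Name", "Timestamp") and the class and function names are hypothetical, chosen for illustration.

```python
import json


class ApplicationInfoListener:
    """Collects the application name and start/end times from replayed events."""

    def __init__(self):
        self.name = None
        self.start_time = None
        self.end_time = None

    def on_event(self, event):
        kind = event.get("Event")
        if kind == "SparkListenerApplicationStart":
            self.name = event["App Name"]          # hypothetical field name
            self.start_time = event["Timestamp"]   # hypothetical field name
        elif kind == "SparkListenerApplicationEnd":
            self.end_time = event["Timestamp"]
        # All other event types are ignored by this listener.


def replay(lines, listener):
    """Feed each non-empty JSON line to the listener, in order."""
    for line in lines:
        line = line.strip()
        if line:
            listener.on_event(json.loads(line))
    return listener


log = [
    '{"Event": "SparkListenerApplicationStart", "App Name": "demo", "Timestamp": 1000}',
    '{"Event": "SparkListenerJobStart", "Job ID": 0}',
    '{"Event": "SparkListenerApplicationEnd", "Timestamp": 4500}',
]
info = replay(log, ApplicationInfoListener())
duration_ms = info.end_time - info.start_time
```

If the log has no end event, `end_time` stays `None`, which mirrors the case of an application that never signaled completion.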
    
    A potential TODO in the future (not part of this PR) is to render live applications in addition to just completed applications. This is useful when applications fail, a condition that our current HistoryServer does not handle unless the user manually signals application completion (by creating the APPLICATION_COMPLETION file). Handling live applications becomes significantly more challenging, however, because it is now necessary to render the same SparkUI multiple times. To avoid reading the entire log every time, which is inefficient, we must handle reading the log from where we previously left off, but this becomes fairly complicated because we must deal with the arbitrary behavior of each input stream.
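
The "read from where we previously left off" idea can be sketched for the simplest case, a plain uncompressed local file, as below. This is only an illustration under that assumption; it does not address the harder cases mentioned above (compressed or remote streams with different behavior), and the class name is hypothetical.

```python
import os
import tempfile


class IncrementalLogReader:
    """Returns only the complete lines appended since the previous read."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # byte offset of the first unread byte

    def read_new_lines(self):
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Consume only up to the last newline, so a half-written
        # final line is left for the next call.
        end = data.rfind(b"\n")
        if end == -1:
            return []
        complete = data[: end + 1]
        self.offset += len(complete)
        return [part.decode("utf-8") for part in complete.split(b"\n")[:-1]]


def demo():
    """Write a log in two steps and read it incrementally three times."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "events.log")
        with open(path, "wb") as f:
            f.write(b"a\nb\npar")        # last line is incomplete
        reader = IncrementalLogReader(path)
        first = reader.read_new_lines()
        with open(path, "ab") as f:
            f.write(b"tial\nc\n")        # finish the line, add one more
        second = reader.read_new_lines()
        third = reader.read_new_lines()  # nothing new was written
        return first, second, third
```

Tracking a byte offset works here because the file only grows and is seekable; a stream without those properties would need a different strategy.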
    
    Author: Andrew Or <andrewor14@gmail.com>
    
    Closes #204 from andrewor14/master and squashes the following commits:
    
    7b7234c [Andrew Or] Finished -> Completed
    b158d98 [Andrew Or] Address Patrick's comments
    69d1b41 [Andrew Or] Do not block on posting SparkListenerApplicationEnd
    19d5dd0 [Andrew Or] Merge github.com:apache/spark
    f7f5bf0 [Andrew Or] Make history server's web UI port a Spark configuration
    2dfb494 [Andrew Or] Decouple checking for application completion from replaying
    d02dbaa [Andrew Or] Expose Spark version and include it in event logs
    2282300 [Andrew Or] Add documentation for the HistoryServer
    567474a [Andrew Or] Merge github.com:apache/spark
    6edf052 [Andrew Or] Merge github.com:apache/spark
    19e1fb4 [Andrew Or] Address Thomas' comments
    248cb3d [Andrew Or] Limit number of live applications + add configurability
    a3598de [Andrew Or] Do not close file system with ReplayBus + fix bind address
    bc46fc8 [Andrew Or] Merge github.com:apache/spark
    e2f4ff9 [Andrew Or] Merge github.com:apache/spark
    050419e [Andrew Or] Merge github.com:apache/spark
    81b568b [Andrew Or] Fix strange error messages...
    0670743 [Andrew Or] Decouple page rendering from loading files from disk
    1b2f391 [Andrew Or] Minor changes
    a9eae7e [Andrew Or] Merge branch 'master' of github.com:apache/spark
    d5154da [Andrew Or] Styling and comments
    5dbfbb4 [Andrew Or] Merge branch 'master' of github.com:apache/spark
    60bc6d5 [Andrew Or] First complete implementation of HistoryServer (only for finished apps)
    7584418 [Andrew Or] Report application start/end times to HistoryServer
    8aac163 [Andrew Or] Add basic application table
    c086bd5 [Andrew Or] Add HistoryServer and scripts ++ Refactor WebUI interface