    SPARK-1154: Clean up app folders in worker nodes · 1440154c
    Evan Chan authored
    This is a fix for [SPARK-1154](https://issues.apache.org/jira/browse/SPARK-1154). Over time, worker nodes fill up with a large number of app-* folders. This change adds a periodic cleanup task that asynchronously deletes app directories older than a configurable TTL.
    
    Two new configuration parameters have been introduced (a later commit in this PR, 9f10d96, renames the configs and adds cleanup.enabled):
      spark.worker.cleanup_interval
      spark.worker.app_data_ttl
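
The cleanup described above can be illustrated with a minimal sketch in Python. It is not the code from this PR: the function names `find_old_dirs`, `cleanup_once`, and `run_periodically` are hypothetical, and the only details taken from the commit message are the app-* prefix, the TTL, the cleanup interval, and computing the current time once per pass. Note that Python's `os.path.getmtime` reports seconds, whereas the commit list mentions that `file.lastModified` on the JVM returns milliseconds, so the unit handling differs from the real implementation.

```python
import os
import shutil
import time


def find_old_dirs(work_dir, ttl_seconds, now=None):
    """Return app-* directories in work_dir last modified more than ttl_seconds ago."""
    # Compute the cutoff once per pass instead of once per directory.
    now = time.time() if now is None else now
    cutoff = now - ttl_seconds
    old = []
    for name in sorted(os.listdir(work_dir)):
        path = os.path.join(work_dir, name)
        if name.startswith("app-") and os.path.isdir(path):
            # getmtime returns seconds since the epoch, the same unit as cutoff.
            if os.path.getmtime(path) < cutoff:
                old.append(path)
    return old


def cleanup_once(work_dir, ttl_seconds, now=None):
    """Delete expired app directories and return the paths selected for removal."""
    removed = find_old_dirs(work_dir, ttl_seconds, now)
    for path in removed:
        shutil.rmtree(path, ignore_errors=True)
    return removed


def run_periodically(work_dir, ttl_seconds, interval_seconds, stop_event):
    """Run cleanup_once every interval_seconds until stop_event is set."""
    # Event.wait returns False on timeout and True once the event is set.
    while not stop_event.wait(interval_seconds):
        cleanup_once(work_dir, ttl_seconds)
```

To keep the cleanup off the main thread, `run_periodically` could be started in a daemon `threading.Thread` with a `threading.Event` as `stop_event`; setting the event ends the loop.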
    
    This change does not move the downloads of application jars to a location outside of the work directory. We will address that if we have time; because it potentially involves caching, it will come either as part of this PR or as a separate PR.
    
    Author: Evan Chan <ev@ooyala.com>
    Author: Kelvin Chu <kelvinkwchu@yahoo.com>
    
    Closes #288 from velvia/SPARK-1154-cleanup-app-folders and squashes the following commits:
    
    0689995 [Evan Chan] CR from @aarondav - move config, clarify for standalone mode
    9f10d96 [Evan Chan] CR from @pwendell - rename configs and add cleanup.enabled
    f2f6027 [Evan Chan] CR from @andrewor14
    553d8c2 [Kelvin Chu] change the variable name to currentTimeMillis since it actually tracks in seconds
    8dc9cb5 [Kelvin Chu] Fixed a bug in Utils.findOldFiles() after merge.
    cb52f2b [Kelvin Chu] Change the name of findOldestFiles() to findOldFiles()
    72f7d2d [Kelvin Chu] Fix a bug of Utils.findOldestFiles(). file.lastModified is returned in milliseconds.
    ad99955 [Kelvin Chu] Add unit test for Utils.findOldestFiles()
    dc1a311 [Evan Chan] Don't recompute current time with every new file
    e3c408e [Evan Chan] Document the two new settings
    b92752b [Evan Chan] SPARK-1154: Add a periodic task to clean up app directories