    [SPARK-20435][CORE] More thorough redaction of sensitive information · 66636ef0
    Mark Grover authored
    This change does a more thorough redaction of sensitive information from logs and the UI.
    It also adds unit tests that guard against regressions that would leak sensitive information to the logs.
    
    The motivation for this change was a password appearing in `SparkListenerEnvironmentUpdate` in event logs under some JVM configurations, like so:
    `"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password ..."`
    Previously, the redaction logic only checked whether the key matched the secret regex pattern and, if so, redacted its value. That worked for most cases. However, in the case above, the key (`sun.java.command`) reveals little, so the value needs to be searched as well. This PR expands the check to cover values in addition to keys.
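
The expanded check might look roughly like the following. This is a minimal Python sketch of the idea, not Spark's actual Scala implementation; the `redact` helper, the `REDACTION_PATTERN` regex, and the `REDACTED` placeholder text are assumptions made for illustration.

```python
import re

# Hypothetical pattern and placeholder, chosen for illustration only.
REDACTION_PATTERN = re.compile(r"(?i)secret|password")
REDACTED = "*********(redacted)"


def redact(pairs, pattern=REDACTION_PATTERN):
    """Redact a value if either its key or the value itself matches the pattern."""
    result = []
    for key, value in pairs:
        if pattern.search(key) or pattern.search(value):
            result.append((key, REDACTED))
        else:
            result.append((key, value))
    return result


env = [
    # The key matches, so the old key-only check already caught this one.
    ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD", "secret_password"),
    # The key is innocuous; only searching the value catches this one.
    ("sun.java.command",
     "org.apache.spark.deploy.SparkSubmit --conf "
     "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password"),
    # Neither key nor value matches, so this pair is left untouched.
    ("spark.app.name", "my-app"),
]
redacted = redact(env)
```

In this sketch the whole value is replaced when any part of it matches, which errs on the side of hiding too much and never too little.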
    
    ## How was this patch tested?
    
    New unit tests were added to ensure that no sensitive information is present in the event logs or the YARN logs. An old unit test in UtilsSuite was modified because it asserted that a non-sensitive property's value would not be redacted. However, that non-sensitive value contained the literal "secret", which now causes it to be redacted. Updating the non-sensitive property's value to another arbitrary value (one without "secret" in it) fixed the test.
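
The two testing points above can be illustrated with a small Python sketch. It is not the actual Spark test code: the `redact_value` and `event_log_line` helpers, the regex, the placeholder, and the property names are all hypothetical.

```python
import json
import re

PATTERN = re.compile(r"(?i)secret|password")  # hypothetical pattern
REDACTED = "*********(redacted)"  # hypothetical placeholder


def redact_value(key, value):
    # Redact when either the key or the value matches the pattern.
    return REDACTED if PATTERN.search(key) or PATTERN.search(value) else value


def event_log_line(props):
    # Serialize redacted properties roughly the way a log entry might be written.
    return json.dumps({k: redact_value(k, v) for k, v in props.items()})


line = event_log_line({
    "sun.java.command":
        "SparkSubmit --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password",
    "spark.regular.property": "regular_value",
})

# Regression check: the secret must not appear anywhere in the serialized log.
leaked = "secret_password" in line

# A non-sensitive value that merely contains "secret" is now redacted as well,
# so a fixture meant to stay unredacted must avoid that literal.
old_fixture = redact_value("spark.regular.property", "not_a_secret")
new_fixture = redact_value("spark.regular.property", "regular_value")
```

Here `old_fixture` is redacted while `new_fixture` passes through unchanged, which mirrors why the existing test needed a different property value.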
    
    Author: Mark Grover <mark@apache.org>
    
    Closes #17725 from markgrover/spark-20435.