    c775bf09
    [SPARK-13792][SQL] Limit logging of bad records in CSV data source · c775bf09
    Reynold Xin authored
    ## What changes were proposed in this pull request?
    This pull request adds a new option (maxMalformedLogPerPartition) to the CSV reader that limits the number of log messages Spark generates per partition for malformed records.
    
    The error log looks something like this:
    ```
    16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
    16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
    16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
    16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
    16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
    16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
    16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
    16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
    16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
    16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
    16/06/20 18:50:14 WARN CSVRelation: More than 10 malformed records have been found on this partition. Malformed records from now on will not be logged.
    ```
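
The capping behaviour seen in the log above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this patch: the function `drop_malformed`, its parameters, and the naive comma split are all hypothetical, and the exact point at which Spark emits the final notice is assumed from the sample log.

```python
from typing import Callable, Iterable, Iterator, List


def drop_malformed(
    lines: Iterable[str],
    expected_fields: int,
    max_logged: int = 10,
    log: Callable[[str], None] = print,
) -> Iterator[List[str]]:
    """Yield well-formed rows of one partition, logging a capped number of bad ones.

    Hypothetical sketch: a line counts as malformed when its field count
    differs from expected_fields (naive comma split, no quoting support).
    """
    malformed = 0  # counter is local, so the cap applies per partition
    for line in lines:
        tokens = line.split(",")
        if len(tokens) == expected_fields:
            yield tokens
            continue
        malformed += 1
        if malformed <= max_logged:
            # The first max_logged malformed lines are logged individually.
            log(f"Dropping malformed line: {line}")
        elif malformed == max_logged + 1:
            # One final notice, then further malformed lines are dropped silently.
            log(
                f"More than {max_logged} malformed records have been found "
                "on this partition. Malformed records from now on will not be logged."
            )
```

Calling `list(drop_malformed(partition_lines, expected_fields=2))` once per partition keeps the counter independent between partitions, which is why the limit is per partition rather than global.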
    
    Closes #12173
    
    ## How was this patch tested?
    Manually tested.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes #13795 from rxin/SPARK-13792.