[SPARK-13792][SQL] Limit logging of bad records in CSV data source
## What changes were proposed in this pull request?

This pull request adds a new option (maxMalformedLogPerPartition) to the CSV reader that limits the number of log messages Spark generates per partition for malformed records.

The error log looks something like:

```
16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
16/06/20 18:50:14 WARN CSVRelation: Dropping malformed line: adsf,1,4
16/06/20 18:50:14 WARN CSVRelation: More than 10 malformed records have been found on this partition. Malformed records from now on will not be logged.
```

Closes #12173

## How was this patch tested?

Manually tested.

Author: Reynold Xin <rxin@databricks.com>

Closes #13795 from rxin/SPARK-13792.
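The capping behavior described above can be illustrated with a minimal, Spark-free sketch in Python. This is not the code from the patch: the function name `drop_malformed`, its parameters, and the comma-splitting check for malformed lines are all hypothetical, and the exact point at which the final notice is emitted is an assumption chosen to match the sample log (ten per-line warnings, then one notice).

```python
import logging

logger = logging.getLogger("csv_sketch")


def drop_malformed(lines, expected_fields, max_logged=10, log=logger.warning):
    """Yield well-formed rows from one partition.

    Hypothetical helper: logs at most `max_logged` malformed lines,
    then a single notice, then stays silent for the rest of the partition.
    """
    malformed_seen = 0  # counter is local, so it resets for every partition
    for line in lines:
        fields = line.split(",")
        if len(fields) == expected_fields:
            yield fields
            continue
        malformed_seen += 1
        if malformed_seen <= max_logged:
            log(f"Dropping malformed line: {line}")
        elif malformed_seen == max_logged + 1:
            # Assumption: the notice is emitted once, when the cap is first exceeded.
            log(
                f"More than {max_logged} malformed records have been found "
                "on this partition. Malformed records from now on will not be logged."
            )
```

Calling `list(drop_malformed(partition_lines, expected_fields=2))` once per partition keeps the count per partition, which is the granularity the option name suggests. In Spark itself the limit would presumably be supplied as a reader option, for example `option("maxMalformedLogPerPartition", 10)`, though the exact call site is not shown in this commit message.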
Changed files:
- python/pyspark/sql/readwriter.py (4 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (2 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala (8 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala (2 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala (28 additions, 14 deletions)