-
- Downloads
[SPARK-13866] [SQL] Handle decimal type in CSV inference at CSV data source.
## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-13866 This PR adds the support to infer `DecimalType`. Here are the rules between `IntegerType`, `LongType` and `DecimalType`. #### Infering Types 1. `IntegerType` and then `LongType`are tried first. ```scala Int.MaxValue => IntegerType Long.MaxValue => LongType ``` 2. If it fails, try `DecimalType`. ```scala (Long.MaxValue + 1) => DecimalType(20, 0) ``` This does not try to infer this as `DecimalType` when scale is less than 0. 3. if it fails, try `DoubleType` ```scala 0.1 => DoubleType // This is failed to be inferred as `DecimalType` because it has the scale, 1. ``` #### Compatible Types (Merging Types) For merging types, this is the same with JSON data source. If `DecimalType` is not capable, then it becomes `DoubleType` ## How was this patch tested? Unit tests were used and `./dev/run_tests` for code style test. Author: hyukjinkwon <gurwls223@gmail.com> Author: Hyukjin Kwon <gurwls223@gmail.com> Closes #11724 from HyukjinKwon/SPARK-13866.
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala 48 additions, 2 deletions.../spark/sql/execution/datasources/csv/CSVInferSchema.scala
- sql/core/src/test/resources/decimal.csv 7 additions, 0 deletionssql/core/src/test/resources/decimal.csv
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala 11 additions, 2 deletions...k/sql/execution/datasources/csv/CSVInferSchemaSuite.scala
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala 15 additions, 0 deletions...apache/spark/sql/execution/datasources/csv/CSVSuite.scala
Loading
Please register or sign in to comment