    51841d77
    [SPARK-13866] [SQL] Handle decimal type in CSV inference at CSV data source. · 51841d77
    hyukjinkwon authored
    ## What changes were proposed in this pull request?
    
    https://issues.apache.org/jira/browse/SPARK-13866
    
    This PR adds support for inferring `DecimalType` in the CSV data source.
    The rules for choosing between `IntegerType`, `LongType` and `DecimalType` are as follows.
    
    #### Inferring Types
    
    1. `IntegerType` and then `LongType` are tried first.
    
      ```scala
      Int.MaxValue => IntegerType
      Long.MaxValue => LongType
      ```
    
    2. If both fail, try `DecimalType`.
    
      ```scala
      (Long.MaxValue + 1) => DecimalType(20, 0)
      ```
      A value is not inferred as `DecimalType` when its scale is less than 0.
    
    3. If it fails, try `DoubleType`.
      ```scala
      0.1 => DoubleType // Not inferred as `DecimalType` because its scale is 1.
      ```
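
    The cascade above can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: the type objects are hypothetical stand-ins so the snippet runs without Spark, `inferField` and `tryDecimal` are names chosen for the example, the `StringType` fallback and the use of `java.math.BigDecimal` precision are assumptions, and the scale check (`scale == 0`) combines the two scale remarks above.

    ```scala
    import scala.util.Try

    // Hypothetical stand-ins for Spark's SQL types, so the sketch runs without Spark.
    sealed trait InferredType
    case object IntegerType extends InferredType
    case object LongType extends InferredType
    case object DoubleType extends InferredType
    case object StringType extends InferredType
    final case class DecimalType(precision: Int, scale: Int) extends InferredType

    object InferenceSketch {
      // Try the narrowest type first and widen only when parsing fails.
      // Assumption: anything that is not numeric at all falls back to StringType.
      def inferField(field: String): InferredType =
        if (Try(field.toInt).isSuccess) IntegerType
        else if (Try(field.toLong).isSuccess) LongType
        else tryDecimal(field).getOrElse(
          if (Try(field.toDouble).isSuccess) DoubleType else StringType)

      // Accept a decimal only when the parsed value has scale 0.
      // Assumption: precision is the digit count reported by java.math.BigDecimal.
      private def tryDecimal(field: String): Option[InferredType] =
        Try(new java.math.BigDecimal(field)).toOption
          .filter(_.scale == 0)
          .map(d => DecimalType(d.precision, d.scale))
    }
    ```

    For example, `InferenceSketch.inferField("0.1")` falls through the integer, long and decimal attempts and ends at `DoubleType`.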
    
    #### Compatible Types (Merging Types)
    
    Type merging follows the same rules as the JSON data source. If `DecimalType` cannot represent the merged values, the result becomes `DoubleType`.
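
    As a rough sketch of what such a merge could look like: the snippet below is an assumption-laden illustration, not the code in this patch. The type objects are hypothetical stand-ins, `merge` and `asDecimal` are names invented for the example, and the limit of 38 digits and the decimal widths used for integers and longs are assumed values.

    ```scala
    // Hypothetical stand-ins for Spark's SQL types, so the sketch runs without Spark.
    sealed trait ColumnType
    case object IntegerType extends ColumnType
    case object LongType extends ColumnType
    case object DoubleType extends ColumnType
    final case class DecimalType(precision: Int, scale: Int) extends ColumnType

    object MergeSketch {
      val MaxPrecision = 38 // assumed upper bound on decimal precision

      // View integral types as decimals wide enough for any of their values (assumed widths).
      private def asDecimal(t: ColumnType): Option[DecimalType] = t match {
        case IntegerType    => Some(DecimalType(10, 0))
        case LongType       => Some(DecimalType(20, 0))
        case d: DecimalType => Some(d)
        case DoubleType     => None
      }

      def merge(a: ColumnType, b: ColumnType): ColumnType = (a, b) match {
        case _ if a == b => a
        case (IntegerType, LongType) | (LongType, IntegerType) => LongType
        case _ =>
          (asDecimal(a), asDecimal(b)) match {
            case (Some(x), Some(y)) =>
              // Keep enough digits on both sides of the decimal point.
              val scale = math.max(x.scale, y.scale)
              val intDigits = math.max(x.precision - x.scale, y.precision - y.scale)
              // Fall back to DoubleType when no decimal is wide enough.
              if (intDigits + scale > MaxPrecision) DoubleType
              else DecimalType(intDigits + scale, scale)
            case _ => DoubleType // one side is already DoubleType
          }
      }
    }
    ```

    Here `MergeSketch.merge(DecimalType(38, 0), DecimalType(10, 5))` would need 43 digits, so it yields `DoubleType`.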
    
    ## How was this patch tested?
    
    Tested with unit tests, and with `./dev/run_tests` for code style checks.
    
    Author: hyukjinkwon <gurwls223@gmail.com>
    Author: Hyukjin Kwon <gurwls223@gmail.com>
    
    Closes #11724 from HyukjinKwon/SPARK-13866.