[SPARK-17035] [SQL] [PYSPARK] Improve Timestamp not to lose precision for all cases
    Dongjoon Hyun authored
    ## What changes were proposed in this pull request?
    
`PySpark` loses `microsecond` precision in some corner cases when converting a `Timestamp` into a `Long`. For example, the following `datetime.max` value should be converted to a value whose last 6 digits are '999999'. This PR improves the conversion logic so that no precision is lost in any case.
    
    **Corner case**
    ```python
    >>> datetime.datetime.max
    datetime.datetime(9999, 12, 31, 23, 59, 59, 999999)
    ```
    
    **Before**
    ```python
    >>> from datetime import datetime
    >>> from pyspark.sql import Row
    >>> from pyspark.sql.types import StructType, StructField, TimestampType
    >>> schema = StructType([StructField("dt", TimestampType(), False)])
    >>> [schema.toInternal(row) for row in [{"dt": datetime.max}]]
    [(253402329600000000,)]
    ```
    
    **After**
    ```python
    >>> [schema.toInternal(row) for row in [{"dt": datetime.max}]]
    [(253402329599999999,)]
    ```
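The loss is consistent with doing the seconds-to-microseconds conversion in floating point: 253402329600000000 is far above 2**53, the largest range in which a Python float (a 64-bit double) can represent every integer exactly, so the trailing microseconds get rounded away. The sketch below illustrates the difference between the two styles of conversion; the helper names are illustrative and this is not the actual `TimestampType.toInternal` code.

```python
import calendar
import datetime
import time


def _epoch_seconds(dt):
    # Naive datetimes are interpreted in the local timezone, aware ones in UTC,
    # so the exact value printed below depends on the machine's timezone.
    return (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
            else time.mktime(dt.timetuple()))


def to_internal_float(dt):
    # Float multiply: once seconds * 1e6 exceeds 2**53, the double can no
    # longer represent every integer, so the last digits are rounded.
    return int(_epoch_seconds(dt) * 1e6 + dt.microsecond)


def to_internal_int(dt):
    # Pure integer arithmetic: Python ints are arbitrary precision, so the
    # final 6 microsecond digits are preserved exactly.
    return int(_epoch_seconds(dt)) * 1000000 + dt.microsecond


dt = datetime.datetime.max              # 9999-12-31 23:59:59.999999
print(to_internal_float(dt))            # e.g. ...600000000 -- microseconds lost
print(to_internal_int(dt))              # e.g. ...599999999 -- microseconds kept
```

Keeping the whole computation in Python integers avoids the rounding entirely, which is why the "After" output above ends in '999999'.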
    
    ## How was this patch tested?
    
Pass the Jenkins tests with a new test case.
    
    Author: Dongjoon Hyun <dongjoon@apache.org>
    
    Closes #14631 from dongjoon-hyun/SPARK-17035.