    4b4ee260
    [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary
    
    ## What changes were proposed in this pull request?
    
    This PR disallows the default value `None` for `value` when `to_replace` is not a dictionary.
    
    It seems odd that `value` defaults to `None`, because that ended up allowing the case below:
    
    ```python
    >>> df.show()
    ```
    ```
    +----+------+-----+
    | age|height| name|
    +----+------+-----+
    |  10|    80|Alice|
    ...
    ```
    
    ```python
    >>> df.na.replace('Alice').show()
    ```
    ```
    +----+------+----+
    | age|height|name|
    +----+------+----+
    |  10|    80|null|
    ...
    ```
    
    **After**
    
    This PR disallows the case above:
    
    ```python
    >>> df.na.replace('Alice').show()
    ```
    ```
    ...
    TypeError: value is required when to_replace is not a dictionary.
    ```
    
    while omitting `value` is still allowed when `to_replace` is a dictionary:
    
    ```python
    >>> df.na.replace({'Alice': None}).show()
    ```
    ```
    +----+------+----+
    | age|height|name|
    +----+------+----+
    |  10|    80|null|
    ...
    ```
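
The check described above can be sketched with a sentinel default that distinguishes "argument omitted" from an explicit `None`. This is only a minimal illustration, not the actual PySpark implementation: the names `_MISSING` and `check_replace_args` are hypothetical, only scalar and dictionary inputs are handled, and it assumes an explicitly passed `None` remains a valid replacement value.

```python
# Hypothetical sentinel meaning "the caller did not pass `value`".
# A unique object is used so that an explicit None can still be told apart.
_MISSING = object()


def check_replace_args(to_replace, value=_MISSING):
    """Return an {old: new} mapping, or raise if `value` is required but omitted.

    Simplified sketch: lists and tuples for `to_replace` are not handled here.
    """
    if isinstance(to_replace, dict):
        # The dictionary already carries the replacements; `value` is not needed.
        return dict(to_replace)
    if value is _MISSING:
        # Same message as the error shown in the example above.
        raise TypeError("value is required when to_replace is not a dictionary.")
    # An explicit value (assumed to include None) is accepted for a scalar.
    return {to_replace: value}
```

With this shape, `check_replace_args('Alice')` raises `TypeError`, while `check_replace_args({'Alice': None})` and `check_replace_args('Alice', None)` both return `{'Alice': None}`.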
    
    ## How was this patch tested?
    
    Manually tested; tests were added in `python/pyspark/sql/tests.py` and doctests were fixed.
    
    Author: hyukjinkwon <gurwls223@gmail.com>
    
    Closes #20499 from HyukjinKwon/SPARK-19454-followup.