[SPARK-14932][SQL] Allow DataFrame.replace() to replace values with None
## What changes were proposed in this pull request?

Currently, `df.na.replace("*", Map[String, String]("NULL" -> null))` throws an exception. This PR allows null/None to be passed as a value in the replacement map of `DataFrame.replace()`. The keys and values of the replacement map must still be of the same type, but the values may mix null/None with values of that type.

For example, this PR enables the following operations:

- `df.na.replace("*", Map[String, String]("NULL" -> null))` (Scala)
- `df.na.replace("*", Map[Any, Any](60 -> null, 70 -> 80))` (Scala)
- `df.na.replace('Alice', None)` (Python)
- `df.na.replace([10, 20])` (Python; the replacement value defaults to None)

One use case: replace all empty strings with null/None because they were generated incorrectly, and then drop every row that contains null/None:

- `df.na.replace("*", Map("" -> null)).na.drop()` (Scala)
- `df.replace(u'', None).dropna()` (Python)

## How was this patch tested?

Scala unit tests; Python doctests and unit tests.

Author: bravo-zhang <mzhang1230@gmail.com>

Closes #18820 from bravo-zhang/spark-14932.
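To make the replacement semantics described in this PR concrete without a Spark cluster, here is a minimal pure-Python sketch that models a DataFrame as a list of dicts and null as `None`. The names `check_replacement`, `replace_values`, and `drop_nulls` are hypothetical and are not PySpark APIs. The type rule is a simplification of what Spark enforces, and column subsets are not modelled.

```python
from typing import Any, Dict, List, Mapping

# Hypothetical row representation: one dict per row, column name -> value.
Row = Dict[str, Any]


def check_replacement(replacement: Mapping[Any, Any]) -> None:
    """Simplified type rule: keys and non-None values must share one type."""
    key_types = {type(k) for k in replacement}
    value_types = {type(v) for v in replacement.values() if v is not None}
    all_types = key_types | value_types
    if len(all_types) > 1:
        names = sorted(t.__name__ for t in all_types)
        raise TypeError(f"mixed types in replacement map: {names}")


def replace_values(rows: List[Row], replacement: Mapping[Any, Any]) -> List[Row]:
    """Return new rows in which every cell equal to a key becomes its value.

    A value of None plays the role of SQL null in this model.
    """
    check_replacement(replacement)
    key_types = {type(k) for k in replacement}
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for column, cell in row.items():
            # Only cells whose Python type matches the key type are candidates.
            if type(cell) in key_types and cell in replacement:
                new_row[column] = replacement[cell]
        result.append(new_row)
    return result


def drop_nulls(rows: List[Row]) -> List[Row]:
    """Keep only rows that contain no None cell."""
    return [row for row in rows if all(cell is not None for cell in row.values())]


rows = [
    {"name": "Alice", "age": 10},
    {"name": "", "age": 60},
    {"name": "Bob", "age": 70},
]

# Analogue of Map[Any, Any](60 -> null, 70 -> 80):
print(replace_values(rows, {60: None, 70: 80}))
# [{'name': 'Alice', 'age': 10}, {'name': '', 'age': None}, {'name': 'Bob', 'age': 80}]

# Analogue of replacing empty strings with null, then dropping rows with nulls:
print(drop_nulls(replace_values(rows, {"": None})))
# [{'name': 'Alice', 'age': 10}, {'name': 'Bob', 'age': 70}]
```

Matching on the cell's Python type keeps a string key from touching an integer column, which loosely corresponds to Spark applying the replacement map only to columns of a compatible type.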
- `python/pyspark/sql/dataframe.py`: 23 additions, 12 deletions
- `python/pyspark/sql/tests.py`: 15 additions, 0 deletions
- `sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala`: 32 additions, 25 deletions
- `sql/core/src/test/scala/org/apache/spark/sql/DataFrameNaFunctionsSuite.scala`: 43 additions, 0 deletions