[SPARK-21266][R][PYTHON] Support schema as a DDL-formatted string in dapply/gapply/from_json
## What changes were proposed in this pull request?

This PR adds support for specifying the schema as a DDL-formatted string in `from_json` in R/Python and in `dapply` and `gapply` in R. These functions are commonly used, and the change makes them consistent with the Scala APIs.

Additionally, this PR exposes `structType` in R as a workaround for other possible corner cases.

**Python**

`from_json`

```python
from pyspark.sql.functions import from_json

data = [(1, '''{"a": 1}''')]
df = spark.createDataFrame(data, ("key", "value"))
df.select(from_json(df.value, "a INT").alias("json")).show()
```

**R**

`from_json`

```R
df <- sql("SELECT named_struct('name', 'Bob') as people")
df <- mutate(df, people_json = to_json(df$people))
head(select(df, from_json(df$people_json, "name STRING")))
```

`structType.character`

```R
structType("a STRING, b INT")
```

`dapply`

```R
dapply(createDataFrame(list(list(1.0)), "a"), function(x) {x}, "a DOUBLE")
```

`gapply`

```R
gapply(createDataFrame(list(list(1.0)), "a"), "a", function(key, x) { x }, "a DOUBLE")
```

## How was this patch tested?

Doc tests for `from_json` in Python and unit tests in `test_sparkSQL.R` in R.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #18498 from HyukjinKwon/SPARK-21266.
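To make the string format concrete: the flat DDL-formatted schemas used in the examples above, such as `"a STRING, b INT"`, are comma-separated `name TYPE` pairs. The sketch below is a hypothetical, pure-Python helper (`split_flat_ddl` is not part of Spark or of this patch) that splits such a string into its fields. It only illustrates the shape of the input; Spark's own parser is what actually interprets the string and also handles cases this toy version does not.

```python
def split_flat_ddl(schema):
    """Split a flat DDL schema string into (name, type) pairs.

    Hypothetical helper for illustration only. It is NOT Spark's parser and
    does not handle nested types (STRUCT<...>), parameterized types such as
    DECIMAL(10,2), quoted column names, or comments.
    """
    fields = []
    for part in schema.split(","):
        # Each part looks like "name TYPE"; split on the first whitespace run.
        name, type_name = part.strip().split(None, 1)
        fields.append((name, type_name.strip().upper()))
    return fields


print(split_flat_ddl("a STRING, b INT"))
```

Under these assumptions, `"a STRING, b INT"` describes two columns: `a` of type `STRING` and `b` of type `INT`.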
Showing 8 changed files:

- R/pkg/NAMESPACE (2 additions, 0 deletions)
- R/pkg/R/DataFrame.R (32 additions, 4 deletions)
- R/pkg/R/functions.R (9 additions, 3 deletions)
- R/pkg/R/group.R (3 additions, 0 deletions)
- R/pkg/R/schema.R (26 additions, 3 deletions)
- R/pkg/tests/fulltests/test_sparkSQL.R (76 additions, 60 deletions)
- python/pyspark/sql/functions.py (9 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/functions.scala (3 additions, 4 deletions)