[SPARK-20493][R] De-duplicate parse logics for DDL-like type strings in R
## What changes were proposed in this pull request?

It seems we are using `SQLUtils.getSQLDataType` to parse the type string in `structField`. It looks like we can replace this with `CatalystSqlParser.parseDataType`.

Both accept similar DDL-like type definitions, as below:

```scala
scala> Seq(Tuple1(Tuple1("a"))).toDF.show()
```

```
+---+
| _1|
+---+
|[a]|
+---+
```

```scala
scala> Seq(Tuple1(Tuple1("a"))).toDF.select($"_1".cast("struct<_1:string>")).show()
```

```
+---+
| _1|
+---+
|[a]|
+---+
```

Such type strings look identical to the ones used in R, as below:

```R
> write.df(sql("SELECT named_struct('_1', 'a') as struct"), "/tmp/aa", "parquet")
> collect(read.df("/tmp/aa", "parquet", structType(structField("struct", "struct<_1:string>"))))
  struct
1      a
```

R's handling is stricter because we check the types via regular expressions on the R side beforehand. The actual logic there looks a bit different, but since we check it beforehand on the R side, replacing it should not (I think) introduce any behaviour changes. To make sure of this, dedicated tests were added in SPARK-20105. (It looks like `structField` is the only place that calls this method.)

## How was this patch tested?

Existing tests - https://github.com/apache/spark/blob/master/R/pkg/inst/tests/testthat/test_sparkSQL.R#L143-L194 should cover this.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #17785 from HyukjinKwon/SPARK-20493.
Showing
- R/pkg/R/utils.R: 8 additions, 0 deletions
- R/pkg/inst/tests/testthat/test_sparkSQL.R: 11 additions, 2 deletions
- R/pkg/inst/tests/testthat/test_utils.R: 3 additions, 3 deletions
- sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala: 2 additions, 41 deletions