    [SPARK-19849][SQL] Support ArrayType in to_json to produce JSON array · 0cdcf911
    hyukjinkwon authored
    ## What changes were proposed in this pull request?
    
    This PR proposes supporting an array of structs in `to_json`, as below:
    
    ```scala
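    import org.apache.spark.sql.functions._
    // Needed for toDF and $"..." outside spark-shell; assumes a SparkSession named `spark`.
    import spark.implicits._
    
    // Column "a" is an array holding one struct with a single field _1.
    val df = Seq(Tuple1(Tuple1(1) :: Nil)).toDF("a")
    df.select(to_json($"a").as("json")).show()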
    ```
    
    ```
    +----------+
    |      json|
    +----------+
    |[{"_1":1}]|
    +----------+
    ```
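    For intuition, an array of structs is written as a JSON array whose elements are the same JSON objects `to_json` produces for individual structs. The plain-Scala sketch below illustrates only that output shape, under stated assumptions: it does not use Spark, the `Struct` alias and the `structToJson`/`arrayToJson` helpers are hypothetical, and it supports only `Int` and `String` values with minimal escaping.
    
    ```scala
    // Hypothetical sketch, not Spark code: a struct is modeled as ordered (field name, value) pairs.
    object ArrayToJsonSketch {
      type Struct = Seq[(String, Any)]
    
      // Minimal escaping of backslashes and double quotes only; not full JSON string escaping.
      private def quote(s: String): String =
        "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
    
      // Only Int and String values are supported in this sketch.
      private def valueToJson(v: Any): String = v match {
        case i: Int    => i.toString
        case s: String => quote(s)
        case other     => throw new IllegalArgumentException(s"Unsupported value: $other")
      }
    
      // One struct becomes one JSON object, with fields in their declared order.
      def structToJson(row: Struct): String =
        row.map { case (name, value) => quote(name) + ":" + valueToJson(value) }.mkString("{", ",", "}")
    
      // An array of structs becomes a JSON array of those objects, with no extra whitespace.
      def arrayToJson(rows: Seq[Struct]): String =
        rows.map(structToJson).mkString("[", ",", "]")
    
      def main(args: Array[String]): Unit = {
        // Mirrors Seq(Tuple1(Tuple1(1) :: Nil)): one array holding one struct with field _1.
        println(arrayToJson(Seq(Seq("_1" -> 1)))) // prints [{"_1":1}]
      }
    }
    ```
    
    Building the array by joining already-serialized objects keeps the array case a thin layer over the single-struct case, which is the relationship the example output shows.
    
    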
    
    Currently, calling `to_json` on an array column throws the following exception (a newline was inserted manually for readability):
    
    ```
    org.apache.spark.sql.AnalysisException: cannot resolve 'structtojson(`array`)' due to data type
    mismatch: structtojson requires that the expression is a struct expression.;;
    ```
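    The error above comes from an input type check on the expression. A simplified sketch of the relaxed check this PR describes, accepting either a struct or an array of structs, could look like the following. The miniature type hierarchy, the `checkInputType` name, and the error message are hypothetical stand-ins, not Spark's actual `org.apache.spark.sql.types` classes or error text.
    
    ```scala
    // Hypothetical miniature type hierarchy; Spark's real types live in org.apache.spark.sql.types.
    object ToJsonTypeCheckSketch {
      sealed trait DataType
      case object IntegerType extends DataType
      case object StringType extends DataType
      final case class StructType(fieldNames: Seq[String]) extends DataType
      final case class ArrayType(elementType: DataType) extends DataType
    
      // Previously only a struct passed; the relaxed check also accepts an array of structs.
      // The error message is illustrative, not Spark's actual text.
      def checkInputType(dt: DataType): Either[String, Unit] = dt match {
        case _: StructType            => Right(())
        case ArrayType(_: StructType) => Right(())
        case other                    => Left(s"$other is neither a struct nor an array of structs")
      }
    
      def main(args: Array[String]): Unit = {
        println(checkInputType(ArrayType(StructType(Seq("_1")))))
        println(checkInputType(ArrayType(IntegerType)))
      }
    }
    ```
    
    Matching on the array's element type means an array of primitives, such as an array of integers, is still rejected in this sketch.
    
    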
    
    This change also enables a round trip with `from_json`, as below:
    
    ```scala
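    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._
    // Needed for toDF and $"..." outside spark-shell; assumes a SparkSession named `spark`.
    import spark.implicits._
    
    // Parse a JSON array of objects into an array of structs.
    val schema = ArrayType(StructType(StructField("a", IntegerType) :: Nil))
    val df = Seq("""[{"a":1}, {"a":2}]""").toDF("json").select(from_json($"json", schema).as("array"))
    df.show()
    
    // Serialize the array of structs back to a JSON string.
    df.select(to_json($"array").as("json")).show()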
    ```
    
    ```
    +----------+
    |     array|
    +----------+
    |[[1], [2]]|
    +----------+
    
    +-----------------+
    |             json|
    +-----------------+
    |[{"a":1},{"a":2}]|
    +-----------------+
    ```
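    The round trip preserves the data, not the exact input text: the input `[{"a":1}, {"a":2}]` has a space after the comma, while the JSON produced by `to_json` has none, because the output is regenerated from the parsed values. The toy sketch below illustrates this in plain Scala; `parse` and `render` are hypothetical helpers, and the regex-based parser only understands arrays of flat objects with integer values, so it is not a substitute for Spark's JSON parser.
    
    ```scala
    import scala.util.matching.Regex
    
    // Toy round trip, not Spark code: handles only arrays of flat objects with integer values.
    object RoundTripSketch {
      // One flat object such as {"a":1}; assumes no nested braces and no braces inside strings.
      private val ObjectPattern: Regex = """\{([^{}]*)\}""".r
      // One "name": integer pair inside an object body, tolerating whitespace around the colon.
      private val FieldPattern: Regex = """"([^"]*)"\s*:\s*(-?\d+)""".r
    
      // Hypothetical parser: [{"a":1}, {"a":2}] becomes List(List(a -> 1), List(a -> 2)).
      def parse(json: String): Seq[Seq[(String, Int)]] =
        ObjectPattern.findAllMatchIn(json).map { obj =>
          FieldPattern.findAllMatchIn(obj.group(1)).map(f => f.group(1) -> f.group(2).toInt).toList
        }.toList
    
      // Hypothetical renderer: regenerates compact JSON, so input whitespace is not preserved.
      def render(rows: Seq[Seq[(String, Int)]]): String =
        rows.map(_.map { case (k, v) => "\"" + k + "\":" + v }.mkString("{", ",", "}")).mkString("[", ",", "]")
    
      def main(args: Array[String]): Unit = {
        val input = """[{"a":1}, {"a":2}]"""
        println(render(parse(input))) // prints [{"a":1},{"a":2}]
      }
    }
    ```
    
    Comparing parsed values rather than raw strings is the safer way to check a JSON round trip, since serializers are free to normalize whitespace.
    
    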
    
    Also, this PR proposes renaming `StructToJson` to `StructsToJson` and `JsonToStruct` to `JsonToStructs`.
    
    ## How was this patch tested?
    
    Unit tests in `JsonFunctionsSuite` and `JsonExpressionsSuite` for Scala, doctests for Python, and a test in `test_sparkSQL.R` for R.
    
    Author: hyukjinkwon <gurwls223@gmail.com>
    
    Closes #17192 from HyukjinKwon/SPARK-19849.