    369a148e
    [SPARK-19595][SQL] Support json array in from_json · 369a148e
    hyukjinkwon authored
    ## What changes were proposed in this pull request?
    
    This PR proposes both of the following:
    
    **Do not allow json arrays with multiple elements and return null in `from_json` with `StructType` as the schema.**
    
    Currently, `from_json` reads only a single row when the input is a JSON array. So, the code below:
    
    ```scala
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._
    val schema = StructType(StructField("a", IntegerType) :: Nil)
    Seq(("""[{"a": 1}, {"a": 2}]""")).toDF("struct").select(from_json(col("struct"), schema)).show()
    ```
    prints
    
    ```
    +--------------------+
    |jsontostruct(struct)|
    +--------------------+
    |                 [1]|
    +--------------------+
    ```
    
    This PR suggests returning `null` instead when the schema is `StructType` and the input is a JSON array with multiple elements:
    
    ```
    +--------------------+
    |jsontostruct(struct)|
    +--------------------+
    |                null|
    +--------------------+
    ```
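
    The rule above is worded in terms of arrays with *multiple* elements, so a quick way to see where the boundary falls is to parse a one-element array and a two-element array side by side. The following is a minimal sketch, not part of the patch: it assumes a Spark build that includes this change, and the local `SparkSession` setup, the app name, and the `parsed` column alias are illustrative choices (in `spark-shell`, `spark` already exists).

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    // Assumption: a throwaway local session, only for illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("from-json-struct-sketch")
      .getOrCreate()
    import spark.implicits._

    val schema = StructType(StructField("a", IntegerType) :: Nil)

    // First row: a JSON array with one element. Second row: an array with two elements.
    val input = Seq("""[{"a": 1}]""", """[{"a": 1}, {"a": 2}]""").toDF("struct")

    // Keep the raw string next to the parsed value so the two cases can be compared.
    val parsed = input.select(col("struct"), from_json(col("struct"), schema).as("parsed"))
    parsed.show(false)
    ```

    Keeping the original string column in the output makes it easy to see which input produced which result.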
    
    **Support json arrays in `from_json` with `ArrayType` as the schema.**
    
    ```scala
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._
    val schema = ArrayType(StructType(StructField("a", IntegerType) :: Nil))
    Seq(("""[{"a": 1}, {"a": 2}]""")).toDF("array").select(from_json(col("array"), schema)).show()
    ```
    
    prints
    
    ```
    +-------------------+
    |jsontostruct(array)|
    +-------------------+
    |         [[1], [2]]|
    +-------------------+
    ```
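
    Once the array is parsed with an `ArrayType` schema, a common follow-up is to turn each element into its own row. The sketch below is an illustration under assumptions rather than part of the patch: it assumes a Spark build that includes this change, and the local `SparkSession` setup and the `item` alias are hypothetical names chosen for the example. `explode` is the existing function from `org.apache.spark.sql.functions`.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    // Assumption: a throwaway local session, only for illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("from-json-array-sketch")
      .getOrCreate()
    import spark.implicits._

    val schema = ArrayType(StructType(StructField("a", IntegerType) :: Nil))
    val df = Seq("""[{"a": 1}, {"a": 2}]""").toDF("array")

    // Parse the JSON array, then emit one row per array element
    // and pull field `a` out of each struct.
    val exploded = df
      .select(explode(from_json(col("array"), schema)).as("item"))
      .select(col("item.a").as("a"))
    exploded.show()
    ```

    With the input above, this should produce one row per element of the JSON array.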
    
    ## How was this patch tested?
    
    Unit tests in `JsonExpressionsSuite` and `JsonFunctionsSuite`, Python doctests, and manual tests.
    
    Author: hyukjinkwon <gurwls223@gmail.com>
    
    Closes #16929 from HyukjinKwon/disallow-array.