    [SPARK-16429][SQL] Include `StringType` columns in `describe()` · 142df483
    Dongjoon Hyun authored
    ## What changes were proposed in this pull request?
    
    Currently, Spark's `describe` supports `StringType` columns when they are passed explicitly as arguments. However, `describe()` called without arguments returns a dataset covering only the numeric columns. This PR makes the no-argument `describe()` include `StringType` columns as well.
    
    **Background**
    ```scala
    scala> spark.read.json("examples/src/main/resources/people.json").describe("age", "name").show()
    +-------+------------------+-------+
    |summary|               age|   name|
    +-------+------------------+-------+
    |  count|                 2|      3|
    |   mean|              24.5|   null|
    | stddev|7.7781745930520225|   null|
    |    min|                19|   Andy|
    |    max|                30|Michael|
    +-------+------------------+-------+
    ```
    
    **Before**
    ```scala
    scala> spark.read.json("examples/src/main/resources/people.json").describe().show()
    +-------+------------------+
    |summary|               age|
    +-------+------------------+
    |  count|                 2|
    |   mean|              24.5|
    | stddev|7.7781745930520225|
    |    min|                19|
    |    max|                30|
    +-------+------------------+
    ```
    
    **After**
    ```scala
    scala> spark.read.json("examples/src/main/resources/people.json").describe().show()
    +-------+------------------+-------+
    |summary|               age|   name|
    +-------+------------------+-------+
    |  count|                 2|      3|
    |   mean|              24.5|   null|
    | stddev|7.7781745930520225|   null|
    |    min|                19|   Andy|
    |    max|                30|Michael|
    +-------+------------------+-------+
    ```
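
The default column selection described above can be approximated outside of Spark internals. The following is only a sketch of the rule implied by this description (numeric or `StringType` columns), not the code in this patch; the helper name `describableColumns` is hypothetical, and it assumes `spark-sql` is on the classpath.

```scala
import org.apache.spark.sql.types.{NumericType, StringType, StructType}

// Hypothetical helper: names of the columns that a no-argument describe()
// would be expected to summarize after this change, i.e. every column
// whose type is numeric or StringType.
def describableColumns(schema: StructType): Seq[String] =
  schema.fields.collect {
    case f if f.dataType.isInstanceOf[NumericType] || f.dataType == StringType =>
      f.name
  }.toSeq
```

On a build without this change, something like `df.describe(describableColumns(df.schema): _*)` should give a similar result by passing the columns explicitly, as in the **Background** example.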
    
    ## How was this patch tested?
    
    Pass the Jenkins tests with an updated test case.
    
    Author: Dongjoon Hyun <dongjoon@apache.org>
    
    Closes #14095 from dongjoon-hyun/SPARK-16429.