  • [DOCUMENTATION] fixed groupby aggregation example for pyspark · 675a7371
    Mortada Mehyar authored
    ## What changes were proposed in this pull request?
    
    Fixes the documentation for the groupby/agg example in Python.
    
    ## How was this patch tested?
    
    The existing example in the documentation does not contain valid syntax (it is missing a parenthesis) and does not use a `Column` in the expression passed to `agg()`.
    
    After the fix, here is how I tested it:
    
    ```
    In [1]: from pyspark.sql import Row
    
    In [2]: import pyspark.sql.functions as func
    
    In [3]: %cpaste
    Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
    :records = [{'age': 19, 'department': 1, 'expense': 100},
    : {'age': 20, 'department': 1, 'expense': 200},
    : {'age': 21, 'department': 2, 'expense': 300},
    : {'age': 22, 'department': 2, 'expense': 300},
    : {'age': 23, 'department': 3, 'expense': 300}]
    :--
    
    In [4]: df = sqlContext.createDataFrame([Row(**d) for d in records])
    
    In [5]: df.groupBy("department").agg(df["department"], func.max("age"), func.sum("expense")).show()
    
    +----------+----------+--------+------------+
    |department|department|max(age)|sum(expense)|
    +----------+----------+--------+------------+
    |         1|         1|      20|         300|
    |         2|         2|      22|         600|
    |         3|         3|      23|         300|
    +----------+----------+--------+------------+
    ```
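
The values in the table above can be cross-checked without a Spark session. The following is a minimal plain-Python sketch of the same computation (group by `department`, then take the maximum `age` and the total `expense`), using the sample records from the test session. The helper name `aggregate_by_department` is hypothetical and is not part of PySpark; it only illustrates what the aggregation is expected to return.

```python
from collections import defaultdict

# Sample records copied from the test session above.
records = [
    {"age": 19, "department": 1, "expense": 100},
    {"age": 20, "department": 1, "expense": 200},
    {"age": 21, "department": 2, "expense": 300},
    {"age": 22, "department": 2, "expense": 300},
    {"age": 23, "department": 3, "expense": 300},
]


def aggregate_by_department(rows):
    """Return {department: (max_age, total_expense)} for the given rows.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    # Bucket the rows by their grouping key.
    groups = defaultdict(list)
    for row in rows:
        groups[row["department"]].append(row)

    # Apply both aggregations to each bucket, ordered by department.
    return {
        dept: (
            max(r["age"] for r in members),
            sum(r["expense"] for r in members),
        )
        for dept, members in sorted(groups.items())
    }


expected = aggregate_by_department(records)
for dept, (max_age, total_expense) in expected.items():
    print(dept, max_age, total_expense)
```

Each printed line corresponds to one row of the Spark output, with the grouping key listed once. The Spark table shows `department` twice because `df["department"]` is passed to `agg()` in addition to being the grouping column.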
    
    Author: Mortada Mehyar <mortada.mehyar@gmail.com>
    
    Closes #13587 from mortada/groupby_agg_doc_fix.