Commit afb13163 authored by Davies Liu, committed by Reynold Xin

[SPARK-5678] Convert DataFrame to pandas.DataFrame and Series

```
pyspark.sql.DataFrame.to_pandas = to_pandas(self) unbound pyspark.sql.DataFrame method
    Collect all the rows and return a `pandas.DataFrame`.

    >>> df.to_pandas()  # doctest: +SKIP
       age   name
    0    2  Alice
    1    5    Bob

pyspark.sql.Column.to_pandas = to_pandas(self) unbound pyspark.sql.Column method
    Return a pandas.Series from the column

    >>> df.age.to_pandas()  # doctest: +SKIP
    0    2
    1    5
    dtype: int64
```
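A minimal sketch of what `DataFrame.to_pandas` does on the pandas side, runnable without Spark. The `rows` and `columns` values are hypothetical stand-ins for `self.collect()` and `self.columns`; plain tuples are used here in place of Spark rows.

```python
import pandas as pd

# Hypothetical stand-in for self.collect(): one tuple per row.
rows = [(2, "Alice"), (5, "Bob")]
# Hypothetical stand-in for self.columns.
columns = ["age", "name"]

# Same call the patch makes: build a pandas.DataFrame from row records.
pdf = pd.DataFrame.from_records(rows, columns=columns)
print(pdf)
```

Because every row is collected first, the whole dataset has to fit in the driver's memory.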

Not tested by Jenkins (the tests depend on pandas)

Author: Davies Liu <davies@databricks.com>

Closes #4476 from davies/to_pandas and squashes the following commits:

6276fb6 [Davies Liu] Convert DataFrame to pandas.DataFrame and Series
parent de780604
@@ -2284,6 +2284,18 @@ class DataFrame(object):
        """
        return self.select('*', col.alias(colName))

    def to_pandas(self):
        """
        Collect all the rows and return a `pandas.DataFrame`.

        >>> df.to_pandas()  # doctest: +SKIP
           age   name
        0    2  Alice
        1    5    Bob
        """
        import pandas as pd
        return pd.DataFrame.from_records(self.collect(), columns=self.columns)


# Having SchemaRDD for backward compatibility (for docs)
class SchemaRDD(DataFrame):
@@ -2551,6 +2563,19 @@ class Column(DataFrame):
        jc = self._jc.cast(jdt)
        return Column(jc, self.sql_ctx)

    def to_pandas(self):
        """
        Return a pandas.Series from the column

        >>> df.age.to_pandas()  # doctest: +SKIP
        0    2
        1    5
        dtype: int64
        """
        import pandas as pd
        data = [c for c, in self.collect()]
        return pd.Series(data)


def _aggregate_func(name, doc=""):
    """ Create a function for aggregator by name"""
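The `for c, in ...` form in `Column.to_pandas` unpacks each single-element row into its one value. A small sketch of that idiom without Spark, where `rows` is a hypothetical stand-in for what `self.collect()` returns for a one-column result:

```python
import pandas as pd

# Hypothetical stand-in for self.collect() on a single column:
# each row is a one-element tuple.
rows = [(2,), (5,)]

# "c," is a one-element tuple pattern, so each row is unpacked to its value.
data = [c for c, in rows]

# Same call the patch makes; the Series gets a default integer index.
series = pd.Series(data)
print(series)
```

The unpacking raises `ValueError` if a row holds more than one value, so the idiom only fits a single-column result.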