Commit c1b6fa98 authored by Davies Liu, committed by Reynold Xin

[SPARK-5878] fix DataFrame.repartition() in Python

Also add tests for distinct()

Author: Davies Liu <davies@databricks.com>

Closes #4667 from davies/repartition and squashes the following commits:

79059fd [Davies Liu] add test
cb4915e [Davies Liu] fix repartition
parent de0dd6de
@@ -434,12 +434,18 @@ class DataFrame(object):
     def repartition(self, numPartitions):
         """ Return a new :class:`DataFrame` that has exactly `numPartitions`
         partitions.
+        >>> df.repartition(10).rdd.getNumPartitions()
+        10
         """
-        return DataFrame(self._jdf.repartition(numPartitions, None), self.sql_ctx)
+        return DataFrame(self._jdf.repartition(numPartitions), self.sql_ctx)
     def distinct(self):
         """
         Return a new :class:`DataFrame` containing the distinct rows in this DataFrame.
+        >>> df.distinct().count()
+        2L
         """
         return DataFrame(self._jdf.distinct(), self.sql_ctx)
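The two doctests in this hunk check that `repartition(10)` yields exactly 10 partitions and that `distinct()` drops duplicate rows. The sketch below only illustrates those two properties in plain Python, without Spark. The helper functions, the round-robin placement, and the sample rows are all assumptions made for illustration. They are not PySpark APIs and do not reflect how Spark actually shuffles data.

```python
# Illustrative stand-in only: plain Python lists play the role of a
# DataFrame's rows and partitions. None of these helpers exist in PySpark.


def repartition(rows, num_partitions):
    """Spread rows over exactly num_partitions lists (round-robin, an assumption)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def distinct(rows):
    """Return the distinct rows, keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


rows = [(2, "Alice"), (5, "Bob"), (2, "Alice")]  # hypothetical sample data
parts = repartition(rows, 10)
print(len(parts))           # partition count, empty partitions included: 10
print(len(distinct(rows)))  # number of distinct rows: 2
```

As in the doctest, the partition count equals the requested number even when there are fewer rows than partitions, so some partitions are simply empty.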