python/pyspark/sql/dataframe.py · 9952217749118ae78fe794ca11e1c4a87a4ae8ba · cs525-sp18-g07 / spark

[SPARK-10731] [SQL] Delegate to Scala's DataFrame.take implementation in Python DataFrame. · 99522177

Reynold Xin authored 9 years ago

Python DataFrame.head/take currently require scanning all the partitions. This pull request changes them to delegate the actual implementation to the Scala DataFrame (by calling DataFrame.take).

This is more of a hack to fix the issue in 1.5.1. A more proper fix would be to change executeCollect and executeTake to return InternalRow rather than Row, thus eliminating the extra round-trip conversion.
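
The delegation described in the commit message can be illustrated with a small, Spark-free sketch. Everything below is hypothetical: `FakeJvmDataFrame` and `PyDataFrame` are stand-ins invented for illustration and are not Spark classes. The sketch assumes only that the Scala-side `take` can stop once it has enough rows, while a collect-then-truncate path touches every partition.

```python
class FakeJvmDataFrame:
    """Hypothetical stand-in for the Scala DataFrame; not a Spark class."""

    def __init__(self, partitions):
        self._partitions = partitions
        self.partitions_scanned = 0  # counts how many partitions were read

    def _scan(self, index):
        # Reading one partition is the expensive step we want to avoid.
        self.partitions_scanned += 1
        return list(self._partitions[index])

    def collect_then_truncate(self, num):
        # Naive path: read every partition, then keep the first `num` rows.
        rows = []
        for i in range(len(self._partitions)):
            rows.extend(self._scan(i))
        return rows[:num]

    def take(self, num):
        # Incremental path: stop as soon as `num` rows have been gathered.
        rows = []
        for i in range(len(self._partitions)):
            if len(rows) >= num:
                break
            rows.extend(self._scan(i))
        return rows[:num]


class PyDataFrame:
    """Hypothetical Python wrapper that delegates to the JVM-side object."""

    def __init__(self, jdf):
        self._jdf = jdf

    def take(self, num):
        # Delegate instead of re-implementing the logic on the Python side.
        return self._jdf.take(num)

    def head(self, n=None):
        # head() returns a single row (or None); head(n) returns a list.
        if n is None:
            rows = self.take(1)
            return rows[0] if rows else None
        return self.take(n)
```

With four two-row partitions, `PyDataFrame(jdf).take(3)` in this sketch reads only the first two partitions, whereas `collect_then_truncate(3)` reads all four. The real PySpark code also has to ship rows from the JVM to Python, which this sketch leaves out.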

Author: Reynold Xin <rxin@databricks.com>

Closes #8876 from rxin/SPARK-10731.

dataframe.py 49.82 KiB