[SPARK-10731] [SQL] Delegate to Scala's DataFrame.take implementation in Python DataFrame.
Python DataFrame.head/take currently requires scanning all partitions. This pull request changes both methods to delegate the actual implementation to the Scala DataFrame (by calling DataFrame.take). This is more of a hack to fix the issue in 1.5.1. A more proper fix would be to change executeCollect and executeTake to return InternalRow rather than Row, and thus eliminate the extra round-trip conversion.

Author: Reynold Xin <rxin@databricks.com>

Closes #8876 from rxin/SPARK-10731.
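To make the cost difference concrete, here is a toy model in plain Python. It is not Spark code: `CountingPartition`, `take_scanning_all`, and `take_incremental` are hypothetical names invented for this illustration, and the sketch assumes the delegated implementation can stop reading once it has enough rows.

```python
from itertools import islice


class CountingPartition:
    """Hypothetical stand-in for a partition; counts how often it is scanned."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self.rows)


def take_scanning_all(partitions, n):
    # Reads every partition in full, then keeps only the first n rows.
    rows = [row for p in partitions for row in p.scan()]
    return rows[:n]


def take_incremental(partitions, n):
    # Visits partitions in order and stops as soon as n rows are collected.
    taken = []
    for p in partitions:
        if len(taken) >= n:
            break
        taken.extend(islice(p.scan(), n - len(taken)))
    return taken


def make_partitions():
    # Four partitions of ten rows each: 0..9, 10..19, 20..29, 30..39.
    return [CountingPartition(range(i * 10, i * 10 + 10)) for i in range(4)]


eager = make_partitions()
lazy = make_partitions()

eager_rows = take_scanning_all(eager, 3)
lazy_rows = take_incremental(lazy, 3)

# Number of partitions each strategy had to touch.
eager_scans = sum(p.scans for p in eager)
lazy_scans = sum(p.scans for p in lazy)
```

Both strategies return the same rows; only the number of partitions touched differs, which is the overhead this change is meant to avoid.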
Showing 4 changed files:
- core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala (1 addition, 1 deletion)
- python/pyspark/sql/dataframe.py (4 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/python.scala (13 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/test/ExamplePointUDT.scala (7 additions, 9 deletions)