    [SPARK-10731] [SQL] Delegate to Scala's DataFrame.take implementation in Python DataFrame. · 99522177
    Reynold Xin authored
    Python DataFrame.head/take currently requires scanning all partitions. This pull request changes them to delegate the actual implementation to the Scala DataFrame (by calling DataFrame.take).
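    To illustrate why scanning every partition is costly for a small take, here is a minimal pure-Python model (not PySpark code). It contrasts a take that materializes every partition with one that scans partitions in growing batches and stops once it has n rows. The function names, the list-of-lists partition representation, and the scale-up factor of 4 are assumptions for illustration; this is a sketch of the general idea, not Spark's actual executeTake.

```python
from typing import List, Sequence, Tuple

def take_by_full_scan(partitions: Sequence[Sequence[int]], n: int) -> Tuple[List[int], int]:
    """Hypothetical model: materialize every partition, then slice.

    Returns the first n rows and the number of partitions scanned.
    """
    rows = [row for part in partitions for row in part]
    return rows[:n], len(partitions)

def take_incremental(partitions: Sequence[Sequence[int]], n: int,
                     scale_up: int = 4) -> Tuple[List[int], int]:
    """Hypothetical model: scan a growing batch of partitions until n rows are collected.

    scale_up (assumed value 4) controls how fast the batch size grows.
    Returns the first n rows and the number of partitions scanned.
    """
    collected: List[int] = []
    scanned = 0
    batch = 1
    while len(collected) < n and scanned < len(partitions):
        # Every partition in the batch counts as scanned, even if we fill up early.
        for part in partitions[scanned:scanned + batch]:
            collected.extend(part[: n - len(collected)])
        scanned = min(scanned + batch, len(partitions))
        batch *= scale_up
    return collected, scanned

if __name__ == "__main__":
    parts = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
    print(take_by_full_scan(parts, 2))  # ([0, 1], 4)
    print(take_incremental(parts, 2))   # ([0, 1], 1)
```

    For take(2) the full-scan model touches all four partitions, while the incremental model stops after the first one.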
    
    This is more of a hack to fix the issue in 1.5.1. A more proper fix would be to change executeCollect and executeTake to return InternalRow rather than Row, eliminating the extra round-trip conversion.
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes #8876 from rxin/SPARK-10731.