    209b9361
    [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFrame from Pandas · 209b9361
    Bryan Cutler authored
    ## What changes were proposed in this pull request?
    
    This change uses Arrow to optimize the creation of a Spark DataFrame from a Pandas DataFrame. The input Pandas DataFrame is sliced according to the default parallelism. The optimization is enabled with the existing conf "spark.sql.execution.arrow.enabled", which is disabled by default.
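
The slicing step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: it uses a plain list in place of a Pandas DataFrame, the helper name `slice_by_parallelism` is hypothetical, and the ceiling-division step size is an assumption about how "sliced according to the default parallelism" might be realized.

```python
def slice_by_parallelism(rows, parallelism):
    """Split `rows` into at most `parallelism` contiguous slices.

    Hypothetical helper for illustration only; `rows` stands in for the
    input Pandas DataFrame and `parallelism` for the default parallelism.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be >= 1")
    if not rows:
        return []
    # Ceiling division, so that no more than `parallelism` slices are produced.
    step = -(-len(rows) // parallelism)
    return [rows[start:start + step] for start in range(0, len(rows), step)]


slices = slice_by_parallelism(list(range(10)), 4)
for i, chunk in enumerate(slices):
    print(i, chunk)
```

In PySpark itself, the optimization would presumably be switched on by setting the conf named above, for example `spark.conf.set("spark.sql.execution.arrow.enabled", "true")`, before calling `spark.createDataFrame(pdf)` on a Pandas DataFrame `pdf`.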
    
    ## How was this patch tested?
    
    Added a new unit test that creates a DataFrame with and without the optimization enabled, then compares the results.
    
    Author: Bryan Cutler <cutlerb@gmail.com>
    Author: Takuya UESHIN <ueshin@databricks.com>
    
    Closes #19459 from BryanCutler/arrow-createDataFrame-from_pandas-SPARK-20791.
java_gateway.py 5.78 KiB