    [SPARK-19601][SQL] Fix CollapseRepartition rule to preserve shuffle-enabled Repartition · 9a6ac722
    Xiao Li authored
    ### What changes were proposed in this pull request?
    
    As felixcheung observed in https://github.com/apache/spark/pull/16739, users who call the shuffle-enabled `repartition` API expect the resulting number of partitions to be exactly the number they provided, even if they later call the shuffle-disabled `coalesce`.
    
    Currently, the `CollapseRepartition` rule does not consider whether shuffle is enabled. As a result, we get the following unexpected results:
    
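    ```Scala
        // Start with 5 partitions, then request exactly 10 via a shuffle.
        val df = spark.range(0, 10000, 1, 5)
        val df2 = df.repartition(10)
        // Unexpected: coalesce acts on the original 5 partitions, not the 10 requested.
        assert(df2.coalesce(13).rdd.getNumPartitions == 5)
        assert(df2.coalesce(7).rdd.getNumPartitions == 5)
        assert(df2.coalesce(3).rdd.getNumPartitions == 3)
    ```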
    
    This PR fixes the issue: the rule now preserves a shuffle-enabled `Repartition`.
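
    To make the difference concrete, here is a minimal, self-contained sketch of the two collapsing strategies. It is a toy model, not the Catalyst implementation: the `Plan`, `Source`, and `Repartition` types and both `collapse*` functions are hypothetical names introduced for illustration, and the shuffle-aware variant is an assumption about the intent of the fix rather than the actual rule.

    ```Scala
    object CollapseSketch {
      // Toy logical plan; these are NOT Spark's Catalyst classes.
      sealed trait Plan
      case class Source(numPartitions: Int) extends Plan
      case class Repartition(numPartitions: Int, shuffle: Boolean, child: Plan) extends Plan

      // Partitions produced by a plan: a shuffle yields exactly n,
      // while a shuffle-free coalesce can only reduce the child's count.
      def partitions(plan: Plan): Int = plan match {
        case Source(n)                    => n
        case Repartition(n, true, _)      => n
        case Repartition(n, false, child) => math.min(n, partitions(child))
      }

      // Behavior described above: drop the inner Repartition regardless of shuffle.
      def collapseIgnoringShuffle(plan: Plan): Plan = plan match {
        case Repartition(n, s, Repartition(_, _, grandChild)) => Repartition(n, s, grandChild)
        case other                                            => other
      }

      // Assumed shuffle-aware variant: a shuffle-free coalesce on top of a
      // shuffle-enabled repartition never discards the shuffle.
      def collapsePreservingShuffle(plan: Plan): Plan = plan match {
        case r @ Repartition(n, false, child @ Repartition(m, true, _)) =>
          if (n >= m) child else r // coalesce cannot grow the count, so it is a no-op when n >= m
        case Repartition(n, s, Repartition(_, _, grandChild)) => Repartition(n, s, grandChild)
        case other                                            => other
      }

      // Models df.repartition(10).coalesce(n) on a 5-partition source.
      def example(n: Int): Plan =
        Repartition(n, shuffle = false, Repartition(10, shuffle = true, Source(5)))

      def main(args: Array[String]): Unit = {
        // Ignoring shuffle reproduces the unexpected counts from the example above.
        assert(partitions(collapseIgnoringShuffle(example(13))) == 5)
        assert(partitions(collapseIgnoringShuffle(example(7))) == 5)
        // Preserving the shuffle keeps the count requested by repartition(10).
        assert(partitions(collapsePreservingShuffle(example(13))) == 10)
        assert(partitions(collapsePreservingShuffle(example(7))) == 7)
      }
    }
    ```

    The sketch only rewrites the top of the plan; a real optimizer rule would apply the same match to every node.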
    
    ### How was this patch tested?
    Added a test case.
    
    Author: Xiao Li <gatorsmile@gmail.com>
    
    Closes #16933 from gatorsmile/CollapseRepartition.