[SPARK-21358][EXAMPLES] Argument of repartitionandsortwithinpartitions at pyspark

## What changes were proposed in this pull request? At example of repartitionAndSortWithinPartitions at rdd.py, third argument should be True or False. I proposed fix of example code. ## How was this patch tested? * I rename test_repartitionAndSortWithinPartitions to test_repartitionAndSortWIthinPartitions_asc to specify boolean argument. * I added test_repartitionAndSortWithinPartitions_desc to test False pattern at third argument. (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. Author: chie8842 <chie8842@gmail.com> Closes #18586 from chie8842/SPARK-21358.

[SPARK-21358][EXAMPLES] Argument of repartitionandsortwithinpartitions at pyspark
c3713fde · chie8842 · Reynold Xin · d03aebbe · c3713fde · c3713fde
Commit c3713fde authored 7 years ago by chie8842 Committed by Reynold Xin 7 years ago
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -608,7 +608,7 @@ class RDD(object):
        sort records by their keys.

        >>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
-        >>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
+        >>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
        >>> rdd2.glom().collect()
        [[(0, 5), (0, 8), (2, 6)], [(1, 3), (3, 8), (3, 8)]]
        """

--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1019,14 +1019,22 @@ class RDDTests(ReusedPySparkTestCase):
        self.assertEqual((["ab", "ef"], [5]), rdd.histogram(1))
        self.assertRaises(TypeError, lambda: rdd.histogram(2))

-    def test_repartitionAndSortWithinPartitions(self):
+    def test_repartitionAndSortWithinPartitions_asc(self):
        rdd = self.sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)], 2)

-        repartitioned = rdd.repartitionAndSortWithinPartitions(2, lambda key: key % 2)
+        repartitioned = rdd.repartitionAndSortWithinPartitions(2, lambda key: key % 2, True)
        partitions = repartitioned.glom().collect()
        self.assertEqual(partitions[0], [(0, 5), (0, 8), (2, 6)])
        self.assertEqual(partitions[1], [(1, 3), (3, 8), (3, 8)])

+    def test_repartitionAndSortWithinPartitions_desc(self):
+        rdd = self.sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)], 2)
+
+        repartitioned = rdd.repartitionAndSortWithinPartitions(2, lambda key: key % 2, False)
+        partitions = repartitioned.glom().collect()
+        self.assertEqual(partitions[0], [(2, 6), (0, 5), (0, 8)])
+        self.assertEqual(partitions[1], [(3, 8), (3, 8), (1, 3)])
+
    def test_repartition_no_skewed(self):
        num_partitions = 20
        a = self.sc.parallelize(range(int(1000)), 2)