    [SPARK-2871] [PySpark] add `key` argument for max(), min() and top(n) · db436e36
    Davies Liu authored
    RDD.max(key=None)
    
            Find the maximum item in this RDD.
    
            param key: A function that generates the key used for comparison
    
            >>> rdd = sc.parallelize([1.0, 5.0, 43.0, 10.0])
            >>> rdd.max()
            43.0
            >>> rdd.max(key=str)
            5.0
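
The `key=str` result above can look surprising at first. The following is a plain-Python sketch (no SparkContext needed) that assumes `RDD.max(key=f)` ranks elements the way the builtin `max()` does, which is consistent with the doctest output shown here. The variable names are illustrative only.

```python
# Plain-Python analogue of the doctest above; no Spark required.
# Assumption: RDD.max(key=f) returns the element whose f(element) is largest,
# in the same way the builtin max() does.
values = [1.0, 5.0, 43.0, 10.0]

numeric_max = max(values)          # compares the floats themselves
string_max = max(values, key=str)  # compares "1.0", "5.0", "43.0", "10.0"

# Strings compare character by character, so "5.0" > "43.0" because "5" > "4".
sorted_keys = sorted(str(v) for v in values)

print(numeric_max)  # 43.0
print(string_max)   # 5.0
print(sorted_keys)  # ['1.0', '10.0', '43.0', '5.0']
```

Note that the returned value is still the original float, not the string: the key only decides the ordering.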
    
    RDD.min(key=None)
    
            Find the minimum item in this RDD.
    
            param key: A function that generates the key used for comparison
    
            >>> rdd = sc.parallelize([2.0, 5.0, 43.0, 10.0])
            >>> rdd.min()
            2.0
            >>> rdd.min(key=str)
            10.0
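
One way to see how a keyed minimum could be computed over distributed data is as a pairwise reduction, since a two-argument "keep the smaller one" function can be applied within each partition and then across partitions. The sketch below uses `functools.reduce` on a local list as a stand-in; `min_by_key` is a hypothetical helper name, not part of PySpark, and this is not claimed to be the code in the commit.

```python
from functools import reduce


def min_by_key(items, key=None):
    """Hypothetical helper: fold a sequence by keeping the smaller of each pair."""
    if key is None:
        return reduce(lambda a, b: min(a, b), items)
    # The key function decides the ordering; the original element is returned.
    return reduce(lambda a, b: min(a, b, key=key), items)


values = [2.0, 5.0, 43.0, 10.0]
print(min_by_key(values))           # 2.0
print(min_by_key(values, key=str))  # 10.0, because "10.0" < "2.0" as strings
```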
    
    RDD.top(num, key=None)
    
            Get the top N elements from an RDD.
    
            Note: It returns the list sorted in descending order.
    
            >>> sc.parallelize([10, 4, 2, 12, 3]).top(1)
            [12]
            >>> sc.parallelize([2, 3, 4, 5, 6], 2).top(2)
            [6, 5]
            >>> sc.parallelize([10, 4, 2, 12, 3]).top(3, key=str)
            [4, 3, 2]
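
A keyed top-N over partitioned data can be sketched in two steps: each partition keeps only its own N largest elements, and the per-partition candidates are then merged and trimmed back to N. The example below simulates partitions with plain lists and uses `heapq.nlargest` from the standard library. The function name `top` and the way the data is split into partitions are assumptions made for illustration, not taken from the commit.

```python
import heapq
from functools import reduce


def top(partitions, num, key=None):
    """Hypothetical sketch: top `num` elements across a list of partitions."""
    # Step 1: each partition keeps only its own `num` largest elements.
    local_tops = [heapq.nlargest(num, part, key=key) for part in partitions]
    # Step 2: merge candidates pairwise, trimming back to `num` each time.
    return reduce(lambda a, b: heapq.nlargest(num, a + b, key=key), local_tops)


partitions = [[10, 4, 2], [12, 3]]  # made-up split of the doctest data
print(top(partitions, 1))           # [12]
print(top(partitions, 3, key=str))  # [4, 3, 2]
```

With `key=str` the ordering is lexicographic, so "4" > "3" > "2" > "12" > "10", which is why the two-digit numbers drop out of the result.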
    
    Author: Davies Liu <davies.liu@gmail.com>
    
    Closes #2094 from davies/cmp and squashes the following commits:
    
    ccbaf25 [Davies Liu] add `key` to top()
    ad7e374 [Davies Liu] fix tests
    2f63512 [Davies Liu] change `comp` to `key` in min/max
    dd91e08 [Davies Liu] add `comp` argument for RDD.max() and RDD.min()