    [SPARK-4327] [PySpark] Python API for RDD.randomSplit() · 7f22fa81
    Davies Liu authored
    ```
    pyspark.RDD.randomSplit(self, weights, seed=None)
        Randomly splits this RDD with the provided weights.
    
        :param weights: weights for splits, will be normalized if they don't sum to 1
        :param seed: random seed
        :return: split RDDs in a list
    
        >>> rdd = sc.parallelize(range(10), 1)
        >>> rdd1, rdd2, rdd3 = rdd.randomSplit([0.4, 0.6, 1.0], 11)
        >>> rdd1.collect()
        [3, 6]
        >>> rdd2.collect()
        [0, 5, 7]
        >>> rdd3.collect()
        [1, 2, 4, 8, 9]
    ```
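
The docstring says the weights are normalized if they don't sum to 1. As a rough illustration of what that can mean, here is a minimal pure-Python sketch that runs without Spark. It is not the PySpark implementation, and the helper names `normalized_bounds` and `random_split` are hypothetical. The sketch turns the weights into cumulative ranges over [0, 1), draws one uniform random number per element, and places the element in the split whose range contains that number.

```python
import random


def normalized_bounds(weights):
    """Return (lower, upper) ranges in [0, 1], one per weight.

    Hypothetical helper: each weight is divided by the total, so the
    ranges cover [0, 1) even if the weights do not sum to 1.
    """
    total = float(sum(weights))  # float() so integer weights also work
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    cumulative = [0.0]
    for w in weights:
        cumulative.append(cumulative[-1] + w / total)
    cumulative[-1] = 1.0  # guard against floating-point rounding
    return list(zip(cumulative, cumulative[1:]))


def random_split(items, weights, seed=None):
    """Split a local iterable into len(weights) lists (illustrative only)."""
    bounds = normalized_bounds(weights)
    rng = random.Random(seed)
    splits = [[] for _ in bounds]
    for item in items:
        x = rng.random()  # uniform in [0.0, 1.0)
        for i, (lower, upper) in enumerate(bounds):
            if lower <= x < upper:
                splits[i].append(item)
                break
    return splits


if __name__ == "__main__":
    # Same weights as the docstring example; they sum to 2.0, so the
    # normalized shares are 0.2, 0.3 and 0.5.
    print(normalized_bounds([0.4, 0.6, 1.0]))
    print(random_split(range(10), [0.4, 0.6, 1.0], seed=11))
```

Because this sketch uses Python's `random` module on a local list, its output for a given seed will generally differ from the `collect()` results shown in the docstring. It shares only two properties with the real method: a fixed seed gives a repeatable split, and each element lands in exactly one split.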
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #3193 from davies/randomSplit and squashes the following commits:
    
    78bf997 [Davies Liu] fix tests, do not use numpy in randomSplit, no performance gain
    f5fdf63 [Davies Liu] fix bug with int in weights
    4dfa2cd [Davies Liu] refactor
    f866bcf [Davies Liu] remove unneeded change
    c7a2007 [Davies Liu] switch to python implementation
    95a48ac [Davies Liu] Merge branch 'master' of github.com:apache/spark into randomSplit
    0d9b256 [Davies Liu] refactor
    1715ee3 [Davies Liu] address comments
    41fce54 [Davies Liu] randomSplit()