    [SPARK-4148][PySpark] fix seed distribution and add some tests for rdd.sample · 3cca1962
    Xiangrui Meng authored
    The current seed distribution makes the random sequences for partitions i and i+1 differ only by an offset of 1, so adjacent partitions draw almost the same random numbers.
    
    ~~~
    In [14]: import random
    
    In [15]: r1 = random.Random(10)
    
    In [16]: r1.randint(0, 1)
    Out[16]: 1
    
    In [17]: r1.random()
    Out[17]: 0.4288890546751146
    
    In [18]: r1.random()
    Out[18]: 0.5780913011344704
    
    In [19]: r2 = random.Random(10)
    
    In [20]: r2.randint(0, 1)
    Out[20]: 1
    
    In [21]: r2.randint(0, 1)
    Out[21]: 0
    
    In [22]: r2.random()
    Out[22]: 0.5780913011344704
    ~~~
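
The session above can be reproduced as a small script. This is only a sketch under stated assumptions: `offset_stream` is a hypothetical helper that models the behaviour described (same seed, with one draw discarded per partition index), and `reseeded_stream` shows one possible remedy, deriving a distinct seed per partition by XOR-ing the base seed with the partition index. Neither function is taken from the PySpark source. Only `random()` is used, because the number of underlying draws consumed by `randint` varies across Python versions.

```python
import random


def offset_stream(seed, partition, n):
    """Hypothetical model of the problem: every partition uses the same
    seed and just discards `partition` draws before sampling."""
    rng = random.Random(seed)
    for _ in range(partition):
        rng.random()  # advance the generator by one draw per partition index
    return [rng.random() for _ in range(n)]


def reseeded_stream(seed, partition, n):
    """Assumed remedy for illustration: give each partition its own seed."""
    rng = random.Random(seed ^ partition)
    return [rng.random() for _ in range(n)]


# Adjacent partitions share all but one value when streams are only offset.
a = offset_stream(10, 1, 5)
b = offset_stream(10, 2, 5)
offset_overlap = a[1:] == b[:-1]

# With distinct per-partition seeds, the shifted sequences no longer match.
c = reseeded_stream(10, 1, 5)
d = reseeded_stream(10, 2, 5)
reseeded_overlap = c[1:] == d[:-1]

print("offset streams overlap:", offset_overlap)
print("reseeded streams overlap:", reseeded_overlap)
```

With offset streams, a sample drawn from partition i+1 is strongly correlated with the one drawn from partition i, which is why the seeds need to differ rather than the starting position.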
    
    Note: The new tests are not for this bug fix.
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #3010 from mengxr/SPARK-4148 and squashes the following commits:
    
    869ae4b [Xiangrui Meng] move tests tests.py
    c1bacd9 [Xiangrui Meng] fix seed distribution and add some tests for rdd.sample