    [SPARK-2790] [PySpark] fix zip with serializers which have different batch sizes. · d7e80c25
    Davies Liu authored
    If two RDDs use serializers with different batch sizes, zip() re-serializes the RDD with the smaller batch size so that the batch sizes match, and then calls RDD.zip() in Spark.
    
    Author: Davies Liu <davies.liu@gmail.com>
    
    Closes #1894 from davies/zip and squashes the following commits:
    
    c4652ea [Davies Liu] add more test cases
    6d05fc8 [Davies Liu] Merge branch 'master' into zip
    813b1e4 [Davies Liu] add more tests for failed cases
    a4aafda [Davies Liu] fix zip with serializers which have different batch sizes.
tests.py 43.96 KiB