    [SPARK-3886] [PySpark] use AutoBatchedSerializer by default · 72f36ee5
    Davies Liu authored
    Use AutoBatchedSerializer by default. It chooses the batch size based on the size of the serialized objects, so that the size of each serialized batch falls within [64k - 640k].
    
    In the JVM, the serializer also tracks the objects in a batch to detect duplicated objects, so a larger batch may cause OOM in the JVM.
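
The batch-size selection described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this commit: the function name `auto_batched_dumps` and the `best_size` parameter are hypothetical, and `pickle` stands in for whatever serializer is actually wrapped. It assumes the batch length doubles while a serialized batch is under the 64k target and halves once a batch exceeds ten times that target (640k).

```python
import itertools
import pickle


def auto_batched_dumps(items, best_size=1 << 16):
    """Yield pickled batches, adapting the batch length to the payload size.

    Hypothetical helper: best_size is the lower target (64k by default);
    ten times best_size (640k) is treated as the upper bound.
    """
    iterator = iter(items)
    batch = 1  # start small, grow while batches stay under the target
    while True:
        chunk = list(itertools.islice(iterator, batch))
        if not chunk:
            break
        payload = pickle.dumps(chunk, protocol=2)
        yield payload
        size = len(payload)
        if size < best_size:
            # Serialized batch is below 64k: try twice as many objects.
            batch *= 2
        elif size > best_size * 10 and batch > 1:
            # Serialized batch is above 640k: halve the number of objects.
            batch //= 2
```

Each yielded payload can be read back with `pickle.loads`, which returns the list of objects in that batch. Capping the batch this way also limits how many objects the receiving side has to hold and track at once.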
    
    Author: Davies Liu <davies.liu@gmail.com>
    
    Closes #2740 from davies/batchsize and squashes the following commits:
    
    52cdb88 [Davies Liu] update docs
    185f2b9 [Davies Liu] use AutoBatchedSerializer by default