Commit 72f36ee5 authored 10 years ago by Davies Liu Committed by Josh Rosen 10 years ago

[SPARK-3886] [PySpark] use AutoBatchedSerializer by default

Use AutoBatchedSerializer by default. It chooses the batch size based on the size of the serialized objects, so that the size of each serialized batch falls within [64k - 640k].

In the JVM, the serializer also tracks the objects in a batch to detect duplicated objects, so a larger batch may cause an OOM in the JVM.
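The batch sizing described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: the function name `auto_batched_dumps`, the `best_size` parameter, the use of `pickle` as the serializer, and the double/halve policy are all assumptions chosen to keep each payload near the [64k - 640k] range.

```python
import pickle
from itertools import islice

# Assumed lower bound of the target range (64k); the upper bound is taken as 10x.
BEST_SIZE = 1 << 16


def auto_batched_dumps(iterator, best_size=BEST_SIZE):
    """Yield (item_count, payload) pairs, adapting the batch length as it goes.

    Hypothetical helper: grows the batch while payloads are smaller than
    best_size and shrinks it when a payload exceeds 10 * best_size.
    """
    iterator = iter(iterator)
    batch = 1  # start small so a single huge object is never batched blindly
    while True:
        items = list(islice(iterator, batch))
        if not items:
            break
        payload = pickle.dumps(items, protocol=2)
        yield len(items), payload
        size = len(payload)
        if size < best_size:
            batch *= 2   # payload too small: try a larger batch next time
        elif size > best_size * 10 and batch > 1:
            batch //= 2  # payload too large: back off
```

With small objects the batch length doubles on every step (1, 2, 4, ...) until a payload reaches `best_size`; when objects get large, it halves again, which bounds how much a single batch can hold.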

Author: Davies Liu <davies.liu@gmail.com>

Closes #2740 from davies/batchsize and squashes the following commits:

52cdb88 [Davies Liu] update docs
185f2b9 [Davies Liu] use AutoBatchedSerializer by default

parent 90f73fcc

Showing with 9 additions and 6 deletions