    [SPARK-2696] Reduce default value of spark.serializer.objectStreamReset · 66f26a46
    Hossein authored
    The current default value of spark.serializer.objectStreamReset is 10,000.
    When repartitioning a large file (e.g., 500 MB) of 1 MB records (e.g., into 64 partitions), the serializer can cache up to 10,000 x 1 MB x 64 ~= 640 GB, which causes out-of-memory errors.
    
    This patch lowers the default to a more reasonable value (100).
    
    Author: Hossein <hossein@databricks.com>
    
    Closes #1595 from falaki/objectStreamReset and squashes the following commits:
    
    650a935 [Hossein] Updated documentation
    1aa0df8 [Hossein] Reduce default value of spark.serializer.objectStreamReset
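
The estimate in the commit message can be reproduced with a short calculation. This is a back-of-the-envelope sketch, not Spark code: the function name `worst_case_cached_gb` is hypothetical, and it assumes, as the message does, that each of the 64 output streams can hold up to one reset interval's worth of 1 MB records, with 1 GB taken as 1000 MB.

```python
# Rough upper-bound estimate only; the function name and the decimal
# MB-to-GB conversion (1 GB = 1000 MB) are assumptions for this sketch.

def worst_case_cached_gb(reset_interval, record_mb, partitions):
    """Upper bound, in GB, on data kept cached across all partition streams."""
    return reset_interval * record_mb * partitions / 1000


# Old default: reset every 10,000 objects.
old_default = worst_case_cached_gb(10_000, 1, 64)

# New default from this patch: reset every 100 objects.
new_default = worst_case_cached_gb(100, 1, 64)

print(old_default, new_default)  # 640.0 6.4
```

Under the same assumptions, the new default bounds the cached data at about 6.4 GB instead of 640 GB, a 100x reduction.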
configuration.md 33.97 KiB