Commit 75663b57 authored by Davies Liu, committed by Matei Zaharia

[SPARK-2652] [PySpark] Tuning some default configs for PySpark

Add several default configs for PySpark related to serialization in the JVM.

spark.serializer = org.apache.spark.serializer.KryoSerializer
spark.serializer.objectStreamReset = 100
spark.rdd.compress = True

This will help reduce memory usage during RDD.partitionBy().
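For context, a minimal sketch of the kind of job these defaults target (the app name and data sizes below are illustrative, not part of the patch): partitionBy() shuffles every record through the JVM, so the serializer and RDD compression settings directly affect its memory footprint.

from pyspark import SparkContext

sc = SparkContext("local[2]", "partitionby-demo")

# Each (key, value) pair is pickled in Python, batched, and held in the JVM
# during the shuffle, so Kryo plus compressed RDD blocks keep that footprint down.
pairs = sc.parallelize(range(100000)).map(lambda x: (x % 16, x))
repartitioned = pairs.partitionBy(16)
print(repartitioned.count())
sc.stop()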

Author: Davies Liu <davies.liu@gmail.com>

Closes #1568 from davies/conf and squashes the following commits:

cd316f1 [Davies Liu] remove duplicated line
f71a355 [Davies Liu] rebase to master, add spark.rdd.compress = True
8f63f45 [Davies Liu] Merge branch 'master' into conf
8bc9f08 [Davies Liu] fix unittest
c04a83d [Davies Liu] some default configs for PySpark
parent 66f26a46
@@ -37,6 +37,15 @@ from pyspark.rdd import RDD
 from py4j.java_collections import ListConverter
 
 
+# These are special default configs for PySpark, they will overwrite
+# the default ones for Spark if they are not configured by user.
+DEFAULT_CONFIGS = {
+    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
+    "spark.serializer.objectStreamReset": 100,
+    "spark.rdd.compress": True,
+}
+
+
 class SparkContext(object):
     """
     Main entry point for Spark functionality. A SparkContext represents the
@@ -101,7 +110,7 @@ class SparkContext(object):
         else:
             self.serializer = BatchedSerializer(self._unbatched_serializer,
                                                 batchSize)
-        self._conf.setIfMissing("spark.rdd.compress", "true")
+
         # Set any parameters passed directly to us on the conf
         if master:
             self._conf.setMaster(master)
@@ -112,6 +121,8 @@ class SparkContext(object):
         if environment:
            for key, value in environment.iteritems():
                self._conf.setExecutorEnv(key, value)
+        for key, value in DEFAULT_CONFIGS.items():
+            self._conf.setIfMissing(key, value)
 
         # Check that we have at least the required parameters
         if not self._conf.contains("spark.master"):
...
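As a rough usage sketch (not part of the patch; the config values below are hypothetical): because the defaults are applied with setIfMissing(), anything the user sets explicitly on the SparkConf wins, and only unset keys fall back to the PySpark defaults.

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("local[2]")
        .setAppName("defaults-demo")
        # explicit user choice: overrides the PySpark Kryo default
        .set("spark.serializer", "org.apache.spark.serializer.JavaSerializer"))

sc = SparkContext(conf=conf)
# spark.serializer keeps the user's value; spark.rdd.compress was never set,
# so setIfMissing() filled in the PySpark default.
print(sc._conf.get("spark.serializer"))
print(sc._conf.get("spark.rdd.compress"))
sc.stop()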