[SPARK-17387][PYSPARK] Creating SparkContext() from python without spark-submit ignores user conf
    ## What changes were proposed in this pull request?
    
The root cause of ignoring SparkConf when launching the JVM is that SparkConf requires the JVM to be created first: https://github.com/apache/spark/blob/master/python/pyspark/conf.py#L106
In this PR, I defer launching the JVM until SparkContext is created, so that the SparkConf can be passed to the JVM correctly.
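
The deferral pattern can be illustrated with a minimal, self-contained Python sketch (the class and method names here are hypothetical stand-ins, not the actual PySpark internals): configuration is buffered in a plain Python dict while no JVM exists, and only the context constructor launches the JVM, turning each buffered setting into a `--conf` argument.

```
# Hypothetical sketch of the deferral pattern; not the real PySpark code.

class DeferredConf(object):
    """Stands in for SparkConf before any JVM gateway exists."""
    def __init__(self):
        self._settings = {}          # plain Python dict, no JVM needed

    def set(self, key, value):
        self._settings[key] = value
        return self

    def get_all(self):
        return dict(self._settings)


class Context(object):
    """Stands in for SparkContext: the JVM is launched here, not earlier."""
    def __init__(self, conf):
        # Only now is the JVM started, so every buffered setting can be
        # turned into a --conf argument on the launch command line.
        args = ["--conf %s=%s" % kv for kv in sorted(conf.get_all().items())]
        print("launching JVM with: " + " ".join(args))


conf = DeferredConf().set("spark.driver.memory", "4g")
ctx = Context(conf)
# prints: launching JVM with: --conf spark.driver.memory=4g
```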
    
    ## How was this patch tested?
    
Used the example code from the description of SPARK-17387:
    ```
    $ SPARK_HOME=$PWD PYTHONPATH=python:python/lib/py4j-0.10.3-src.zip python
    Python 2.7.12 (default, Jul  1 2016, 15:12:24)
    [GCC 5.4.0 20160609] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from pyspark import SparkContext
    >>> from pyspark import SparkConf
    >>> conf = SparkConf().set("spark.driver.memory", "4g")
    >>> sc = SparkContext(conf=conf)
    ```
Then verified that `spark.driver.memory` is correctly picked up in the JVM launch command:
    
    ```
    ...op/ -Xmx4g org.apache.spark.deploy.SparkSubmit --conf spark.driver.memory=4g pyspark-shell
    ```
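
As an additional sanity check, the effective configuration can also be read back in the same REPL through `SparkContext.getConf()` (a standard PySpark API; the `u'4g'` output reflects the Python 2 session above):

```
>>> sc.getConf().get("spark.driver.memory")
u'4g'
```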
    
    Author: Jeff Zhang <zjffdu@apache.org>
    
    Closes #14959 from zjffdu/SPARK-17387.