    [SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset · 5b4a5b1a
    cocoatomo authored
    ### Problem
    
The section "Using the shell" in the Spark Programming Guide (https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell) says that we can run the pyspark REPL through IPython.
But the following command does not run IPython; it runs the default Python executable instead.
    
    ```
    $ IPYTHON=1 ./bin/pyspark
    Python 2.7.8 (default, Jul  2 2014, 10:14:46)
    ...
    ```
    
The spark/bin/pyspark script at commit b235e013 decides which executable and options to use in the following way:
    
1. if PYSPARK_PYTHON is unset
   * → default it to "python"
2. if IPYTHON_OPTS is set
   * → set IPYTHON to "1"
3. if a Python script is passed to ./bin/pyspark → run it with ./bin/spark-submit
   * out of this issue's scope
4. if IPYTHON is set to "1"
   * → execute $PYSPARK_PYTHON (default: ipython) with the arguments $IPYTHON_OPTS
   * otherwise, execute $PYSPARK_PYTHON
    
Therefore, when PYSPARK_PYTHON is unset, python is executed even though IPYTHON is "1".
In other words, when PYSPARK_PYTHON is unset, IPYTHON_OPTS and IPYTHON have no effect on which command is used.
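
Condensed into a runnable sketch, the broken ordering looks like this. This is a simplified, hypothetical stand-in for the real script, not its actual code; `resolve_command` and its positional arguments are invented here purely for illustration:

```shell
#!/usr/bin/env sh
# Hypothetical condensation of the decision order in bin/pyspark at commit
# b235e013; resolve_command(PYSPARK_PYTHON, IPYTHON_OPTS, IPYTHON) is made up.
resolve_command() {
  PYSPARK_PYTHON="$1"; IPYTHON_OPTS="$2"; IPYTHON="$3"

  # 1. PYSPARK_PYTHON defaults to "python" before IPYTHON is ever consulted
  if [ -z "$PYSPARK_PYTHON" ]; then PYSPARK_PYTHON="python"; fi

  # 2. setting IPYTHON_OPTS implies IPYTHON="1"
  if [ -n "$IPYTHON_OPTS" ]; then IPYTHON="1"; fi

  # 4. the "(default: ipython)" fallback can never fire, because step 1
  #    already filled in PYSPARK_PYTHON
  if [ "$IPYTHON" = "1" ]; then
    echo "$PYSPARK_PYTHON${IPYTHON_OPTS:+ $IPYTHON_OPTS}"
  else
    echo "$PYSPARK_PYTHON"
  fi
}

# Reproduces the report above: IPYTHON=1 alone still yields plain python.
resolve_command "" "" "1"   # prints "python", not "ipython"
```

Because the default in step 1 runs unconditionally, the IPython-specific default in step 4 is dead code whenever PYSPARK_PYTHON is unset, which matches the table below.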
    
    PYSPARK_PYTHON | IPYTHON_OPTS | IPYTHON | resulting command | expected command
    ---- | ---- | ----- | ----- | -----
    (unset → defaults to python) | (unset) | (unset) | python | (same)
    (unset → defaults to python) | (unset) | 1 | python | ipython
    (unset → defaults to python) | an_option | (unset → set to 1) | python an_option | ipython an_option
    (unset → defaults to python) | an_option | 1 | python an_option | ipython an_option
    ipython | (unset) | (unset) | ipython | (same)
    ipython | (unset) | 1 | ipython | (same)
    ipython | an_option | (unset → set to 1) | ipython an_option | (same)
    ipython | an_option | 1 | ipython an_option | (same)
    
    ### Suggestion
    
The pyspark script should first determine whether the user wants to run IPython or another executable.
    
1. if IPYTHON_OPTS is set
   * set IPYTHON to "1"
2. if IPYTHON has the value "1"
   * PYSPARK_PYTHON defaults to "ipython" if not set
3. PYSPARK_PYTHON defaults to "python" if not set
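
The reordered steps above can be sketched the same way; again, `resolve_command` is an illustrative helper, not code from the pull request itself:

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the suggested ordering: decide whether IPython is
# wanted first, then apply defaults. Not the PR's actual code.
resolve_command() {
  PYSPARK_PYTHON="$1"; IPYTHON_OPTS="$2"; IPYTHON="$3"

  # 1. setting IPYTHON_OPTS implies IPYTHON="1"
  if [ -n "$IPYTHON_OPTS" ]; then IPYTHON="1"; fi

  # 2. if the user asked for IPython, default PYSPARK_PYTHON to "ipython"
  if [ "$IPYTHON" = "1" ] && [ -z "$PYSPARK_PYTHON" ]; then
    PYSPARK_PYTHON="ipython"
  fi

  # 3. otherwise fall back to plain "python"
  if [ -z "$PYSPARK_PYTHON" ]; then PYSPARK_PYTHON="python"; fi

  echo "$PYSPARK_PYTHON${IPYTHON_OPTS:+ $IPYTHON_OPTS}"
}

resolve_command "" "" "1"   # now prints "ipython"
```

With this ordering, every row of the table above produces the expected command, and an explicit PYSPARK_PYTHON still wins over both defaults.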
    
See the pull request for the detailed modifications.
    
    Author: cocoatomo <cocoatomo77@gmail.com>
    
    Closes #2554 from cocoatomo/issues/cannot-run-ipython-without-options and squashes the following commits:
    
    d2a9b06 [cocoatomo] [SPARK-3706][PySpark] Use PYTHONUNBUFFERED environment variable instead of -u option
    264114c [cocoatomo] [SPARK-3706][PySpark] Remove the sentence about deprecated environment variables
    42e02d5 [cocoatomo] [SPARK-3706][PySpark] Replace environment variables used to customize execution of PySpark REPL
    10d56fb [cocoatomo] [SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset