    [SPARK-23300][TESTS] Prints out if Pandas and PyArrow are installed or not in PySpark SQL tests · 8141c3e3
    hyukjinkwon authored
    ## What changes were proposed in this pull request?
    
This PR proposes to log whether PyArrow and Pandas are installed, so that we can check whether the related tests are going to be skipped.
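
Whether a package is installed has to be checked inside each target interpreter, not the one driving the tests, because (as the PyPy runs below show) PyArrow can be present in one executable and missing in another. The following is a minimal sketch of such a probe, assuming the test driver runs on Python 3.5+ and asks each executable for `<module>.__version__` via a subprocess; `probe_module_version` and `_PROBE` are hypothetical names, not the code added by this PR.

```python
import subprocess
import sys
from typing import Optional

# Tiny program executed inside the *target* interpreter (it is valid on both
# Python 2.7 and 3). It imports the module named in argv[1], prints its
# __version__ (empty if the module has none), and exits with status 1 if the
# import fails.
_PROBE = (
    "import importlib, sys\n"
    "try:\n"
    "    mod = importlib.import_module(sys.argv[1])\n"
    "except ImportError:\n"
    "    sys.exit(1)\n"
    "print(getattr(mod, '__version__', ''))\n"
)


def probe_module_version(python_exec: str, module: str) -> Optional[str]:
    """Return `module`'s version as seen by `python_exec`, or None if not importable."""
    result = subprocess.run(
        [python_exec, "-c", _PROBE, module],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    for name in ("pyarrow", "pandas"):
        print(name, probe_module_version(sys.executable, name))
```

Note that `subprocess.run` raises `FileNotFoundError` if the executable itself does not exist, which a real runner would probably want to report separately.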
    
    ## How was this patch tested?
    
    Manually tested:
    
    I don't have PyArrow installed in PyPy.
    ```bash
    $ ./run-tests --python-executables=python3
    ```
    
    ```
    ...
    Will test against the following Python executables: ['python3']
    Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
    Will test PyArrow related features against Python executable 'python3' in 'pyspark-sql' module.
    Will test Pandas related features against Python executable 'python3' in 'pyspark-sql' module.
    Starting test(python3): pyspark.mllib.tests
    Starting test(python3): pyspark.sql.tests
    Starting test(python3): pyspark.streaming.tests
    Starting test(python3): pyspark.tests
    ```
    
    ```bash
    $ ./run-tests --modules=pyspark-streaming
    ```
    
    ```
    ...
    Will test against the following Python executables: ['python2.7', 'pypy']
    Will test the following Python modules: ['pyspark-streaming']
    Starting test(pypy): pyspark.streaming.tests
    Starting test(pypy): pyspark.streaming.util
    Starting test(python2.7): pyspark.streaming.tests
    Starting test(python2.7): pyspark.streaming.util
    ```
    
    ```bash
    $ ./run-tests
    ```
    
    ```
    ...
    Will test against the following Python executables: ['python2.7', 'pypy']
    Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
    Will test PyArrow related features against Python executable 'python2.7' in 'pyspark-sql' module.
    Will test Pandas related features against Python executable 'python2.7' in 'pyspark-sql' module.
    Will skip PyArrow related features against Python executable 'pypy' in 'pyspark-sql' module. PyArrow >= 0.8.0 is required; however, PyArrow was not found.
    Will test Pandas related features against Python executable 'pypy' in 'pyspark-sql' module.
    Starting test(pypy): pyspark.streaming.tests
    Starting test(pypy): pyspark.sql.tests
    Starting test(pypy): pyspark.tests
    Starting test(python2.7): pyspark.mllib.tests
    ```
    
    ```bash
    $ ./run-tests --modules=pyspark-sql --python-executables=pypy
    ```
    
    ```
    ...
    Will test against the following Python executables: ['pypy']
    Will test the following Python modules: ['pyspark-sql']
    Will skip PyArrow related features against Python executable 'pypy' in 'pyspark-sql' module. PyArrow >= 0.8.0 is required; however, PyArrow was not found.
    Will test Pandas related features against Python executable 'pypy' in 'pyspark-sql' module.
    Starting test(pypy): pyspark.sql.tests
    Starting test(pypy): pyspark.sql.catalog
    Starting test(pypy): pyspark.sql.column
    Starting test(pypy): pyspark.sql.conf
    ```
    
After some modifications to produce other cases (here, with the minimum required versions raised to 20.0.0):
    
    ```bash
    $ ./run-tests
    ```
    
    ```
    ...
    Will test against the following Python executables: ['python2.7', 'pypy']
    Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
    Will skip PyArrow related features against Python executable 'python2.7' in 'pyspark-sql' module. PyArrow >= 20.0.0 is required; however, PyArrow 0.8.0 was found.
    Will skip Pandas related features against Python executable 'python2.7' in 'pyspark-sql' module. Pandas >= 20.0.0 is required; however, Pandas 0.20.2 was found.
    Will skip PyArrow related features against Python executable 'pypy' in 'pyspark-sql' module. PyArrow >= 20.0.0 is required; however, PyArrow was not found.
    Will skip Pandas related features against Python executable 'pypy' in 'pyspark-sql' module. Pandas >= 20.0.0 is required; however, Pandas 0.22.0 was found.
    Starting test(pypy): pyspark.sql.tests
    Starting test(pypy): pyspark.streaming.tests
    Starting test(pypy): pyspark.tests
    Starting test(python2.7): pyspark.mllib.tests
    ```
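
Each "Will test"/"Will skip" line above boils down to comparing the found version against a required minimum. A minimal sketch that reproduces the message format from these logs follows; the helper names and the simple numeric version parsing are assumptions for illustration (the PR's own comparison may differ, e.g. this sketch treats `0.8.0rc1` the same as `0.8.0`).

```python
import re
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Leading numeric components of a dotted version: '0.22.0' -> (0, 22, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def feature_message(feature: str, python_exec: str, required: str,
                    found: Optional[str]) -> str:
    """Build a 'Will test/skip ...' line for one feature and one executable.

    `found` is the installed version, or None if the package is missing.
    """
    where = "against Python executable '%s' in 'pyspark-sql' module." % python_exec
    if found is None:
        reason = "%s was not found." % feature
    elif parse_version(found) < parse_version(required):
        reason = "%s %s was found." % (feature, found)
    else:
        return "Will test %s related features %s" % (feature, where)
    return "Will skip %s related features %s %s >= %s is required; however, %s" % (
        feature, where, feature, required, reason)


# Prints the python2.7 Pandas "Will skip" line from the modified run above.
print(feature_message("Pandas", "python2.7", "20.0.0", "0.20.2"))
```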
    
    ```bash
$ ./run-tests-with-coverage
```

```
    ...
    Will test against the following Python executables: ['python2.7', 'pypy']
    Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
    Will test PyArrow related features against Python executable 'python2.7' in 'pyspark-sql' module.
    Will test Pandas related features against Python executable 'python2.7' in 'pyspark-sql' module.
    Coverage is not installed in Python executable 'pypy' but 'COVERAGE_PROCESS_START' environment variable is set, exiting.
    ```
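
The coverage run above stops early because coverage is missing from PyPy while `COVERAGE_PROCESS_START` is set. A minimal sketch of such a pre-flight check, assuming each interpreter is probed by running `import coverage` in a subprocess; `check_coverage` is a hypothetical name, not the function touched by this PR.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional, Sequence


def check_coverage(python_execs: Sequence[str],
                   env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return an error message if coverage is requested but missing, else None."""
    env = os.environ if env is None else env
    if "COVERAGE_PROCESS_START" not in env:
        return None  # coverage not requested; nothing to verify
    for python_exec in python_execs:
        # Exit status 0 means `import coverage` succeeded in that interpreter.
        status = subprocess.call(
            [python_exec, "-c", "import coverage"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if status != 0:
            return ("Coverage is not installed in Python executable '%s' but "
                    "'COVERAGE_PROCESS_START' environment variable is set, exiting."
                    % python_exec)
    return None
```

A runner would call this once before starting any tests and, if a message comes back, print it and exit with a non-zero status.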
    
    Author: hyukjinkwon <gurwls223@gmail.com>
    
    Closes #20473 from HyukjinKwon/SPARK-23300.