  • bd9a4a5a
    [SPARK-18652][PYTHON] Include the example data and third-party licenses in pyspark package. · bd9a4a5a
    Shuai Lin authored
    ## What changes were proposed in this pull request?
    
    Since we already include the Python examples in the pyspark package, we should include the example data they rely on as well.
    
    We should also include the third-party licenses, since we distribute their jars with the pyspark package.
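
As a rough illustration of the idea (not the actual patch), extra files can be shipped with a Python package by declaring them to setuptools. The sketch below is hypothetical: the sub-package names `pyspark.data` and `pyspark.licenses`, the source directories, and the glob patterns are all assumptions chosen for illustration.

```python
# Hypothetical sketch: the names, paths, and patterns below are assumptions,
# not the contents of the real python/setup.py.

def build_setup_kwargs():
    """Return keyword arguments that could be passed to setuptools.setup()."""
    return {
        "name": "pyspark",
        # Treat the bundled directories as sub-packages so they are installed.
        "packages": ["pyspark", "pyspark.data", "pyspark.licenses"],
        # Map each sub-package to the (assumed) directory holding its files.
        "package_dir": {
            "pyspark.data": "deps/data",
            "pyspark.licenses": "deps/licenses",
        },
        # Non-Python files to ship with each sub-package.
        "package_data": {
            "pyspark.data": ["*.txt", "*.data"],
            "pyspark.licenses": ["*.txt"],
        },
    }


if __name__ == "__main__":
    # Print the declaration instead of calling setup(), so the sketch is
    # safe to run anywhere.
    for key, value in build_setup_kwargs().items():
        print(key, "=", value)
```

Depending on the setuptools version, `package_data` patterns may not recurse into subdirectories, so nested directories such as `data/mllib` might need their own entries.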
    
    ## How was this patch tested?
    
    Manually tested with Python 2.7 and Python 3.4:
    ```sh
    $ ./build/mvn -DskipTests -Phive -Phive-thriftserver -Pyarn -Pmesos clean package
    $ cd python
    $ python setup.py sdist
    $ pip install dist/pyspark-2.1.0.dev0.tar.gz
    
    $ ls -1 /usr/local/lib/python2.7/dist-packages/pyspark/data/
    graphx
    mllib
    streaming
    
    $ du -sh /usr/local/lib/python2.7/dist-packages/pyspark/data/
    600K    /usr/local/lib/python2.7/dist-packages/pyspark/data/
    
    $ ls -1 /usr/local/lib/python2.7/dist-packages/pyspark/licenses/ | head -5
    LICENSE-AnchorJS.txt
    LICENSE-DPark.txt
    LICENSE-Mockito.txt
    LICENSE-SnapTree.txt
    LICENSE-antlr.txt
    ```
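
The same manual check could be scripted. The following is a minimal sketch, assuming the installed package keeps its files under `data/` and `licenses/` as in the listing above; the helper name `summarize_bundle` is hypothetical.

```python
from pathlib import Path


def summarize_bundle(package_root):
    """List data subdirectories and license files under an installed package.

    `package_root` is the directory of the installed package, for example
    the parent directory of pyspark/__init__.py.
    """
    root = Path(package_root)
    data_dir = root / "data"
    licenses_dir = root / "licenses"

    # Missing directories yield empty lists rather than raising.
    data_subdirs = (
        sorted(p.name for p in data_dir.iterdir() if p.is_dir())
        if data_dir.is_dir()
        else []
    )
    license_files = (
        sorted(p.name for p in licenses_dir.glob("LICENSE-*.txt"))
        if licenses_dir.is_dir()
        else []
    )
    return {"data_subdirs": data_subdirs, "license_files": license_files}
```

After installing the sdist, this could be called as `summarize_bundle(os.path.dirname(pyspark.__file__))`; empty lists would suggest the files were not packaged.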
    
    Author: Shuai Lin <linshuai2012@gmail.com>
    
    Closes #16082 from lins05/include-data-in-pyspark-dist.