Commit a5c10ff2 authored by Holden Karau

[SPARK-19064][PYSPARK] Fix pip installing of sub components


## What changes were proposed in this pull request?

Fix installation of the mllib and ml sub-components, and clean up cached files more eagerly in the test script and make-distribution.
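
As background for the fix: the `packages` argument to setuptools `setup()` is not recursive, so listing `pyspark.ml` does not pull in `pyspark.ml.param`. The sketch below is not part of this patch; it is a minimal, stdlib-only illustration of how one might detect sub-packages that exist on disk but are missing from a declared list. The helper names (`discover_packages`, `undeclared_packages`, `demo`) and the throwaway directory layout are assumptions made for the example.

```python
import os
import tempfile


def discover_packages(root):
    """Return dotted names of every package (directory with __init__.py) under root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            continue  # root is only the container directory, not a package
        if "__init__.py" not in filenames:
            dirnames[:] = []  # not a package, so do not descend into it
            continue
        found.append(os.path.relpath(dirpath, root).replace(os.sep, "."))
    return sorted(found)


def undeclared_packages(root, declared):
    """Return packages present on disk but absent from the declared list."""
    declared_set = set(declared)
    return [name for name in discover_packages(root) if name not in declared_set]


def demo():
    """Build a hypothetical package tree in a temp dir and report what is undeclared."""
    layout = ["pyspark", "pyspark.ml", "pyspark.ml.param",
              "pyspark.mllib", "pyspark.mllib.linalg"]
    declared = ["pyspark", "pyspark.ml", "pyspark.mllib"]
    with tempfile.TemporaryDirectory() as root:
        for name in layout:
            pkg_dir = os.path.join(root, *name.split("."))
            os.makedirs(pkg_dir, exist_ok=True)
            open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        return undeclared_packages(root, declared)


if __name__ == "__main__":
    print(demo())
```

With the hypothetical layout above, `demo()` reports `pyspark.ml.param` and `pyspark.mllib.linalg` as undeclared, which mirrors the kind of omission this change addresses.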

## How was this patch tested?

Updated sanity test script to import mllib and ml sub-components.
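
For illustration only, the same kind of check can be written as a loop over module names. This is a sketch rather than the actual sanity script; `failed_imports` and `SUB_COMPONENTS` are names invented here, and the two module paths are the ones imported in the diff below.

```python
import importlib


def failed_imports(module_names):
    """Try importing each module; return {name: error message} for the failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            failures[name] = str(exc)
    return failures


# Sub-components exercised by the updated sanity check (taken from the diff).
SUB_COMPONENTS = ["pyspark.ml.param", "pyspark.mllib.linalg"]
```

Calling `failed_imports(SUB_COMPONENTS)` inside a virtualenv where the built sdist has been pip-installed should return an empty dict if the sub-packages were packaged correctly. Note that importing `pyspark.mllib.linalg` also requires numpy, which is why the test script installs it.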

Author: Holden Karau <holden@us.ibm.com>

Closes #16465 from holdenk/SPARK-19064-fix-pip-install-sub-components.

(cherry picked from commit 965c82d8)
Signed-off-by: Holden Karau <holden@us.ibm.com>
parent 97d3353e
@@ -213,6 +213,8 @@ cp -r "$SPARK_HOME/data" "$DISTDIR"
 if [ "$MAKE_PIP" == "true" ]; then
 echo "Building python distribution package"
 pushd "$SPARK_HOME/python" > /dev/null
+# Delete the egg info file if it exists, this can cache older setup files.
+rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
 python setup.py sdist
 popd > /dev/null
 else
......
@@ -18,6 +18,8 @@
 from __future__ import print_function
 from pyspark.sql import SparkSession
+from pyspark.ml.param import Params
+from pyspark.mllib.linalg import *
 import sys
 if __name__ == "__main__":
......
 jira==1.0.3
 PyGithub==1.26.0
 Unidecode==0.04.19
+pypandoc==1.3.3
@@ -78,11 +78,14 @@ for python in "${PYTHON_EXECS[@]}"; do
 mkdir -p "$VIRTUALENV_PATH"
 virtualenv --python=$python "$VIRTUALENV_PATH"
 source "$VIRTUALENV_PATH"/bin/activate
-# Upgrade pip
-pip install --upgrade pip
+# Upgrade pip & friends
+pip install --upgrade pip pypandoc wheel
+pip install numpy # Needed so we can verify mllib imports
 echo "Creating pip installable source dist"
 cd "$FWDIR"/python
+# Delete the egg info file if it exists, this can cache the setup file.
+rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
 $python setup.py sdist
......
@@ -162,7 +162,12 @@ try:
 url='https://github.com/apache/spark/tree/master/python',
 packages=['pyspark',
 'pyspark.mllib',
+'pyspark.mllib.linalg',
+'pyspark.mllib.stat',
 'pyspark.ml',
+'pyspark.ml.linalg',
+'pyspark.ml.param',
+'pyspark.ml.stat',
 'pyspark.sql',
 'pyspark.streaming',
 'pyspark.bin',
......