Commit 965c82d8 authored by Holden Karau

[SPARK-19064][PYSPARK] Fix pip installing of sub components

## What changes were proposed in this pull request?

Fix installation of the mllib and ml sub-components, and clean up cached build files (the `pyspark.egg-info` directory) more eagerly in the pip test script and in make-distribution.

## How was this patch tested?

Updated sanity test script to import mllib and ml sub-components.
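The sanity check amounts to importing each sub-package from the freshly installed distribution and failing if any import breaks. Below is a minimal sketch of that idea using only `importlib`. The helper name `failed_imports` is a hypothetical illustration and is not part of the Spark test script.

```python
import importlib


def failed_imports(module_names):
    """Return the names from module_names that cannot be imported.

    Hypothetical helper: a sub-package that was left out of setup.py's
    `packages` list surfaces here as an ImportError after pip installation.
    """
    failures = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:  # also catches ModuleNotFoundError
            failures.append(name)
    return failures
```

Against an installed pyspark, a call like `failed_imports(["pyspark.ml.param", "pyspark.mllib.linalg"])` would be expected to return an empty list. The mllib import also needs numpy, which is why the test script installs it.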

Author: Holden Karau <holden@us.ibm.com>

Closes #16465 from holdenk/SPARK-19064-fix-pip-install-sub-components.
parent 92afaa93
@@ -220,6 +220,8 @@ cp -r "$SPARK_HOME/data" "$DISTDIR"
if [ "$MAKE_PIP" == "true" ]; then
echo "Building python distribution package"
pushd "$SPARK_HOME/python" > /dev/null
# Delete the egg info file if it exists, this can cache older setup files.
rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
python setup.py sdist
popd > /dev/null
else
......
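Both build scripts now delete `pyspark.egg-info` before running `setup.py sdist`. As the comment in the diff notes, a leftover egg-info directory can cache older setup files. Here is a minimal Python sketch of the same cleanup step. The function name `remove_egg_info` and its boolean return value are illustrative assumptions.

```python
import shutil
from pathlib import Path


def remove_egg_info(python_dir, name="pyspark.egg-info"):
    """Delete a leftover egg-info directory under python_dir, if present.

    Hypothetical helper: returns True when a directory was removed and
    False when there was nothing to delete.
    """
    egg_info = Path(python_dir) / name
    if egg_info.is_dir():
        shutil.rmtree(egg_info)  # remove the stale metadata tree
        return True
    return False
```

Unlike this sketch, the shell version's `|| echo ...` only runs if `rm -rf` itself fails, because `rm -rf` succeeds silently when the path does not exist.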
@@ -18,6 +18,8 @@
from __future__ import print_function
from pyspark.sql import SparkSession
from pyspark.ml.param import Params
from pyspark.mllib.linalg import *
import sys
if __name__ == "__main__":
......
jira==1.0.3
PyGithub==1.26.0
Unidecode==0.04.19
pypandoc==1.3.3
@@ -78,11 +78,14 @@ for python in "${PYTHON_EXECS[@]}"; do
mkdir -p "$VIRTUALENV_PATH"
virtualenv --python=$python "$VIRTUALENV_PATH"
source "$VIRTUALENV_PATH"/bin/activate
# Upgrade pip
pip install --upgrade pip
# Upgrade pip & friends
pip install --upgrade pip pypandoc wheel
pip install numpy # Needed so we can verify mllib imports
echo "Creating pip installable source dist"
cd "$FWDIR"/python
# Delete the egg info file if it exists, this can cache the setup file.
rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
$python setup.py sdist
......
@@ -162,7 +162,12 @@ try:
url='https://github.com/apache/spark/tree/master/python',
packages=['pyspark',
'pyspark.mllib',
'pyspark.mllib.linalg',
'pyspark.mllib.stat',
'pyspark.ml',
'pyspark.ml.linalg',
'pyspark.ml.param',
'pyspark.ml.stat',
'pyspark.sql',
'pyspark.streaming',
'pyspark.bin',
......
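The root cause is that the `packages` argument to `setup()` is not recursive. Listing `'pyspark.ml'` does not pull in `'pyspark.ml.linalg'`, so every sub-package must be named explicitly, as this commit does, or discovered automatically. The sketch below uses only the standard library. It walks a source tree and reports regular packages (directories with an `__init__.py`) that are missing from a hand-written list. The helper names are hypothetical.

```python
import os


def discover_packages(root, top="pyspark"):
    """Return sorted dotted names of directories under root/top with an __init__.py."""
    found = []
    for dirpath, dirnames, filenames in os.walk(os.path.join(root, top)):
        if "__init__.py" not in filenames:
            # Not a regular package: skip it and do not descend further.
            dirnames[:] = []
            continue
        rel = os.path.relpath(dirpath, root)
        found.append(rel.replace(os.sep, "."))
    return sorted(found)


def missing_packages(root, declared, top="pyspark"):
    """Packages present on disk but absent from the declared `packages` list."""
    return sorted(set(discover_packages(root, top)) - set(declared))
```

In a real `setup.py`, `setuptools.find_packages()` performs this kind of discovery. The commit instead keeps the explicit list and adds the missing `linalg`, `param`, and `stat` entries.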