Skip to content
Snippets Groups Projects
Commit 50a0496a authored by Brennon York's avatar Brennon York Committed by Josh Rosen
Browse files

[SPARK-7017] [BUILD] [PROJECT INFRA] Refactor dev/run-tests into Python

All, this is a first attempt at refactoring `dev/run-tests` into Python. Initially I merely converted all Bash calls over to Python, then moved to a much more modular approach (more functions, moved the calls around, etc.). What is here is the initial culmination and should provide a great base to various downstream issues (e.g. SPARK-7016, modularize / parallelize testing, etc.). Would love comments / suggestions for this initial first step!

/cc srowen pwendell nchammas

Author: Brennon York <brennon.york@capitalone.com>

Closes #5694 from brennonyork/SPARK-7017 and squashes the following commits:

154ed73 [Brennon York] updated finding java binary if JAVA_HOME not set
3922a85 [Brennon York] removed necessary passed in variable
f9fbe54 [Brennon York] reverted doc test change
8135518 [Brennon York] removed the test check for documentation changes until jenkins can get updated
05d435b [Brennon York] added check for jekyll install
22edb78 [Brennon York] add check if jekyll isn't installed on the path
2dff136 [Brennon York] fixed pep8 whitespace errors
767a668 [Brennon York] fixed path joining issues, ensured docs actually build on doc changes
c42cf9a [Brennon York] unpack set operations with splat (*)
fb85a41 [Brennon York] fixed minor set bug
0379833 [Brennon York] minor doc addition to print the changed modules
aa03d9e [Brennon York] added documentation builds as a top level test component, altered high level project changes to properly execute core tests only when necessary, changed variable names for simplicity
ec1ae78 [Brennon York] minor name changes, bug fixes
b7c72b9 [Brennon York] reverting streaming context
03fdd7b [Brennon York] fixed the tuple () wraps around example lambda
705d12e [Brennon York] changed example to comply with pep3113 supporting python3
60b3d51 [Brennon York] prepend rather than append onto PATH
7d2f5e2 [Brennon York] updated python tests to remove unused variable
2898717 [Brennon York] added a change to streaming test to check if it only runs streaming tests
eb684b6 [Brennon York] fixed sbt_test_goals reference error
db7ae6f [Brennon York] reverted SPARK_HOME from start of command
1ecca26 [Brennon York] fixed merge conflicts
2fcdfc0 [Brennon York] testing targte branch dump on jenkins
1f607b1 [Brennon York] finalizing revisions to modular tests
8afbe93 [Brennon York] made error codes a global
0629de8 [Brennon York] updated to refactor and remove various small bugs, removed pep8 complaints
d90ab2d [Brennon York] fixed merge conflicts, ensured that for regular builds both core and sql tests always run
b1248dc [Brennon York] exec python rather than running python and exiting with return code
f9deba1 [Brennon York] python to python2 and removed newline
6d0a052 [Brennon York] incorporated merge conflicts with SPARK-7249
f950010 [Brennon York] removed building hive-0.12.0 per SPARK-6908
703f095 [Brennon York] fixed merge conflicts
b1ca593 [Brennon York] reverted the sparkR test
afeb093 [Brennon York] updated to make sparkR test fail
1dada6b [Brennon York] reverted pyspark test failure
9a592ec [Brennon York] reverted mima exclude issue, added pyspark test failure
d825aa4 [Brennon York] revert build break, add mima break
f041d8a [Brennon York] added space from commented import to now test build breaking
983f2a2 [Brennon York] comment out import to fail build test
2386785 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-7017
76335fb [Brennon York] reverted rat license issue for sparkconf
e4a96cc [Brennon York] removed the import error and added license error, fixed the way run-tests and run-tests.py report their error codes
56d3cb9 [Brennon York] changed test back and commented out import to break compile
b37328c [Brennon York] fixed typo and added default return is no error block was found in the environment
7613558 [Brennon York] updated to return the proper env variable for return codes
a5bd445 [Brennon York] reverted license, changed test in shuffle to fail
803143a [Brennon York] removed license file for SparkContext
b0b2604 [Brennon York] comment out import to see if build fails and returns properly
83e80ef [Brennon York] attempt at better python output when called from bash
c095fa6 [Brennon York] removed another wait() call
26e18e8 [Brennon York] removed unnecessary wait()
07210a9 [Brennon York] minor doc string change for java version with namedtuple update
ec03bf3 [Brennon York] added namedtuple for java version to add readability
2cb413b [Brennon York] upcased global variables, changes various calling methods from check_output to check_call
639f1e9 [Brennon York] updated with pep8 rules, fixed minor bugs, added run-tests file in bash to call the run-tests.py script
3c53a1a [Brennon York] uncomment the scala tests :)
6126c4f [Brennon York] refactored run-tests into python
parent 6765ef98
No related branches found
No related tags found
No related merge requests found
...@@ -17,224 +17,7 @@ ...@@ -17,224 +17,7 @@
# limitations under the License. # limitations under the License.
# #
# Go to the Spark project root directory
FWDIR="$(cd "`dirname $0`"/..; pwd)" FWDIR="$(cd "`dirname $0`"/..; pwd)"
cd "$FWDIR" cd "$FWDIR"
# Clean up work directory and caches exec python -u ./dev/run-tests.py
rm -rf ./work
rm -rf ~/.ivy2/local/org.apache.spark
rm -rf ~/.ivy2/cache/org.apache.spark
source "$FWDIR/dev/run-tests-codes.sh"
CURRENT_BLOCK=$BLOCK_GENERAL
function handle_error () {
echo "[error] Got a return code of $? on line $1 of the run-tests script."
exit $CURRENT_BLOCK
}
# Build against the right version of Hadoop.
{
if [ -n "$AMPLAB_JENKINS_BUILD_PROFILE" ]; then
if [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop1.0" ]; then
export SBT_MAVEN_PROFILES_ARGS="-Phadoop-1 -Dhadoop.version=1.2.1"
elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.0" ]; then
export SBT_MAVEN_PROFILES_ARGS="-Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1"
elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.2" ]; then
export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.2"
elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.3" ]; then
export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0"
fi
fi
if [ -z "$SBT_MAVEN_PROFILES_ARGS" ]; then
export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0"
fi
}
export SBT_MAVEN_PROFILES_ARGS="$SBT_MAVEN_PROFILES_ARGS -Pkinesis-asl"
# Determine Java path and version.
{
if test -x "$JAVA_HOME/bin/java"; then
declare java_cmd="$JAVA_HOME/bin/java"
else
declare java_cmd=java
fi
# We can't use sed -r -e due to OS X / BSD compatibility; hence, all the parentheses.
JAVA_VERSION=$(
$java_cmd -version 2>&1 \
| grep -e "^java version" --max-count=1 \
| sed "s/java version \"\(.*\)\.\(.*\)\.\(.*\)\"/\1\2/"
)
if [ "$JAVA_VERSION" -lt 18 ]; then
echo "[warn] Java 8 tests will not run because JDK version is < 1.8."
fi
}
# Only run Hive tests if there are SQL changes.
# Partial solution for SPARK-1455.
if [ -n "$AMPLAB_JENKINS" ]; then
target_branch="$ghprbTargetBranch"
git fetch origin "$target_branch":"$target_branch"
# AMP_JENKINS_PRB indicates if the current build is a pull request build.
if [ -n "$AMP_JENKINS_PRB" ]; then
# It is a pull request build.
sql_diffs=$(
git diff --name-only "$target_branch" \
| grep -e "^sql/" -e "^bin/spark-sql" -e "^sbin/start-thriftserver.sh"
)
non_sql_diffs=$(
git diff --name-only "$target_branch" \
| grep -v -e "^sql/" -e "^bin/spark-sql" -e "^sbin/start-thriftserver.sh"
)
if [ -n "$sql_diffs" ]; then
echo "[info] Detected changes in SQL. Will run Hive test suite."
_RUN_SQL_TESTS=true
if [ -z "$non_sql_diffs" ]; then
echo "[info] Detected no changes except in SQL. Will only run SQL tests."
_SQL_TESTS_ONLY=true
fi
fi
else
# It is a regular build. We should run SQL tests.
_RUN_SQL_TESTS=true
fi
fi
set -o pipefail
trap 'handle_error $LINENO' ERR
echo ""
echo "========================================================================="
echo "Running Apache RAT checks"
echo "========================================================================="
CURRENT_BLOCK=$BLOCK_RAT
./dev/check-license
echo ""
echo "========================================================================="
echo "Running Scala style checks"
echo "========================================================================="
CURRENT_BLOCK=$BLOCK_SCALA_STYLE
./dev/lint-scala
echo ""
echo "========================================================================="
echo "Running Python style checks"
echo "========================================================================="
CURRENT_BLOCK=$BLOCK_PYTHON_STYLE
./dev/lint-python
echo ""
echo "========================================================================="
echo "Building Spark"
echo "========================================================================="
CURRENT_BLOCK=$BLOCK_BUILD
{
HIVE_BUILD_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive -Phive-thriftserver"
echo "[info] Compile with Hive 0.13.1"
[ -d "lib_managed" ] && rm -rf lib_managed
echo "[info] Building Spark with these arguments: $HIVE_BUILD_ARGS"
if [ "${AMPLAB_JENKINS_BUILD_TOOL}" == "maven" ]; then
build/mvn $HIVE_BUILD_ARGS clean package -DskipTests
else
echo -e "q\n" \
| build/sbt $HIVE_BUILD_ARGS package assembly/assembly streaming-kafka-assembly/assembly \
| grep -v -e "info.*Resolving" -e "warn.*Merging" -e "info.*Including"
fi
}
echo ""
echo "========================================================================="
echo "Detecting binary incompatibilities with MiMa"
echo "========================================================================="
CURRENT_BLOCK=$BLOCK_MIMA
./dev/mima
echo ""
echo "========================================================================="
echo "Running Spark unit tests"
echo "========================================================================="
CURRENT_BLOCK=$BLOCK_SPARK_UNIT_TESTS
{
# If the Spark SQL tests are enabled, run the tests with the Hive profiles enabled.
# This must be a single argument, as it is.
if [ -n "$_RUN_SQL_TESTS" ]; then
SBT_MAVEN_PROFILES_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive -Phive-thriftserver"
fi
if [ -n "$_SQL_TESTS_ONLY" ]; then
# This must be an array of individual arguments. Otherwise, having one long string
# will be interpreted as a single test, which doesn't work.
SBT_MAVEN_TEST_ARGS=("catalyst/test" "sql/test" "hive/test" "hive-thriftserver/test" "mllib/test")
else
SBT_MAVEN_TEST_ARGS=("test")
fi
echo "[info] Running Spark tests with these arguments: $SBT_MAVEN_PROFILES_ARGS ${SBT_MAVEN_TEST_ARGS[@]}"
if [ "${AMPLAB_JENKINS_BUILD_TOOL}" == "maven" ]; then
build/mvn test $SBT_MAVEN_PROFILES_ARGS --fail-at-end
else
# NOTE: echo "q" is needed because sbt on encountering a build file with failure
# (either resolution or compilation) prompts the user for input either q, r, etc
# to quit or retry. This echo is there to make it not block.
# NOTE: Do not quote $SBT_MAVEN_PROFILES_ARGS or else it will be interpreted as a
# single argument!
# "${SBT_MAVEN_TEST_ARGS[@]}" is cool because it's an array.
# QUESTION: Why doesn't 'yes "q"' work?
# QUESTION: Why doesn't 'grep -v -e "^\[info\] Resolving"' work?
echo -e "q\n" \
| build/sbt $SBT_MAVEN_PROFILES_ARGS "${SBT_MAVEN_TEST_ARGS[@]}" \
| grep -v -e "info.*Resolving" -e "warn.*Merging" -e "info.*Including"
fi
}
echo ""
echo "========================================================================="
echo "Running PySpark tests"
echo "========================================================================="
CURRENT_BLOCK=$BLOCK_PYSPARK_UNIT_TESTS
# add path for python 3 in jenkins
export PATH="${PATH}:/home/anaconda/envs/py3k/bin"
./python/run-tests
echo ""
echo "========================================================================="
echo "Running SparkR tests"
echo "========================================================================="
CURRENT_BLOCK=$BLOCK_SPARKR_UNIT_TESTS
if [ $(command -v R) ]; then
./R/install-dev.sh
./R/run-tests.sh
else
echo "Ignoring SparkR tests as R was not found in PATH"
fi
...@@ -21,8 +21,9 @@ readonly BLOCK_GENERAL=10 ...@@ -21,8 +21,9 @@ readonly BLOCK_GENERAL=10
readonly BLOCK_RAT=11 readonly BLOCK_RAT=11
readonly BLOCK_SCALA_STYLE=12 readonly BLOCK_SCALA_STYLE=12
readonly BLOCK_PYTHON_STYLE=13 readonly BLOCK_PYTHON_STYLE=13
readonly BLOCK_BUILD=14 readonly BLOCK_DOCUMENTATION=14
readonly BLOCK_MIMA=15 readonly BLOCK_BUILD=15
readonly BLOCK_SPARK_UNIT_TESTS=16 readonly BLOCK_MIMA=16
readonly BLOCK_PYSPARK_UNIT_TESTS=17 readonly BLOCK_SPARK_UNIT_TESTS=17
readonly BLOCK_SPARKR_UNIT_TESTS=18 readonly BLOCK_PYSPARK_UNIT_TESTS=18
readonly BLOCK_SPARKR_UNIT_TESTS=19
...@@ -210,6 +210,8 @@ done ...@@ -210,6 +210,8 @@ done
failing_test="Scala style tests" failing_test="Scala style tests"
elif [ "$test_result" -eq "$BLOCK_PYTHON_STYLE" ]; then elif [ "$test_result" -eq "$BLOCK_PYTHON_STYLE" ]; then
failing_test="Python style tests" failing_test="Python style tests"
elif [ "$test_result" -eq "$BLOCK_DOCUMENTATION" ]; then
failing_test="to generate documentation"
elif [ "$test_result" -eq "$BLOCK_BUILD" ]; then elif [ "$test_result" -eq "$BLOCK_BUILD" ]; then
failing_test="to build" failing_test="to build"
elif [ "$test_result" -eq "$BLOCK_MIMA" ]; then elif [ "$test_result" -eq "$BLOCK_MIMA" ]; then
......
This diff is collapsed.
...@@ -68,7 +68,7 @@ if __name__ == "__main__": ...@@ -68,7 +68,7 @@ if __name__ == "__main__":
closest = data.map( closest = data.map(
lambda p: (closestPoint(p, kPoints), (p, 1))) lambda p: (closestPoint(p, kPoints), (p, 1)))
pointStats = closest.reduceByKey( pointStats = closest.reduceByKey(
lambda (p1, c1), (p2, c2): (p1 + p2, c1 + c2)) lambda p1_c1, p2_c2: (p1_c1[0] + p2_c2[0], p1_c1[1] + p2_c2[1]))
newPoints = pointStats.map( newPoints = pointStats.map(
lambda st: (st[0], st[1][0] / st[1][1])).collect() lambda st: (st[0], st[1][0] / st[1][1])).collect()
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment