Commit 457e58be authored by Holden Karau, committed by Andrew Or

[SPARK-14424][BUILD][DOCS] Update the build docs to switch from assembly to package and add a no…

## What changes were proposed in this pull request?

Change our build docs & shell scripts so that developers are aware of the change from "assembly" to "package".

## How was this patch tested?

Manually ran `./bin/spark-shell` after `./build/sbt assembly` and verified that the error message was printed, then ran the newly suggested build target and verified that `./bin/spark-shell` runs after this.
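
In shell terms, the manual test described above amounts to roughly the following sequence (a sketch only; it assumes a clean checkout and no extra build profiles):

```sh
# Sketch of the manual verification (assumes a clean Spark checkout, default profiles).

# Old target: no longer populates the jars directory, so spark-shell should fail
# with the new, more helpful error message.
./build/sbt assembly
./bin/spark-shell    # expect: "You need to build Spark with the target \"package\" ..."

# Newly suggested target: populates the jars directory, so spark-shell should start.
./build/sbt package
./bin/spark-shell
```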

Author: Holden Karau <holden@pigscanfly.ca>
Author: Holden Karau <holden@us.ibm.com>

Closes #12197 from holdenk/SPARK-1424-spark-class-broken-fix-build-docs.
parent 9af5423e
@@ -44,7 +44,7 @@ fi
 if [ ! -d "$SPARK_JARS_DIR" ] && [ -z "$SPARK_TESTING$SPARK_SQL_TESTING" ]; then
   echo "Failed to find Spark jars directory ($SPARK_JARS_DIR)." 1>&2
-  echo "You need to build Spark before running this program." 1>&2
+  echo "You need to build Spark with the target \"package\" before running this program." 1>&2
   exit 1
 else
   LAUNCH_CLASSPATH="$SPARK_JARS_DIR/*"
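
For context (not part of the diff): a user who hits this error can populate the jars directory that the script checks with either build tool. A minimal sketch, assuming no deployment-specific profiles are needed:

```sh
# Either command produces the jars directory that the check above looks for;
# add profiles such as -Pyarn or -Phive as your deployment requires.
./build/sbt package
# or, with Maven:
./build/mvn -DskipTests package
```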
@@ -190,13 +190,6 @@ or
 Java 8 tests are automatically enabled when a Java 8 JDK is detected.
 If you have JDK 8 installed but it is not the system default, you can set JAVA_HOME to point to JDK 8 before running the tests.
 
-# Building for PySpark on YARN
-
-PySpark on YARN is only supported if the jar is built with Maven. Further, there is a known problem
-with building this assembly jar on Red Hat based operating systems (see [SPARK-1753](https://issues.apache.org/jira/browse/SPARK-1753)). If you wish to
-run PySpark on a YARN cluster with Red Hat installed, we recommend that you build the jar elsewhere,
-then ship it over to the cluster. We are investigating the exact cause for this.
-
 # Packaging without Hadoop Dependencies for YARN
 
 The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with `yarn.application.classpath`. The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself.
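
As an illustration of the `hadoop-provided` profile mentioned in the context above (not part of the diff; the profile combination is only an example):

```sh
# Build a Spark package without bundling Hadoop and its ecosystem jars,
# relying on the cluster's own Hadoop at runtime (example profile set).
./build/mvn -Pyarn -Phadoop-provided -DskipTests package
```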
@@ -210,7 +203,7 @@ compilation. More advanced developers may wish to use SBT.
 The SBT build is derived from the Maven POM files, and so the same Maven profiles and variables
 can be set to control the SBT build. For example:
 
-    build/sbt -Pyarn -Phadoop-2.3 assembly
+    build/sbt -Pyarn -Phadoop-2.3 package
 
 To avoid the overhead of launching sbt each time you need to re-compile, you can launch sbt
 in interactive mode by running `build/sbt`, and then run all build commands at the command
@@ -219,9 +212,9 @@ prompt. For more recommendations on reducing build time, refer to the
 # Testing with SBT
 
-Some of the tests require Spark to be packaged first, so always run `build/sbt assembly` the first time. The following is an example of a correct (build, test) sequence:
+Some of the tests require Spark to be packaged first, so always run `build/sbt package` the first time. The following is an example of a correct (build, test) sequence:
 
-    build/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver assembly
+    build/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver package
     build/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver test
 
 To run only a specific test suite as follows:
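
The diff is truncated at this point; purely for illustration, running a single suite with SBT looks roughly like the command below (the project and suite names are arbitrary examples, not taken from this patch):

```sh
# Run one test suite in the core project (example names only).
./build/sbt "core/testOnly *DAGSchedulerSuite"
```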