Commit 457e58be
[SPARK-14424][BUILD][DOCS] Update the build docs to switch from assembly to package and add a no…
Holden Karau authored
    ## What changes were proposed in this pull request?
    
Change our build docs & shell scripts so that developers are aware of the change from "assembly" to "package".
    
    ## How was this patch tested?
    
Manually ran `./bin/spark-shell` after `./build/sbt assembly` and verified that the error message is printed; then ran the newly suggested build target and verified that `./bin/spark-shell` runs afterwards.
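
As an illustration, the manual check above corresponds roughly to this shell session (a sketch; it assumes, per the title, that the suggested replacement target is `package`):

```bash
# Build with the old target; launching the shell should print the new
# error message pointing at the replacement target.
./build/sbt assembly
./bin/spark-shell    # expected to print the error

# Build with the suggested target and confirm the shell starts.
./build/sbt package
./bin/spark-shell
```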
    
    Author: Holden Karau <holden@pigscanfly.ca>
    Author: Holden Karau <holden@us.ibm.com>
    
    Closes #12197 from holdenk/SPARK-1424-spark-class-broken-fix-build-docs.
building-spark.md
---
layout: global
title: Building Spark
redirect_from: "building-with-maven.html"
---

* This will become a table of contents (this text will be scraped).
{:toc}

Building Spark using Maven requires Maven 3.3.9 or newer and Java 7+. The Spark build can supply a suitable Maven binary; see below.
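
As a quick sanity check before building, you can confirm the toolchain meets these minimums (a sketch; it assumes `mvn` and `java` are already on your `PATH`):

{% highlight bash %}
# Both commands print version banners; compare against the minimums above.
mvn -version    # expect Apache Maven 3.3.9 or newer
java -version   # expect Java 7 or newer
{% endhighlight %}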

## Building with `build/mvn`

Spark now comes packaged with a self-contained Maven installation, located under the `build/` directory, to ease building and deploying Spark from source. This script automatically downloads and sets up all necessary build requirements (Maven, Scala, and Zinc) locally within the `build/` directory itself. It honors any `mvn` binary already present, but will pull down its own copy of Scala and Zinc regardless, to ensure the proper version requirements are met. `build/mvn` execution acts as a pass-through to the `mvn` call, allowing an easy transition from previous build methods. As an example, one can build a version of Spark as follows:

{% highlight bash %}
build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
{% endhighlight %}

Other build examples can be found below.
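
Because `build/mvn` is a pass-through, standard Maven options also work unchanged. For instance, the following sketch uses stock Maven reactor flags (`-pl`, `-am`), which are plain Maven features rather than targets documented here:

{% highlight bash %}
# Build only the core module plus the modules it depends on, skipping tests.
build/mvn -DskipTests -pl core -am package
{% endhighlight %}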

Note: When building on an encrypted filesystem (if your home directory is encrypted, for example), the Spark build might fail with a "Filename too long" error. As a workaround, add the following to the configuration args of the scala-maven-plugin in the project pom.xml:

{% highlight xml %}
<arg>-Xmax-classfile-name</arg>
<arg>128</arg>
{% endhighlight %}

and in project/SparkBuild.scala add:

{% highlight scala %}
scalacOptions in Compile ++= Seq("-Xmax-classfile-name", "128"),
{% endhighlight %}

to the sharedSettings val. See also this PR if you are unsure of where to add these lines.
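
If you are unsure whether your filesystem is affected, a quick probe is to try creating a file with a long name (a sketch; encrypted home directories often cap filenames at roughly 143 bytes rather than the usual 255):

{% highlight bash %}
# Attempt to create a 200-character filename in the current directory.
name=$(printf 'x%.0s' {1..200})
touch "$name" && rm "$name" && echo "long filenames OK"
{% endhighlight %}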

## Building a Runnable Distribution

To create a Spark distribution like those distributed by the Spark Downloads page, laid out so that it is runnable, use `./dev/make-distribution.sh` in the project root directory. It can be configured with Maven profile settings and so on, like the direct Maven build. For example:

{% highlight bash %}
./dev/make-distribution.sh --name custom-spark --tgz -Psparkr -Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn
{% endhighlight %}

For more information on usage, run `./dev/make-distribution.sh --help`.
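
When the script finishes, the distribution is placed in the project root. As a sketch of using the result (the exact archive name depends on the Spark and Hadoop versions, so the glob below is an assumption):

{% highlight bash %}
# Unpack the generated tarball (named after the --name flag above) and
# launch the shell from the resulting directory.
tar -xzf spark-*-bin-custom-spark.tgz
cd spark-*-bin-custom-spark
./bin/spark-shell
{% endhighlight %}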

## Setting up Maven's Memory Usage

You'll need to configure Maven to use more memory than usual by setting MAVEN_OPTS. We recommend the following settings:

{% highlight bash %}
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
{% endhighlight %}
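
To avoid re-exporting this in every shell session, you can persist the setting in your shell profile (a sketch; the profile file depends on your shell):

{% highlight bash %}
# Append to ~/.bashrc; adjust the file for zsh or other shells.
echo 'export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"' >> ~/.bashrc
{% endhighlight %}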

If you don't add these settings, you may see errors like the following: