Commit b5df1cd6 authored by Matei Zaharia

ADD_JARS environment variable for spark-shell

parent 3e61beff
...@@ -43,12 +43,18 @@ new SparkContext(master, appName, [sparkHome], [jars])
The `master` parameter is a string specifying a [Spark or Mesos cluster URL](#master-urls) to connect to, or a special "local" string to run in local mode, as described below. `appName` is a name for your application, which will be shown in the cluster web UI. Finally, the last two parameters are needed to deploy your code to a cluster if running in distributed mode, as described later.
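For concreteness, here is a minimal sketch of constructing a `SparkContext` with these four parameters; the cluster URL, application name, Spark home path, and JAR name below are placeholder values for illustration, not something this commit prescribes.

{% highlight scala %}
import spark.SparkContext

// All values below are placeholders for illustration.
val sc = new SparkContext(
  "spark://masterhost:7077",  // master: cluster URL, or "local[N]" for local mode
  "My App",                   // appName: shown in the cluster web UI
  "/path/to/spark",           // sparkHome: where Spark is installed on the workers
  Seq("target/my-job.jar"))   // jars: JARs to ship to the worker nodes
{% endhighlight %}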
In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called `sc`. Making your own SparkContext will not work. You can set which master the context connects to using the `MASTER` environment variable, and you can add JARs to the classpath with the `ADD_JARS` variable. For example, to run `spark-shell` on four cores, use
{% highlight bash %}
$ MASTER=local[4] ./spark-shell
{% endhighlight %}
Or, to also add `code.jar` to its classpath, use:
{% highlight bash %}
$ MASTER=local[4] ADD_JARS=code.jar ./spark-shell
{% endhighlight %}
### Master URLs
The master URL passed to Spark can be in one of the following formats:
...@@ -78,7 +84,7 @@ If you want to run your job on a cluster, you will need to specify the two optio
* `sparkHome`: The path at which Spark is installed on your worker machines (it should be the same on all of them).
* `jars`: A list of JAR files on the local machine containing your job's code and any dependencies, which Spark will deploy to all the worker nodes. You'll need to package your job into a set of JARs using your build system. For example, if you're using SBT, the [sbt-assembly](https://github.com/sbt/sbt-assembly) plugin is a good way to make a single JAR with your code and dependencies; a minimal setup sketch follows this list.
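As a rough sketch of that sbt-assembly setup (the plugin version and file path here are illustrative assumptions, not part of this commit), you enable the plugin in `project/plugins.sbt` and then run `sbt assembly` to produce a single fat JAR under `target/`:

{% highlight scala %}
// project/plugins.sbt -- illustrative only; substitute the sbt-assembly version that matches your sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")
{% endhighlight %}

The resulting JAR can then be passed in the `jars` parameter of `SparkContext`, or via `ADD_JARS` when launching `spark-shell`.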
If you run `spark-shell` on a cluster, you can add JARs to it by specifying the `ADD_JARS` environment variable before you launch it. This variable should contain a comma-separated list of JARs. For example, `ADD_JARS=a.jar,b.jar ./spark-shell` will launch a shell with `a.jar` and `b.jar` on its classpath. In addition, any new classes you define in the shell will automatically be distributed.
# Resilient Distributed Datasets (RDDs)
...
...@@ -822,7 +822,7 @@ class SparkILoop(in0: Option[BufferedReader], val out: PrintWriter, val master:
spark.repl.Main.interp.out.println("Spark context available as sc.");
spark.repl.Main.interp.out.flush();
""")
command("import spark.SparkContext._")
}
echo("Type in expressions to have them evaluated.")
echo("Type :help for more information.")
...@@ -838,7 +838,8 @@ class SparkILoop(in0: Option[BufferedReader], val out: PrintWriter, val master:
if (prop != null) prop else "local"
}
}
val jars = Option(System.getenv("ADD_JARS")).map(_.split(',')).getOrElse(new Array[String](0))
sparkContext = new SparkContext(master, "Spark shell", System.getenv("SPARK_HOME"), jars)
sparkContext
}
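As a standalone illustration of the `ADD_JARS` parsing used above (this sketch is not part of the commit, and the object and method names are made up for the example): when the variable is unset, `Option(...)` yields `None` and the shell falls back to an empty JAR list; otherwise the value is split on commas.

{% highlight scala %}
// Hypothetical demo of the env-var parsing pattern used in SparkILoop above.
object AddJarsParsingDemo {
  def parseAddJars(value: String): Array[String] =
    Option(value).map(_.split(',')).getOrElse(new Array[String](0))

  def main(args: Array[String]): Unit = {
    println(parseAddJars("a.jar,b.jar").toList) // List(a.jar, b.jar)
    println(parseAddJars(null).toList)          // List() -- i.e. ADD_JARS unset
  }
}
{% endhighlight %}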
...@@ -850,6 +851,10 @@ class SparkILoop(in0: Option[BufferedReader], val out: PrintWriter, val master:
printWelcome()
echo("Initializing interpreter...")
// Add JARS specified in Spark's ADD_JARS variable to classpath
val jars = Option(System.getenv("ADD_JARS")).map(_.split(',')).getOrElse(new Array[String](0))
jars.foreach(settings.classpath.append(_))
this.settings = settings
createInterpreter()
...