Commit d6d11c2e authored by Matei Zaharia

Merge pull request #129 from velvia/2013-11/document-local-uris

Document & finish support for local: URIs

Document all the supported URI schemes for addJar / addFile on the Cluster Overview page.
Add support for the local: URI scheme to addFile.
parents 8f1098a3 f3679fd4
@@ -591,7 +591,8 @@ class SparkContext(
    val uri = new URI(path)
    val key = uri.getScheme match {
      case null | "file" => env.httpFileServer.addFile(new File(uri.getPath))
      case "local" => "file:" + uri.getPath
      case _ => path
    }
    addedFiles(key) = System.currentTimeMillis
...
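For context, here is a minimal, self-contained sketch of how this scheme resolution behaves. `serveOverHttp` is a hypothetical stand-in for `env.httpFileServer.addFile` (which is internal to SparkContext); everything else mirrors the match above:

```scala
import java.io.File
import java.net.URI

object AddFileKeyDemo {
  // Hypothetical stand-in for env.httpFileServer.addFile: registers the file
  // with the driver's HTTP server and returns the URL executors fetch it from.
  def serveOverHttp(file: File): String =
    s"http://driver-host:33333/files/${file.getName}"

  // Mirrors the scheme matching in SparkContext.addFile after this change.
  def resolveKey(path: String): String = {
    val uri = new URI(path)
    uri.getScheme match {
      case null | "file" => serveOverHttp(new File(uri.getPath)) // driver serves the file
      case "local"       => "file:" + uri.getPath                // already present on every worker
      case _             => path                                 // hdfs:, http:, ftp:, ... fetched as-is
    }
  }

  def main(args: Array[String]): Unit = {
    println(resolveKey("/tmp/data.txt"))            // no scheme: served by the driver
    println(resolveKey("local:/opt/data.txt"))      // becomes file:/opt/data.txt, read locally
    println(resolveKey("hdfs://nn:8020/data.txt"))  // returned unchanged
  }
}
```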
@@ -13,7 +13,7 @@ object in your main program (called the _driver program_).

Specifically, to run on a cluster, the SparkContext can connect to several types of _cluster managers_
(either Spark's own standalone cluster manager or Mesos/YARN), which allocate resources across
applications. Once connected, Spark acquires *executors* on nodes in the cluster, which are
worker processes that run computations and store data for your application.
Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to
the executors. Finally, SparkContext sends *tasks* for the executors to run.
@@ -57,6 +57,18 @@ which takes a list of JAR files (Java/Scala) or .egg and .zip libraries (Python)

worker nodes. You can also dynamically add new files to be sent to executors with `SparkContext.addJar`
and `addFile`.
## URIs for addJar / addFile

- **file:** - absolute paths and `file:/` URIs are served by the driver's HTTP file server, and every
  executor pulls the file from the driver's HTTP server.
- **hdfs:**, **http:**, **https:**, **ftp:** - these pull down files and JARs from the URI as expected.
- **local:** - a URI starting with `local:/` is expected to exist as a local file on each worker node. This
  means that no network I/O is incurred, and it works well for large files/JARs that have been pushed to
  each worker or shared via NFS, GlusterFS, etc.

Note that JARs and files are copied to the working directory of each SparkContext on the executor nodes.
Over time this can use up a significant amount of space and will need to be cleaned up.
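To make the scheme behaviors above concrete, here is a short usage sketch; the master URL, application name, and all paths are hypothetical:

```scala
import org.apache.spark.SparkContext

// Hypothetical master URL, app name, and paths, for illustration only.
val sc = new SparkContext("spark://master:7077", "UriSchemesExample")

sc.addFile("/tmp/lookup.csv")                       // no scheme: served by the driver's HTTP file server
sc.addFile("hdfs://namenode:8020/data/lookup.csv")  // each executor pulls the file from HDFS
sc.addJar("local:/opt/libs/extra.jar")              // must already exist at this path on every worker

// JARs can also be passed at construction time, e.g.:
// new SparkContext("spark://master:7077", "UriSchemesExample", "/opt/spark", Seq("target/app.jar"))
```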
# Monitoring

Each driver program has a web UI, typically on port 4040, that displays information about running
...