Commit 8c3ee2bc authored by jerryshao's avatar jerryshao Committed by Andrew Or

[SPARK-17512][CORE] Avoid formatting to python path for yarn and mesos cluster mode

## What changes were proposed in this pull request?

Yarn and Mesos cluster modes support remote python paths (HDFS/S3 schemes) through their own distribution mechanisms, so it is unnecessary to check and format the python path when running in these modes. This is a potential regression compared to 1.6, so this patch proposes to fix it.
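The decision this patch makes can be sketched in a small, self-contained form (a simplified illustration, not Spark's actual code — `formatPaths` here is a hypothetical stand-in for `PythonRunner.formatPaths`, which rejects non-local python files):

```scala
import java.net.URI

object PyFilesSketch {
  // Hypothetical stand-in for PythonRunner.formatPaths: keep the path
  // component of local URIs and reject remote schemes such as hdfs/s3.
  def formatPaths(paths: Seq[String]): Seq[String] = paths.map { p =>
    Option(new URI(p).getScheme) match {
      case None | Some("file") | Some("local") => new URI(p).getPath
      case Some(_) =>
        throw new IllegalArgumentException(s"Only local python files are supported: $p")
    }
  }

  // In yarn/mesos cluster mode the cluster manager fetches remote files
  // itself, so the paths are passed through unchanged.
  def resolvePyFiles(pyFiles: Seq[String],
                     isYarnCluster: Boolean,
                     isMesosCluster: Boolean): Seq[String] = {
    if (!isYarnCluster && !isMesosCluster) {
      formatPaths(pyFiles) // client/local mode: files must be local
    } else {
      pyFiles // cluster manager distributes remote files itself
    }
  }
}
```

Before this patch, the `formatPaths` branch ran unconditionally, so a remote `hdfs://` path in `spark.submit.pyFiles` failed even in yarn/mesos cluster mode.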

## How was this patch tested?

Added a unit test to verify the SparkSubmit arguments, together with local cluster verification. Because Spark unit tests lack `MiniDFSCluster` support, no integration test was added.

Author: jerryshao <sshao@hortonworks.com>

Closes #15137 from jerryshao/SPARK-17512.
parent 9fcf1c51
@@ -311,7 +311,7 @@ object SparkSubmit {
     // In Mesos cluster mode, non-local python files are automatically downloaded by Mesos.
     if (args.isPython && !isYarnCluster && !isMesosCluster) {
       if (Utils.nonLocalPaths(args.primaryResource).nonEmpty) {
-        printErrorAndExit(s"Only local python files are supported: $args.primaryResource")
+        printErrorAndExit(s"Only local python files are supported: ${args.primaryResource}")
       }
       val nonLocalPyFiles = Utils.nonLocalPaths(args.pyFiles).mkString(",")
       if (nonLocalPyFiles.nonEmpty) {
@@ -322,7 +322,7 @@ object SparkSubmit {
     // Require all R files to be local
     if (args.isR && !isYarnCluster) {
       if (Utils.nonLocalPaths(args.primaryResource).nonEmpty) {
-        printErrorAndExit(s"Only local R files are supported: $args.primaryResource")
+        printErrorAndExit(s"Only local R files are supported: ${args.primaryResource}")
      }
    }
@@ -633,7 +633,14 @@ object SparkSubmit {
     // explicitly sets `spark.submit.pyFiles` in his/her default properties file.
     sysProps.get("spark.submit.pyFiles").foreach { pyFiles =>
       val resolvedPyFiles = Utils.resolveURIs(pyFiles)
-      val formattedPyFiles = PythonRunner.formatPaths(resolvedPyFiles).mkString(",")
+      val formattedPyFiles = if (!isYarnCluster && !isMesosCluster) {
+        PythonRunner.formatPaths(resolvedPyFiles).mkString(",")
+      } else {
+        // Skip formatting the python path in yarn and mesos cluster mode: these two
+        // modes support remote python files and will distribute them and add them
+        // locally themselves.
+        resolvedPyFiles
+      }
       sysProps("spark.submit.pyFiles") = formattedPyFiles
     }
@@ -582,6 +582,25 @@ class SparkSubmitSuite
     val sysProps3 = SparkSubmit.prepareSubmitEnvironment(appArgs3)._3
     sysProps3("spark.submit.pyFiles") should be(
       PythonRunner.formatPaths(Utils.resolveURIs(pyFiles)).mkString(","))
+
+    // Test remote python files
+    val f4 = File.createTempFile("test-submit-remote-python-files", "", tmpDir)
+    val writer4 = new PrintWriter(f4)
+    val remotePyFiles = "hdfs:///tmp/file1.py,hdfs:///tmp/file2.py"
+    writer4.println("spark.submit.pyFiles " + remotePyFiles)
+    writer4.close()
+    val clArgs4 = Seq(
+      "--master", "yarn",
+      "--deploy-mode", "cluster",
+      "--properties-file", f4.getPath,
+      "hdfs:///tmp/mister.py"
+    )
+    val appArgs4 = new SparkSubmitArguments(clArgs4)
+    val sysProps4 = SparkSubmit.prepareSubmitEnvironment(appArgs4)._3
+    // Should not format python path for yarn cluster mode
+    sysProps4("spark.submit.pyFiles") should be(
+      Utils.resolveURIs(remotePyFiles)
+    )
   }
 
   test("user classpath first in driver") {
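The new test expects `spark.submit.pyFiles` to come back unchanged because resolving a URI that already carries a scheme is a no-op. A simplified, self-contained sketch of that assumed behavior (`Utils.resolveURIs` in Spark; this is an illustration, not the actual implementation):

```scala
import java.io.File
import java.net.URI

object ResolveUriSketch {
  // Paths with an explicit scheme (hdfs://, file://, ...) pass through
  // unchanged; bare local paths are made absolute and gain a file: scheme.
  def resolveURI(path: String): String = {
    val uri = new URI(path)
    if (uri.getScheme != null) uri.toString
    else new File(path).getAbsoluteFile.toURI.toString
  }

  // spark.submit.pyFiles is a comma-separated list, resolved element-wise.
  def resolveURIs(paths: String): String =
    paths.split(",").map(resolveURI).mkString(",")
}
```

Under this behavior, `resolveURIs("hdfs:///tmp/file1.py,hdfs:///tmp/file2.py")` returns its input verbatim, which is exactly what the yarn-cluster-mode assertion checks.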