Commit 2462dbcc authored by Sun Rui, committed by Shivaram Venkataraman

[SPARK-10971][SPARKR] RRunner should allow setting path to Rscript.

Add a new Spark conf option "spark.r.driver.command" to specify the executable for an R script in client modes.

The existing Spark conf option "spark.sparkr.r.command" is used to specify the executable for an R script in cluster modes for both driver and workers. It is deprecated in favor of "spark.r.command" but kept for backward compatibility. See also [launch R worker script](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RRDD.scala#L395).

BTW, the [environment variable "SPARKR_DRIVER_R"](https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L275) is used to locate the R shell on the local host.

For your information, PySpark has two environment variables serving a similar purpose:
PYSPARK_PYTHON: Python binary executable to use for PySpark in both driver and workers (default is `python`).
PYSPARK_DRIVER_PYTHON: Python binary executable to use for PySpark in driver only (default is PYSPARK_PYTHON).
PySpark uses the code [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L41) to determine the Python executable for a Python script.
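
The lookup order described in this message can be sketched as a pair of pure functions. This is an illustration only, not the code in the commit: `RCommandResolution`, `resolveRCommand`, and `resolvePythonDriver` are hypothetical names, the settings are passed in as a plain `Map` instead of being read from `sys.props` or `sys.env`, and the paths in `main` are made up.

```scala
object RCommandResolution {
  // Hypothetical helper (not part of Spark): follows the lookup order of the
  // RRunner change, but reads from an explicit Map instead of sys.props.
  def resolveRCommand(props: Map[String, String]): String = {
    // The deprecated key is read first so that "spark.r.command" overrides it.
    val legacy = props.getOrElse("spark.sparkr.r.command", "Rscript")
    val shared = props.getOrElse("spark.r.command", legacy)
    // The driver-specific key is consulted only in client mode (the default).
    if (props.getOrElse("spark.submit.deployMode", "client") == "client") {
      props.getOrElse("spark.r.driver.command", shared)
    } else {
      shared
    }
  }

  // Hypothetical helper: the PySpark fallback chain as stated in the message,
  // PYSPARK_DRIVER_PYTHON, then PYSPARK_PYTHON, then "python".
  def resolvePythonDriver(env: Map[String, String]): String =
    env.getOrElse("PYSPARK_DRIVER_PYTHON", env.getOrElse("PYSPARK_PYTHON", "python"))

  def main(args: Array[String]): Unit = {
    // Illustrative paths only.
    val props = Map(
      "spark.r.command" -> "/opt/R/bin/Rscript",
      "spark.r.driver.command" -> "/usr/local/bin/Rscript"
    )
    println(resolveRCommand(Map.empty))
    println(resolveRCommand(props))
    println(resolveRCommand(props + ("spark.submit.deployMode" -> "cluster")))
    println(resolvePythonDriver(Map("PYSPARK_PYTHON" -> "python3")))
  }
}
```

With these inputs, the driver-specific value wins in client mode, and switching the deploy mode to `cluster` falls back to the shared `spark.r.command` value.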

Author: Sun Rui <rui.sun@intel.com>

Closes #9179 from sun-rui/SPARK-10971.
parent 4725cb98
@@ -40,7 +40,16 @@ object RRunner {
 // Time to wait for SparkR backend to initialize in seconds
 val backendTimeout = sys.env.getOrElse("SPARKR_BACKEND_TIMEOUT", "120").toInt
-val rCommand = "Rscript"
+val rCommand = {
+  // "spark.sparkr.r.command" is deprecated and replaced by "spark.r.command",
+  // but kept here for backward compatibility.
+  var cmd = sys.props.getOrElse("spark.sparkr.r.command", "Rscript")
+  cmd = sys.props.getOrElse("spark.r.command", cmd)
+  if (sys.props.getOrElse("spark.submit.deployMode", "client") == "client") {
+    cmd = sys.props.getOrElse("spark.r.driver.command", cmd)
+  }
+  cmd
+}
 // Check if the file path exists.
 // If not, change directory to current working directory for YARN cluster mode
@@ -1589,6 +1589,20 @@ Apart from these, the following properties are also available, and may be useful
 Number of threads used by RBackend to handle RPC calls from SparkR package.
 </td>
 </tr>
+<tr>
+<td><code>spark.r.command</code></td>
+<td>Rscript</td>
+<td>
+Executable for executing R scripts in cluster modes for both driver and workers.
+</td>
+</tr>
+<tr>
+<td><code>spark.r.driver.command</code></td>
+<td>spark.r.command</td>
+<td>
+Executable for executing R scripts in client modes for driver. Ignored in cluster modes.
+</td>
+</tr>
 </table>
 #### Cluster Managers
@@ -1628,6 +1642,10 @@ The following variables can be set in `spark-env.sh`:
 <td><code>PYSPARK_DRIVER_PYTHON</code></td>
 <td>Python binary executable to use for PySpark in driver only (default is <code>PYSPARK_PYTHON</code>).</td>
 </tr>
+<tr>
+<td><code>SPARKR_DRIVER_R</code></td>
+<td>R binary executable to use for SparkR shell (default is <code>R</code>).</td>
+</tr>
 <tr>
 <td><code>SPARK_LOCAL_IP</code></td>
 <td>IP address of the machine to bind to.</td>