Commits · 52834d761b059264214dfc6a1f9c70b8bc7ec089 · cs525-sp18-g07 / spark

Mar 09, 2014

SPARK-929: Fully deprecate usage of SPARK_MEM · 52834d76

Aaron Davidson authored 11 years ago

(Continued from old repo, prior discussion at https://github.com/apache/incubator-spark/pull/615)

This patch cements our deprecation of the SPARK_MEM environment variable by replacing it with three more specialized variables:
SPARK_DAEMON_MEMORY, SPARK_EXECUTOR_MEMORY, and SPARK_DRIVER_MEMORY

The creation of the latter two variables means that we can safely set driver/job memory without accidentally setting the executor memory. Neither is public.

SPARK_EXECUTOR_MEMORY is only used by the Mesos scheduler (and set within SparkContext). The proper way of configuring executor memory is through the "spark.executor.memory" property.

SPARK_DRIVER_MEMORY is the new way of specifying the amount of memory used by jobs launched by spark-class, without possibly affecting executor memory.

Other memory considerations:
- The repl's memory can be set through the "--drivermem" command-line option, which really just sets SPARK_DRIVER_MEMORY.
- run-example doesn't use spark-class, so the only way to modify examples' memory is actually an unusual use of SPARK_JAVA_OPTS (which is normally overridden in all cases by spark-class).
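
The split into per-role variables described above can be illustrated with a small Python sketch. It is not code from the patch: the `resolve_memory` helper, the role names, and the `512m` fallback are all hypothetical, chosen only to show that setting the driver variable leaves the executor value untouched.

```python
import os

# Role -> environment variable, using the names from the commit message.
MEMORY_VARS = {
    "daemon": "SPARK_DAEMON_MEMORY",
    "executor": "SPARK_EXECUTOR_MEMORY",
    "driver": "SPARK_DRIVER_MEMORY",
}

# Hypothetical fallback for illustration; not taken from the patch.
ASSUMED_DEFAULT = "512m"


def resolve_memory(role, env=None):
    """Return the memory string for `role`, reading only that role's variable."""
    env = os.environ if env is None else env
    return env.get(MEMORY_VARS[role], ASSUMED_DEFAULT)


if __name__ == "__main__":
    env = {"SPARK_DRIVER_MEMORY": "2g"}
    print(resolve_memory("driver", env))    # 2g
    print(resolve_memory("executor", env))  # 512m (driver setting is not inherited)
```

With a single shared variable, the second lookup would also have returned `2g`, which is the accidental coupling the commit message describes.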

This patch also fixes a lurking bug where spark-shell misused spark-class (the first argument is supposed to be the main class name, not java options), as well as a bug in the Windows spark-class2.cmd. I have not yet tested this patch on either Windows or Mesos, however.

Author: Aaron Davidson <aaron@databricks.com>

Closes #99 from aarondav/sparkmem and squashes the following commits:

9df4c68 [Aaron Davidson] SPARK-929: Fully deprecate usage of SPARK_MEM

Jan 03, 2014
- sbin/spark-class* -> bin/spark-class* · 74ba97fc
  Prashant Sharma authored 11 years ago
  
Dec 29, 2013
- Add SparkConf support in Python · cd00225d
  Matei Zaharia authored 11 years ago
  
Dec 24, 2013
- Python change for move of PythonMLLibAPI. · 4efec6eb
  Tor Myklebust authored 11 years ago
  
Dec 19, 2013
- The rest of the Python side of those bindings. · bf491bb3
  Tor Myklebust authored 11 years ago
  
Sep 26, 2013

fix paths and change spark to use APP_MEM as application driver memory instead... · e8b1ee04

shane-huang authored 11 years ago

Fix paths and change Spark to use APP_MEM as the application driver memory instead of SPARK_MEM. Users should add application JARs to SPARK_CLASSPATH.

Signed-off-by: shane-huang <shengsheng.huang@intel.com>

Sep 22, 2013
- added spark-class and spark-executor to sbin · dfbdc9dd
  shane-huang authored 11 years ago
  
  Signed-off-by: shane-huang <shengsheng.huang@intel.com>
Sep 01, 2013
- Further fixes to get PySpark to work on Windows · 141f5427
  Matei Zaharia authored 11 years ago
  
- Initial work to rename package to org.apache.spark · 46eecd11
  Matei Zaharia authored 11 years ago
  
Aug 29, 2013

Change build and run instructions to use assemblies · 53cd50c0

Matei Zaharia authored 11 years ago

This commit makes Spark invocation saner by using an assembly JAR to
find all of Spark's dependencies instead of adding all the JARs in
lib_managed. It also packages the examples into an assembly and uses
that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
with two better-named scripts: "run-examples" for examples, and
"spark-class" for Spark internal classes (e.g. REPL, master, etc). This
is also designed to minimize the confusion people have in trying to use
"run" to run their own classes; it's not meant to do that, but now at
least if they look at it, they can modify run-examples to do a decent
job for them.

As part of this, Bagel's examples are also now properly moved to the
examples package instead of bagel.

Aug 28, 2013

Don't send SIGINT to Py4J gateway subprocess. · 742c44ea

Josh Rosen authored 11 years ago

This addresses SPARK-885, a usability issue where PySpark's
Java gateway process would be killed if the user hit ctrl-c.

Note that SIGINT still won't cancel the running s

This fix is based on http://stackoverflow.com/questions/5045771
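
A minimal Python sketch of the general technique (starting a child process with SIGINT ignored so that ctrl-c in the parent's terminal does not kill it) might look like the following. It is an illustration rather than the code from this commit: the `launch_ignoring_sigint` helper and the probe command are hypothetical, and `preexec_fn` is available on POSIX systems only.

```python
import signal
import subprocess
import sys


def _ignore_sigint():
    # Runs in the child between fork() and exec(). An ignored signal
    # stays ignored across exec(), so the child no longer dies on ctrl-c.
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def launch_ignoring_sigint(command):
    """Start `command` with SIGINT ignored in the child (POSIX only)."""
    return subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        preexec_fn=_ignore_sigint,
    )


if __name__ == "__main__":
    # Stand-in child that reports whether it sees SIGINT as ignored;
    # a real caller would pass the gateway launch command instead.
    probe = "import signal; print(signal.getsignal(signal.SIGINT) == signal.SIG_IGN)"
    proc = launch_ignoring_sigint([sys.executable, "-c", probe])
    out, _ = proc.communicate()
    print(out.decode().strip())
```

The parent process keeps its own SIGINT handling, so only the child is shielded from the interrupt.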

Jul 16, 2013
- Add Apache license headers and LICENSE and NOTICE files · af3c9d50
  Matei Zaharia authored 11 years ago
  
Jan 01, 2013
- Rename top-level 'pyspark' directory to 'python' · b58340db
  Josh Rosen authored 12 years ago
  
Dec 29, 2012
- Fix bug (introduced by batching) in PySpark take() · 7ec3595d
  Josh Rosen authored 12 years ago
  
Dec 28, 2012

Mark api.python classes as private; echo Java output to stderr. · fbadb1cd

Josh Rosen authored 12 years ago

Simplify PySpark installation. · 665466df

Josh Rosen authored 12 years ago

- Bundle Py4J binaries, since it's hard to install
- Use Spark's `run` script to launch the Py4J
  gateway, inheriting the settings in spark-env.sh

With these changes, (hopefully) nothing more than
running `sbt/sbt package` will be necessary to run
PySpark.

Oct 19, 2012
- Update Python API for v0.6.0 compatibility. · 52989c8a
  Josh Rosen authored 12 years ago
  
Aug 21, 2012

Use only cPickle for serialization in Python API. · fd94e544

Josh Rosen authored 12 years ago

Objects serialized with JSON can be compared for equality, but JSON can be slow
to serialize and only supports a limited range of data types.
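
The "limited range of data types" point can be seen in a short sketch. It uses Python 3's `pickle` module (which picks up the C implementation automatically) in place of Python 2's `cPickle`; the helper names and the sample record are made up for illustration.

```python
import json
import pickle


def pickle_roundtrips(value):
    """True if `value` survives pickle serialization unchanged."""
    return pickle.loads(pickle.dumps(value)) == value


def json_roundtrips(value):
    """True if `value` survives JSON serialization unchanged."""
    try:
        return json.loads(json.dumps(value)) == value
    except TypeError:
        # Raised for types JSON cannot encode at all, such as sets.
        return False


if __name__ == "__main__":
    record = {"id": 7, "point": (1.5, 2.5), "tags": {"a", "b"}}
    print(pickle_roundtrips(record))      # True
    print(json_roundtrips(record))        # False: the set cannot be encoded
    print(json_roundtrips((1, 2)))        # False: the tuple comes back as a list
    print(json_roundtrips({"id": 7}))     # True: plain JSON-compatible data
```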

Aug 19, 2012
- Add Python API. · 886b39de
  Josh Rosen authored 12 years ago
  