Commits · cb0eae3b78d7f6f56c0b9521ee48564a4967d3de · cs525-sp18-g07 / spark

Sep 06, 2014

[SPARK-3273][SPARK-3301]We should read the version information from the same place · 21a1e1bb

GuoQiang Li authored 10 years ago

Author: GuoQiang Li <witgo@qq.com>

Closes #2175 from witgo/SPARK-3273 and squashes the following commits:

cf9c65a [GuoQiang Li] We should read the version information from the same place
2a44e2f [GuoQiang Li] The spark version in the welcome message of pyspark is not correct

21a1e1bb

Sep 03, 2014

[SPARK-2435] Add shutdown hook to pyspark · 7c6e71f0

Matthew Farrellee authored 10 years ago

Author: Matthew Farrellee <matt@redhat.com>

Closes #2183 from mattf/SPARK-2435 and squashes the following commits:

ee0ee99 [Matthew Farrellee] [SPARK-2435] Add shutdown hook to pyspark

7c6e71f0

Jul 22, 2014

[SPARK-2470] PEP8 fixes to PySpark · 5d16d5bb

Nicholas Chammas authored 10 years ago

This pull request aims to resolve all outstanding PEP8 violations in PySpark.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Author: nchammas <nicholas.chammas@gmail.com>

Closes #1505 from nchammas/master and squashes the following commits:

98171af [Nicholas Chammas] [SPARK-2470] revert PEP 8 fixes to cloudpickle
cba7768 [Nicholas Chammas] [SPARK-2470] wrap expression list in parentheses
e178dbe [Nicholas Chammas] [SPARK-2470] style - change position of line break
9127d2b [Nicholas Chammas] [SPARK-2470] wrap expression lists in parentheses
22132a4 [Nicholas Chammas] [SPARK-2470] wrap conditionals in parentheses
24639bc [Nicholas Chammas] [SPARK-2470] fix whitespace for doctest
7d557b7 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to tests.py
8f8e4c0 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to storagelevel.py
b3b96cf [Nicholas Chammas] [SPARK-2470] PEP8 fixes to statcounter.py
d644477 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to worker.py
aa3a7b6 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to sql.py
1916859 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to shell.py
95d1d95 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to serializers.py
a0fec2e [Nicholas Chammas] [SPARK-2470] PEP8 fixes to mllib
c85e1e5 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to join.py
d14f2f1 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to __init__.py
81fcb20 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to resultiterable.py
1bde265 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to java_gateway.py
7fc849c [Nicholas Chammas] [SPARK-2470] PEP8 fixes to daemon.py
ca2d28b [Nicholas Chammas] [SPARK-2470] PEP8 fixes to context.py
f4e0039 [Nicholas Chammas] [SPARK-2470] PEP8 fixes to conf.py
a6d5e4b [Nicholas Chammas] [SPARK-2470] PEP8 fixes to cloudpickle.py
f0a7ebf [Nicholas Chammas] [SPARK-2470] PEP8 fixes to rddsampler.py
4dd148f [nchammas] Merge pull request #5 from apache/master
f7e4581 [Nicholas Chammas] unrelated pep8 fix
a36eed0 [Nicholas Chammas] name ec2 instances and security groups consistently
de7292a [nchammas] Merge pull request #4 from apache/master
2e4fe00 [nchammas] Merge pull request #3 from apache/master
89fde08 [nchammas] Merge pull request #2 from apache/master
69f6e22 [Nicholas Chammas] PEP8 fixes
2627247 [Nicholas Chammas] broke up lines before they hit 100 chars
6544b7e [Nicholas Chammas] [SPARK-2065] give launched instances names
69da6cf [nchammas] Merge pull request #1 from apache/master

5d16d5bb

Jul 15, 2014

follow pep8 None should be compared using is or is not · 563acf5e

Ken Takagiwa authored 11 years ago

http://legacy.python.org/dev/peps/pep-0008/
## Programming Recommendations
- Comparisons to singletons like None should always be done with is or is not, never the equality operators.

Author: Ken Takagiwa <ken@Kens-MacBook-Pro.local>

Closes #1422 from giwa/apache_master and squashes the following commits:

7b361f3 [Ken Takagiwa] follow pep8 None should be checked using is or is not

563acf5e

May 17, 2014

[SPARK-1808] Route bin/pyspark through Spark submit · 4b8ec6fc

Andrew Or authored 11 years ago

**Problem.** For `bin/pyspark`, there is currently no other way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`.

**Solution.** Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent.

**Details.** `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply call pass the file directly to Spark submit and let it handle the rest.

For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change is to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case.

This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too.

Author: Andrew Or <andrewor14@gmail.com>

Closes #799 from andrewor14/pyspark-submit and squashes the following commits:

bf37e36 [Andrew Or] Minor changes
01066fa [Andrew Or] bin/pyspark for Windows
c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes)
1866f85 [Andrew Or] Windows is not cooperating
456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set
7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
b7ba0d8 [Andrew Or] Address a few comments (minor)
06eb138 [Andrew Or] Use shlex instead of writing our own parser
05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly
6fba412 [Andrew Or] Deal with quotes + address various comments
fe4c8a7 [Andrew Or] Update --help for bin/pyspark
afe47bf [Andrew Or] Fix spark shell
f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
a371d26 [Andrew Or] Route bin/pyspark through Spark submit

4b8ec6fc

Apr 18, 2014

Fixed broken pyspark shell. · 81a152c5

Reynold Xin authored 11 years ago

Author: Reynold Xin <rxin@apache.org>

Closes #444 from rxin/pyspark and squashes the following commits:

fc11356 [Reynold Xin] Made the PySpark shell version checking compatible with Python 2.6.
571830b [Reynold Xin] Fixed broken pyspark shell.

81a152c5

Apr 16, 2014

[python alternative] pyspark require Python2, failing if system default is Py3 from shell.py · bb76eae1

AbhishekKr authored 11 years ago

Python alternative for https://github.com/apache/spark/pull/392; managed from shell.py

Author: AbhishekKr <abhikumar163@gmail.com>

Closes #399 from abhishekkr/pyspark_shell and squashes the following commits:

134bdc9 [AbhishekKr] pyspark require Python2, failing if system default is Py3 from shell.py

bb76eae1

Apr 10, 2014

Set spark.executor.uri from environment variable (needed by Mesos) · 5cd11d51

Ivan Wick authored 11 years ago

The Mesos backend uses this property when setting up a slave process. It is similarly set in the Scala repl (org.apache.spark.repl.SparkILoop), but I couldn't find any analogous for pyspark.

Author: Ivan Wick <ivanwick+github@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <matei@databricks.com>

Closes #311 from ivanwick/master and squashes the following commits:

da0c3e4 [Ivan Wick] Set spark.executor.uri from environment variable (needed by Mesos)

5cd11d51

Apr 07, 2014

SPARK-1099: Introduce local[*] mode to infer number of cores · 0307db0f

Aaron Davidson authored 11 years ago

This is the default mode for running spark-shell and pyspark, intended to allow users running spark for the first time to see the performance benefits of using multiple cores, while not breaking backwards compatibility for users who use "local" mode and expect exactly 1 core.

Author: Aaron Davidson <aaron@databricks.com>

Closes #182 from aarondav/110 and squashes the following commits:

a88294c [Aaron Davidson] Rebased changes for new spark-shell
a9f393e [Aaron Davidson] SPARK-1099: Introduce local[*] mode to infer number of cores

0307db0f

Feb 08, 2014

Merge pull request #542 from markhamstra/versionBump. Closes #542. · c2341c92

Mark Hamstra authored 11 years ago

Version number to 1.0.0-SNAPSHOT

Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore.

@pwendell

Author: Mark Hamstra <markhamstra@gmail.com>

== Merge branch commits ==

commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71
Author: Mark Hamstra <markhamstra@gmail.com>
Date:   Wed Feb 5 09:30:32 2014 -0800

    Version number to 1.0.0-SNAPSHOT

c2341c92

Jan 02, 2014
- pyspark -> bin/pyspark · a3f90a2e
  Prashant Sharma authored 11 years ago
  
  a3f90a2e
Dec 24, 2013
- Typo: avaiable -> available · 3665c722
  Andrew Ash authored 11 years ago
  
  3665c722
Sep 24, 2013
- Update build version in master · 6079721f
  Patrick Wendell authored 11 years ago
  
  6079721f
Sep 07, 2013

Export StorageLevel and refactor · c1cc8c4d
Aaron Davidson authored 11 years ago

c1cc8c4d

Remove reflection, hard-code StorageLevels · 8001687a

Aaron Davidson authored 11 years ago

The sc.StorageLevel -> StorageLevel pathway is a bit janky, but otherwise
the shell would have to call a private method of SparkContext. Having
StorageLevel available in sc also doesn't seem like the end of the world.
There may be a better solution, though.

As for creating the StorageLevel object itself, this seems to be the best
way in Python 2 for creating singleton, enum-like objects:
http://stackoverflow.com/questions/36932/how-can-i-represent-an-enum-in-python

8001687a

Sep 06, 2013

SPARK-660: Add StorageLevel support in Python · a63d4c7d

Aaron Davidson authored 11 years ago

It uses reflection... I am not proud of that fact, but it at least ensures
compatibility (sans refactoring of the StorageLevel stuff).

a63d4c7d

Sep 01, 2013
- Add banner to PySpark and make wordcount output nicer · bbaa9d7d
  Matei Zaharia authored 11 years ago
  
  bbaa9d7d
Aug 12, 2013
- Implementing SPARK-865: Add the equivalent of ADD_JARS to PySpark · 8fd5c7bc
  Andre Schumacher authored 11 years ago
  
  Now ADD_FILES uses a comma as file name separator.
  8fd5c7bc
Jul 16, 2013
- Add Apache license headers and LICENSE and NOTICE files · af3c9d50
  Matei Zaharia authored 12 years ago
  
  af3c9d50
Jan 30, 2013
- Make module help available in python shell. · 3f945e3b
  Patrick Wendell authored 12 years ago
  
  Also, adds a line in doc explaining how to use.
  3f945e3b
Jan 20, 2013
- Added accumulators to PySpark · 8e7f098a
  Matei Zaharia authored 12 years ago
  
  8e7f098a
Jan 01, 2013
- Add `pyspark` script to replace the other scripts. · ce9f1bbe
  Josh Rosen authored 12 years ago
  
  Expand the PySpark programming guide.
  ce9f1bbe
- Rename top-level 'pyspark' directory to 'python' · b58340db
  Josh Rosen authored 12 years ago
  
  b58340db
Dec 28, 2012

Simplify PySpark installation. · 665466df

Josh Rosen authored 12 years ago

- Bundle Py4J binaries, since it's hard to install
- Uses Spark's `run` script to launch the Py4J
  gateway, inheriting the settings in spark-env.sh

With these changes, (hopefully) nothing more than
running `sbt/sbt package` will be necessary to run
PySpark.

665466df

Dec 27, 2012
- Add IPython support to pyspark-shell. · 2d98fff0
  Josh Rosen authored 12 years ago
  
  Suggested by / based on code from @MLnick
  2d98fff0
Oct 19, 2012
- Add PySpark README and run scripts. · c23bf1af
  Josh Rosen authored 12 years ago
  
  c23bf1af