Commit 55327a28 authored by Matei Zaharia

Merge pull request #430 from pwendell/pyspark-guide

Minor improvements to PySpark docs
parents d12330bd 3f945e3b
@@ -67,13 +67,20 @@ The script automatically adds the `pyspark` package to the `PYTHONPATH`.
 # Interactive Use
 
-The `pyspark` script launches a Python interpreter that is configured to run PySpark jobs.
-When run without any input files, `pyspark` launches a shell that can be used explore data interactively, which is a simple way to learn the API:
+The `pyspark` script launches a Python interpreter that is configured to run PySpark jobs. To use `pyspark` interactively, first build Spark, then launch it directly from the command line without any options:
+
+{% highlight bash %}
+$ sbt/sbt package
+$ ./pyspark
+{% endhighlight %}
+
+The Python shell can be used to explore data interactively and is a simple way to learn the API:
 
 {% highlight python %}
 >>> words = sc.textFile("/usr/share/dict/words")
 >>> words.filter(lambda w: w.startswith("spar")).take(5)
 [u'spar', u'sparable', u'sparada', u'sparadrap', u'sparagrass']
+>>> help(pyspark) # Show all pyspark functions
 {% endhighlight %}
 
 By default, the `pyspark` shell creates a SparkContext that runs jobs locally.
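The last context line above notes that the shell's default SparkContext runs jobs locally. The same default can be reproduced in a standalone script by constructing the context yourself. A minimal sketch, assuming the two-argument `SparkContext(master, jobName)` constructor from this era of the API (the job name `WordFilter` is illustrative):

{% highlight python %}
from pyspark.context import SparkContext

# "local" runs jobs in-process with a single worker thread, mirroring the
# shell's default; a cluster URL such as "spark://host:7077" would
# distribute the work instead.
sc = SparkContext("local", "WordFilter")

words = sc.textFile("/usr/share/dict/words")
print words.filter(lambda w: w.startswith("spar")).count()
{% endhighlight %}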
@@ -4,6 +4,7 @@ An interactive shell.
 This file is designed to be launched as a PYTHONSTARTUP script.
 """
 import os
+import pyspark
 from pyspark.context import SparkContext
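The one-line addition above is what makes the `help(pyspark)` call from the docs work: since the PYTHONSTARTUP script imports the module at top level, `pyspark` remains bound in the interactive namespace. A minimal sketch of how such a startup script might continue, assuming (this part is not shown in the diff) that it also binds a ready-made context to the `sc` variable the docs example uses:

{% highlight python %}
import os
import pyspark
from pyspark.context import SparkContext

# Assumed: honor a MASTER environment variable, falling back to local
# execution, and expose the context as `sc` for the interactive session.
sc = SparkContext(os.environ.get("MASTER", "local"), "PySparkShell")
print "Spark context available as sc; try help(pyspark) for the API docs."
{% endhighlight %}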