Commit 497f5575
Authored 11 years ago by Matei Zaharia
Parent: feba7ee5

Add docs about ipython

Showing 1 changed file with 31 additions and 3 deletions: docs/python-programming-guide.md
@@ -10,6 +10,7 @@ To learn the basics of Spark, we recommend reading through the
easy to follow even if you don't know Scala.
This guide will show how to use the Spark features described there in Python.

# Key Differences in the Python API

There are a few key differences between the Python and Scala APIs:
@@ -50,6 +51,7 @@ PySpark will automatically ship these functions to workers, along with any objects they reference.
Instances of classes will be serialized and shipped to workers by PySpark, but classes themselves cannot be automatically distributed to workers.
The [Standalone Use](#standalone-use) section describes how to ship code dependencies to workers.

# Installing and Configuring PySpark

PySpark requires Python 2.6 or higher.
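To make the shipping rules above concrete, here is a minimal sketch, assuming a running `pyspark` shell where `sc` is already defined; `double` and `Multiplier` are made-up names for illustration:

{% highlight python %}
# A plain function is serialized and shipped to the workers automatically.
def double(x):
    return x * 2

print sc.parallelize([1, 2, 3]).map(double).collect()   # [2, 4, 6]

# An *instance* is serialized and shipped, but the class definition
# itself is not distributed automatically; workers need it on their
# PYTHONPATH (see the Standalone Use section on shipping dependencies).
class Multiplier(object):
    def __init__(self, factor):
        self.factor = factor

m = Multiplier(3)
print sc.parallelize([1, 2, 3]).map(lambda x: x * m.factor).collect()   # [3, 6, 9]
{% endhighlight %}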
@@ -81,16 +83,41 @@ The Python shell can be used to explore data interactively and is a simple way to learn the API:
>>> help(pyspark) # Show all pyspark functions
{% endhighlight %}

-By default, the `pyspark` shell creates a SparkContext that runs jobs locally.
-To connect to a non-local cluster, set the `MASTER` environment variable.
+By default, the `pyspark` shell creates a SparkContext that runs jobs locally
+on a single core.
+To connect to a non-local cluster, or use multiple cores, set the `MASTER`
+environment variable.
For example, to use the `pyspark` shell with a [standalone Spark cluster](spark-standalone.html):

{% highlight bash %}
$ MASTER=spark://IP:PORT ./pyspark
{% endhighlight %}

Or, to use four cores on the local machine:

{% highlight bash %}
$ MASTER=local[4] ./pyspark
{% endhighlight %}

## IPython

It is also possible to launch PySpark in [IPython](http://ipython.org), the enhanced Python interpreter.
To do this, simply set the `IPYTHON` variable to `1` when running `pyspark`:

{% highlight bash %}
$ IPYTHON=1 ./pyspark
{% endhighlight %}

Alternatively, you can customize the `ipython` command by setting `IPYTHON_OPTS`. For example, to launch the [IPython Notebook](http://ipython.org/notebook.html) with PyLab graphing support:

{% highlight bash %}
$ IPYTHON_OPTS="notebook --pylab inline" ./pyspark
{% endhighlight %}

IPython also works on a cluster or on multiple cores if you set the `MASTER` environment variable.

-# Standalone Use
+# Standalone Programs

PySpark can also be used from standalone Python scripts by creating a SparkContext in your script and running the script using `pyspark`.
The Quick Start guide includes a [complete example](quick-start.html#a-standalone-job-in-python) of a standalone Python job.
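As a rough sketch of such a standalone script, in the spirit of the Quick Start example linked above; the file name `my_script.py` and the input file `data.txt` are placeholders:

{% highlight python %}
# my_script.py: a minimal word count as a standalone PySpark job.
from pyspark import SparkContext

# "local" runs on a single local core; a cluster URL or local[N]
# works here too, just like the MASTER variable above.
sc = SparkContext("local", "Word Count")

counts = sc.textFile("data.txt") \
           .flatMap(lambda line: line.split()) \
           .map(lambda word: (word, 1)) \
           .reduceByKey(lambda a, b: a + b)

print counts.collect()
{% endhighlight %}

The script would then be launched with something like `./pyspark my_script.py`.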
@@ -105,6 +132,7 @@ sc = SparkContext("local", "Job Name", pyFiles=['MyFile.py', 'lib.zip', 'app.egg'])
Files listed here will be added to the `PYTHONPATH` and shipped to remote worker machines.
Code dependencies can be added to an existing SparkContext using its `addPyFile()` method.

# Where to Go from Here

PySpark includes several sample programs in the [`python/examples` folder](https://github.com/mesos/spark/tree/master/python/examples).
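A minimal sketch of the `addPyFile()` route, assuming an already-created context and a hypothetical dependency archive `lib.zip`:

{% highlight python %}
from pyspark import SparkContext

sc = SparkContext("local", "Job Name")

# Ship an extra dependency after the context was created; from this
# point on it is added to the PYTHONPATH on the remote workers too.
sc.addPyFile("lib.zip")
{% endhighlight %}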