  1. May 30, 2014
    • [SPARK-1566] consolidate programming guide, and general doc updates · c8bf4131
      Matei Zaharia authored
      This is a fairly large PR to clean up and update the docs for 1.0. The major changes are:
      
      * A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs
      * New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark
      * Spark-submit guide moved to a separate page and expanded slightly
      * Various cleanups of the menu system, security docs, and others
      * Updated look of title bar to differentiate the docs from previous Spark versions
      
      You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #896 from mateiz/1.0-docs and squashes the following commits:
      
      03e6853 [Matei Zaharia] Some tweaks to configuration and YARN docs
      0779508 [Matei Zaharia] tweak
      ef671d4 [Matei Zaharia] Keep frames in JavaDoc links, and other small tweaks
      1bf4112 [Matei Zaharia] Review comments
      4414f88 [Matei Zaharia] tweaks
      d04e979 [Matei Zaharia] Fix some old links to Java guide
      a34ed33 [Matei Zaharia] tweak
      541bb3b [Matei Zaharia] miscellaneous changes
      fcefdec [Matei Zaharia] Moved submitting apps to separate doc
      61d72b4 [Matei Zaharia] stuff
      181f217 [Matei Zaharia] migration guide, remove old language guides
      e11a0da [Matei Zaharia] Add more API functions
      6a030a9 [Matei Zaharia] tweaks
      8db0ae3 [Matei Zaharia] Added key-value pairs section
      318d2c9 [Matei Zaharia] tweaks
      1c81477 [Matei Zaharia] New section on basics and function syntax
      e38f559 [Matei Zaharia] Actually added programming guide to Git
      a33d6fe [Matei Zaharia] First pass at updating programming guide to support all languages, plus other tweaks throughout
      3b6a876 [Matei Zaharia] More CSS tweaks
      01ec8bf [Matei Zaharia] More CSS tweaks
      e6d252e [Matei Zaharia] Change color of doc title bar to differentiate from 0.9.0
  2. May 17, 2014
    • [SPARK-1824] Remove <master> from Python examples · cf6cbe9f
      Andrew Or authored
      A recent PR (#552) fixed this for all Scala / Java examples. We need to do it for Python too.
      
      Note that this blocks on #799, which makes `bin/pyspark` go through Spark submit. With only the changes in this PR, the only way to run these examples is through Spark submit. Once #799 goes in, you can use `bin/pyspark` to run them too. For example,
      
      ```
      bin/pyspark examples/src/main/python/pi.py 100 --master local-cluster[4,1,512]
      ```
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #802 from andrewor14/python-examples and squashes the following commits:
      
      cf50b9f [Andrew Or] De-indent python comments (minor)
      50f80b1 [Andrew Or] Remove pyFiles from SparkContext construction
      c362f69 [Andrew Or] Update docs to use spark-submit for python applications
      7072c6a [Andrew Or] Merge branch 'master' of github.com:apache/spark into python-examples
      427a5f0 [Andrew Or] Update docs
      d32072c [Andrew Or] Remove <master> from examples + update usages
  3. May 14, 2014
    • Add language tabs and Python version to interactive part of quick-start · f10de042
      Matei Zaharia authored
      This is an addition of some stuff that was missed in https://issues.apache.org/jira/browse/SPARK-1567. I've also updated the doc to show submitting the Python application with spark-submit.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #782 from mateiz/spark-1567-extra and squashes the following commits:
      
      6f8f2aa [Matei Zaharia] tweaks
      9ed9874 [Matei Zaharia] tweaks
      ae67c3e [Matei Zaharia] tweak
      b303ba3 [Matei Zaharia] tweak
      1433a4d [Matei Zaharia] Add language tabs and Python version to interactive part of quick-start guide
  4. May 12, 2014
    • [SPARK-1753 / 1773 / 1814] Update outdated docs for spark-submit, YARN, standalone etc. · 2ffd1eaf
      Andrew Or authored
      YARN
      - SparkPi was updated to not take in master as an argument; we should update the docs to reflect that.
      - The default YARN build guide should use Maven, not sbt.
      - This PR also adds a paragraph on steps to debug a YARN application.
      
      Standalone
      - Emphasize spark-submit more. Right now it's one small paragraph preceding the legacy way of launching through `org.apache.spark.deploy.Client`.
      - The way we set configurations / environment variables according to the old docs is outdated. This needs to reflect changes introduced by the Spark configuration changes we made.
      
      In general, this PR also adds a little more documentation on the new spark-shell, spark-submit, spark-defaults.conf etc here and there.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #701 from andrewor14/yarn-docs and squashes the following commits:
      
      e2c2312 [Andrew Or] Merge in changes in #752 (SPARK-1814)
      25cfe7b [Andrew Or] Merge in the warning from SPARK-1753
      a8c39c5 [Andrew Or] Minor changes
      336bbd9 [Andrew Or] Tabs -> spaces
      4d9d8f7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      041017a [Andrew Or] Abstract Spark submit documentation to cluster-overview.html
      3cc0649 [Andrew Or] Detail how to set configurations + remove legacy instructions
      5b7140a [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      85a51fc [Andrew Or] Update run-example, spark-shell, configuration etc.
      c10e8c7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      381fe32 [Andrew Or] Update docs for standalone mode
      757c184 [Andrew Or] Add a note about the requirements for the debugging trick
      f8ca990 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      924f04c [Andrew Or] Revert addition of --deploy-mode
      d5fe17b [Andrew Or] Update the YARN docs
  5. May 10, 2014
    • fix broken link in python docs · c05d11bb
      Andy Konwinski authored
      Author: Andy Konwinski <andykonwinski@gmail.com>
      
      Closes #650 from andyk/python-docs-link-fix and squashes the following commits:
      
      a1f9d51 [Andy Konwinski] fix broken link in python docs
  6. May 06, 2014
    • SPARK-1637: Clean up examples for 1.0 · a000b5c3
      Sandeep authored
      - [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
      - [x] Move Python examples into examples/src/main/python
      - [x] Update docs to reflect these changes
      
      Author: Sandeep <sandeep@techaddict.me>
      
      This patch had conflicts when merged, resolved by
      Committer: Matei Zaharia <matei@databricks.com>
      
      Closes #571 from techaddict/SPARK-1637 and squashes the following commits:
      
      47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples
      8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples
      5f96121 [Sandeep] Move Python examples into examples/src/main/python
      0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
    • [SPARK-1549] Add Python support to spark-submit · 951a5d93
      Matei Zaharia authored
      This PR updates spark-submit to allow submitting Python scripts (currently only with deploy-mode=client, but that's all that was supported before) and updates the PySpark code to properly find various paths, etc. One significant change is that we assume we can always find the Python files either from the Spark assembly JAR (which will happen with the Maven assembly build in make-distribution.sh) or from SPARK_HOME (which will exist in local mode even if you use sbt assembly, and should be enough for testing). This means we no longer need a weird hack to modify the environment for YARN.
      
      This patch also updates the Python worker manager to run python with -u, which means unbuffered output (send it to our logs right away instead of waiting a while after stuff was written); this should simplify debugging.
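
The effect of `-u` can be sketched with a small standalone script. This is only an illustration, not the worker manager's code: the child snippet and the helper name `read_line_while_child_runs` are hypothetical. The parent reads a line from the child while the child is still alive, which is the "send it right away" behavior described above.

```python
import subprocess
import sys

# Hypothetical child: prints one line, then blocks until its stdin is closed.
CHILD_CODE = "import sys; print('ready'); sys.stdin.read()"


def read_line_while_child_runs():
    """Start a child interpreter with -u and read its first output line.

    Returns (line, still_running). With -u the line reaches the pipe as soon
    as it is printed, so the child is still running when we read it.
    """
    proc = subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        line = proc.stdout.readline()
        still_running = proc.poll() is None
    finally:
        # Closing stdin lets the child's sys.stdin.read() return so it exits.
        proc.stdin.close()
        proc.wait()
        proc.stdout.close()
    return line.strip(), still_running


if __name__ == "__main__":
    print(read_line_while_child_runs())
```

Without `-u`, output written to a pipe is typically block-buffered, so the same read could stall until the child exits or its buffer fills.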
      
      In addition, it fixes https://issues.apache.org/jira/browse/SPARK-1709, setting the main class from a JAR's Main-Class attribute if not specified by the user, and fixes a few help strings and style issues in spark-submit.
      
      In the future we may want to make the `pyspark` shell use spark-submit as well, but it seems unnecessary for 1.0.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #664 from mateiz/py-submit and squashes the following commits:
      
      15e9669 [Matei Zaharia] Fix some uses of path.separator property
      051278c [Matei Zaharia] Small style fixes
      0afe886 [Matei Zaharia] Add license headers
      4650412 [Matei Zaharia] Add pyFiles to PYTHONPATH in executors, remove old YARN stuff, add tests
      15f8e1e [Matei Zaharia] Set PYTHONPATH in PythonWorkerFactory in case it wasn't set from outside
      47c0655 [Matei Zaharia] More work to make spark-submit work with Python:
      d4375bd [Matei Zaharia] Clean up description of spark-submit args a bit and add Python ones
  7. Apr 30, 2014
    • SPARK-1004. PySpark on YARN · ff5be9a4
      Sandy Ryza authored
      This reopens https://github.com/apache/incubator-spark/pull/640 against the new repo
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #30 from sryza/sandy-spark-1004 and squashes the following commits:
      
      89889d4 [Sandy Ryza] Move unzipping py4j to the generate-resources phase so that it gets included in the jar the first time
      5165a02 [Sandy Ryza] Fix docs
      fd0df79 [Sandy Ryza] PySpark on YARN
  8. Apr 21, 2014
    • [SPARK-1439, SPARK-1440] Generate unified Scaladoc across projects and Javadocs · fc783847
      Matei Zaharia authored
      I used the sbt-unidoc plugin (https://github.com/sbt/sbt-unidoc) to create a unified Scaladoc of our public packages, and to generate Javadocs as well. One limitation is that I haven't found an easy way to exclude packages in the Javadoc; there is an SBT task that identifies Java sources to run javadoc on, but it's been very difficult to modify it from outside to change what is set in the unidoc package. Some SBT-savvy people should help with this. The Javadoc site also lacks package-level descriptions and things like that, so we may want to look into that. We may decide not to post these right now if it's too limited compared to the Scala one.
      
      Example of the built doc site: http://people.csail.mit.edu/matei/spark-unified-docs/
      
      Author: Matei Zaharia <matei@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Patrick Wendell <pwendell@gmail.com>
      
      Closes #457 from mateiz/better-docs and squashes the following commits:
      
      a63d4a3 [Matei Zaharia] Skip Java/Scala API docs for Python package
      5ea1f43 [Matei Zaharia] Fix links to Java classes in Java guide, fix some JS for scrolling to anchors on page load
      f05abc0 [Matei Zaharia] Don't include java.lang package names
      995e992 [Matei Zaharia] Skip internal packages and class names with $ in JavaDoc
      a14a93c [Matei Zaharia] typo
      76ce64d [Matei Zaharia] Add groups to Javadoc index page, and a first package-info.java
      ed6f994 [Matei Zaharia] Generate JavaDoc as well, add titles, update doc site to use unified docs
      acb993d [Matei Zaharia] Add Unidoc plugin for the projects we want Unidoced
  9. Apr 15, 2014
  10. Apr 07, 2014
    • SPARK-1099: Introduce local[*] mode to infer number of cores · 0307db0f
      Aaron Davidson authored
      This is the default mode for running spark-shell and pyspark, intended to allow users running Spark for the first time to see the performance benefits of using multiple cores, while not breaking backwards compatibility for users who use "local" mode and expect exactly 1 core.
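
The three local master forms can be illustrated with a small parser. This is a sketch in Python rather than Spark's own code: the function name `local_thread_count` is hypothetical, and `os.cpu_count()` stands in here for however Spark itself detects the number of cores.

```python
import os
import re


def local_thread_count(master):
    """Map a local master string to a thread count.

    "local"    -> exactly 1 (the backwards-compatible behavior)
    "local[N]" -> N
    "local[*]" -> the number of cores detected on this machine
    """
    if master == "local":
        return 1
    match = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if match is None:
        raise ValueError("not a local master string: %r" % master)
    token = match.group(1)
    if token == "*":
        # os.cpu_count() may return None, so fall back to a single thread.
        return os.cpu_count() or 1
    return int(token)


if __name__ == "__main__":
    for m in ("local", "local[4]", "local[*]"):
        print(m, local_thread_count(m))
```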
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #182 from aarondav/110 and squashes the following commits:
      
      a88294c [Aaron Davidson] Rebased changes for new spark-shell
      a9f393e [Aaron Davidson] SPARK-1099: Introduce local[*] mode to infer number of cores
  11. Apr 05, 2014
    • SPARK-1421. Make MLlib work on Python 2.6 · 0b855167
      Matei Zaharia authored
      The reason it wasn't working was passing a bytearray to stream.write(), which is not supported in Python 2.6 but is in 2.7. (This array came from NumPy when we converted data to send it over to Java.) Now we just convert those bytearrays to strings of bytes, which preserves nonprintable characters as well.
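
The conversion can be sketched as below. This is not the patch's code: the helper name `write_payload` is hypothetical, and the example is written for Python 3, where `bytes(...)` plays the role of the byte-string conversion described above.

```python
import io


def write_payload(stream, payload):
    """Write payload to a binary stream, converting bytearrays first."""
    if isinstance(payload, bytearray):
        # An immutable byte string keeps every byte value, including
        # nonprintable ones such as 0x00 and 0xFF.
        payload = bytes(payload)
    stream.write(payload)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_payload(buf, bytearray([0, 1, 255, 10]))
    print(repr(buf.getvalue()))
```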
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #335 from mateiz/mllib-python-2.6 and squashes the following commits:
      
      f26c59f [Matei Zaharia] Update docs to no longer say we need Python 2.7
      a84d6af [Matei Zaharia] SPARK-1421. Make MLlib work on Python 2.6
  12. Mar 13, 2014
    • SPARK-1183. Don't use "worker" to mean executor · 69837321
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #120 from sryza/sandy-spark-1183 and squashes the following commits:
      
      5066a4a [Sandy Ryza] Remove "worker" in a couple comments
      0bd1e46 [Sandy Ryza] Remove --am-class from usage
      bfc8fe0 [Sandy Ryza] Remove am-class from doc and fix yarn-alpha
      607539f [Sandy Ryza] Address review comments
      74d087a [Sandy Ryza] SPARK-1183. Don't use "worker" to mean executor
  13. Feb 26, 2014
    • Updated link for pyspark examples in docs · 26450351
      Jyotiska NK authored
      Author: Jyotiska NK <jyotiska123@gmail.com>
      
      Closes #22 from jyotiska/pyspark_docs and squashes the following commits:
      
      426136c [Jyotiska NK] Updated link for pyspark examples
  14. Jan 15, 2014
  15. Jan 12, 2014
  16. Jan 07, 2014
    • Simplify and fix pyspark script. · 82a1d38a
      Patrick Wendell authored
      This patch removes compatibility for IPython < 1.0 but fixes the launch
      script and makes it much simpler.
      
      I tested this using the three commands in the PySpark documentation page:
      
      1. IPYTHON=1 ./pyspark
      2. IPYTHON_OPTS="notebook" ./pyspark
      3. IPYTHON_OPTS="notebook --pylab inline" ./pyspark
      
      There are two changes:
      - We rely on PYTHONSTARTUP env var to start PySpark
      - Removed the quotes around $IPYTHON_OPTS... having quotes
        gloms them together as a single argument passed to `exec` which
        seemed to cause ipython to fail (it instead expects them as
        multiple arguments).
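
The quoting issue in the second change can be reproduced outside the shell. The sketch below uses Python's `shlex.split`, which approximates POSIX shell word splitting, to compare the two forms; the helper name `argv_for` is hypothetical, and this models the behavior rather than quoting the launch script.

```python
import shlex

IPYTHON_OPTS = "notebook --pylab inline"


def argv_for(opts, quoted):
    """Return the argument list a shell would build for the ipython call.

    quoted=True  models:  exec ipython "$IPYTHON_OPTS"
    quoted=False models:  exec ipython $IPYTHON_OPTS
    """
    command = 'ipython "%s"' % opts if quoted else "ipython %s" % opts
    return shlex.split(command)


if __name__ == "__main__":
    print(argv_for(IPYTHON_OPTS, quoted=True))
    print(argv_for(IPYTHON_OPTS, quoted=False))
```

With quotes, the options arrive as the single argument `notebook --pylab inline`; without them, they arrive as three separate arguments.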
  17. Jan 06, 2014
  18. Jan 02, 2014
  19. Dec 30, 2013
  20. Oct 22, 2013
  21. Oct 09, 2013
  22. Sep 10, 2013
  23. Sep 08, 2013
  24. Sep 02, 2013
  25. Sep 01, 2013
  26. Aug 31, 2013
  27. Aug 29, 2013
  28. Jul 29, 2013
  29. Jul 01, 2013
  30. Jun 26, 2013
  31. Feb 25, 2013
  32. Feb 18, 2013
  33. Jan 30, 2013
  34. Jan 20, 2013
  35. Jan 08, 2013
  36. Jan 01, 2013