  1. Jul 26, 2017
    • [SPARK-21485][SQL][DOCS] Spark SQL documentation generation for built-in functions · 60472dbf
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This generates documentation for Spark SQL built-in functions.
      
      One drawback is that this requires a full build in order to generate the built-in function list.
      Once Spark is built, running `sql/create-docs.sh` takes only a few seconds.
      
      Please see https://spark-test.github.io/sparksqldoc/ that I hosted to show the output documentation.
      
      A few more things need to be done to make the documentation pretty, for example separating `Arguments:` and `Examples:`, but I think this should be done within `ExpressionDescription` and `ExpressionInfo` rather than by manually parsing the text. I will fix these in a follow-up.
      
      Generating HTML from the markdown files requires `pip install mkdocs`.
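
As a rough illustration of the markdown-generation step described above, here is a minimal Python sketch. It assumes each built-in function is available as a record with `name`, `usage`, and `extended` fields; the `FunctionInfo` tuple and both helper functions are hypothetical stand-ins, not the code behind `sql/create-docs.sh`. In the real pipeline the function list would come from a built Spark, and mkdocs would turn the resulting markdown into HTML.

```python
from collections import namedtuple

# Hypothetical record type: a stand-in for the information that
# ExpressionInfo carries, not the actual JVM class.
FunctionInfo = namedtuple("FunctionInfo", ["name", "usage", "extended"])


def render_function(info):
    """Render one function as a markdown section (heading, usage, details)."""
    lines = ["### " + info.name, "", info.usage.strip(), ""]
    if info.extended.strip():
        # Extended text (arguments, examples) is emitted as-is, unparsed.
        lines += [info.extended.strip(), ""]
    return "\n".join(lines)


def render_page(infos):
    """Join all sections, sorted by function name, into one markdown page."""
    ordered = sorted(infos, key=lambda info: info.name)
    return "\n".join(render_function(info) for info in ordered)


if __name__ == "__main__":
    sample = [
        FunctionInfo("upper", "upper(str) - Returns str in uppercase.", ""),
        FunctionInfo("abs", "abs(expr) - Returns the absolute value.",
                     "Examples:\n  > SELECT abs(-1);\n   1"),
    ]
    print(render_page(sample))
```

Emitting the extended text unparsed mirrors the limitation the message mentions: `Arguments:` and `Examples:` are not separated here either.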
      
      ## How was this patch tested?
      
      Manually tested:
      
      ```
      cd docs
      jekyll build
      ```
      ,
      
      ```
      cd docs
      jekyll serve
      ```
      
      and
      
      ```
      cd sql
      ./create-docs.sh
      ```
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #18702 from HyukjinKwon/SPARK-21485.
  2. Jul 15, 2017
  3. Jul 13, 2017
    • [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10 · 425c4ada
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      - Remove Scala 2.10 build profiles and support
      - Replace some 2.10 support in scripts with commented placeholders for 2.12 later
      - Remove deprecated API calls from 2.10 support
      - Remove usages of deprecated context bounds where possible
      - Remove Scala 2.10 workarounds like ScalaReflectionLock
      - Other minor Scala warning fixes
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17150 from srowen/SPARK-19810.
  4. Jul 06, 2017
    • [SPARK-21267][SS][DOCS] Update Structured Streaming Documentation · 0217dfd2
      Tathagata Das authored
      ## What changes were proposed in this pull request?
      
      A few changes to the Structured Streaming documentation:
      - Clarify that the entire stream input table is not materialized
      - Add information for Ganglia
      - Add Kafka Sink to the main docs
      - Removed a couple of leftover experimental tags
      - Added more associated reading material and talk videos.
      
      In addition, https://github.com/apache/spark/pull/16856 broke the link to the RDD programming guide in several places while renaming the page. This PR fixes those links. cc sameeragarwal cloud-fan
      - Added a redirection to avoid breaking internal and possible external links.
      - Removed unnecessary redirection pages that were there since the separate scala, java, and python programming guides were merged together in 2013 or 2014.
      
      ## How was this patch tested?
      
      (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
      (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
      
      Please review http://spark.apache.org/contributing.html before opening a pull request.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #18485 from tdas/SPARK-21267.
  5. Jun 07, 2017
  6. May 07, 2017
    • [SPARK-7481][BUILD] Add spark-hadoop-cloud module to pull in object store access. · 2cf83c47
      Steve Loughran authored
      ## What changes were proposed in this pull request?
      
      Add a new `spark-hadoop-cloud` module and maven profile to pull in object store support from `hadoop-openstack`, `hadoop-aws` and `hadoop-azure` (Hadoop 2.7+) JARs, along with their dependencies, fixing up the dependencies so that everything works, in particular Jackson.
      
      It restores `s3n://` access to S3, adds its `s3a://` replacement, OpenStack `swift://` and azure `wasb://`.
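
To make the relationship between URL schemes and connector JARs concrete, here is a small Python sketch. The scheme-to-artifact mapping reflects my understanding of which Hadoop module ships each connector; the dictionary, the function name, and the example URLs are illustrative assumptions, not part of Spark or Hadoop.

```python
from urllib.parse import urlparse

# Illustrative mapping from URL scheme to the Hadoop artifact that provides
# the connector. The names on the right are the JARs listed in the message.
CONNECTOR_FOR_SCHEME = {
    "s3n": "hadoop-aws",
    "s3a": "hadoop-aws",
    "swift": "hadoop-openstack",
    "wasb": "hadoop-azure",
}


def connector_for(url):
    """Return the artifact assumed to serve this URL, or None if unknown."""
    scheme = urlparse(url).scheme
    return CONNECTOR_FOR_SCHEME.get(scheme)


if __name__ == "__main__":
    # Hypothetical bucket and container names, for illustration only.
    for url in ("s3a://example-bucket/data.csv",
                "wasb://container@account.blob.core.windows.net/data",
                "hdfs://namenode/data"):
        print(url, "->", connector_for(url))
```

A lookup like this only says which JAR must be on the classpath; credentials and store-specific settings still have to be configured separately.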
      
      There's a documentation page, `cloud_integration.md`, which covers the basic details of using Spark with object stores, referring the reader to the supplier's own documentation, with specific warnings on security and the possible mismatch between a store's behavior and that of a filesystem. In particular, users are advised to be very cautious when trying to use an object store as the destination of data, and to consult the documentation of the storage supplier and the connector.
      
      (this is the successor to #12004; I can't re-open it)
      
      ## How was this patch tested?
      
      Downstream tests exist in [https://github.com/steveloughran/spark-cloud-examples/tree/master/cloud-examples](https://github.com/steveloughran/spark-cloud-examples/tree/master/cloud-examples)
      
      Those verify that the dependencies are sufficient to allow downstream applications to work with s3a, azure wasb and swift storage connectors, and perform basic IO & dataframe operations thereon. All seems well.
      
      Manually clean build & verify that assembly contains the relevant aws-* hadoop-* artifacts on Hadoop 2.6; azure on a hadoop-2.7 profile.
      
      SBT build: `build/sbt -Phadoop-cloud -Phadoop-2.7 package`
      Maven build: `mvn install -Phadoop-cloud -Phadoop-2.7`
      
      This PR *does not* update `dev/deps/spark-deps-hadoop-2.7` or `dev/deps/spark-deps-hadoop-2.6`, because unless the hadoop-cloud profile is enabled, no extra JARs show up in the dependency list. The dependency check in Jenkins isn't setting the property, so the new JARs aren't visible.
      
      Author: Steve Loughran <stevel@apache.org>
      Author: Steve Loughran <stevel@hortonworks.com>
      
      Closes #17834 from steveloughran/cloud/SPARK-7481-current.
  7. Apr 04, 2017
  8. Feb 16, 2017
    • [SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support · 0e240549
      Sean Owen authored
      - Move external/java8-tests tests into core, streaming, sql and remove
      - Remove MaxPermGen and related options
      - Fix some reflection / TODOs around Java 8+ methods
      - Update doc references to 1.7/1.8 differences
      - Remove Java 7/8 related build profiles
      - Update some plugins for better Java 8 compatibility
      - Fix a few Java-related warnings
      
      For the future:
      
      - Update Java 8 examples to fully use Java 8
      - Update Java tests to use lambdas for simplicity
      - Update Java internal implementations to use lambdas
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16871 from srowen/SPARK-19493.
  9. Jan 24, 2017
    • [DOCS] Fix typo in docs · 7c61c2a1
      uncleGen authored
      ## What changes were proposed in this pull request?
      
      Fix typo in docs
      
      ## How was this patch tested?
      
      Author: uncleGen <hustyugm@gmail.com>
      
      Closes #16658 from uncleGen/typo-issue.
  10. Dec 10, 2016
  11. Nov 23, 2016
  12. Nov 16, 2016
    • [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed · a36a76ac
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI but that is the natural follow up (SPARK-18129).
      
      Done:
      - pip installable on conda [manual tested]
      - setup.py installed on a non-pip managed system (RHEL) with YARN [manual tested]
      - Automated testing of this (virtualenv)
      - packaging and signing with release-build*
      
      Possible follow up work:
      - release-build update to publish to PyPI (SPARK-18128)
      - figure out who owns the pyspark package name on prod PyPI (is it someone within the project, should we ask PyPI, or should we choose a different name to publish with, like ApachePySpark?)
      - Windows support and/or testing (SPARK-18136)
      - investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test
      - consider how we want to number our dev/snapshot versions
      
      Explicitly out of scope:
      - Using pip installed PySpark to start a standalone cluster
      - Using pip installed PySpark for non-Python Spark programs
      
      *I've done some work to test release-build locally but as a non-committer I've just done local testing.
      
      ## How was this patch tested?
      
      Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.
      
      release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites)
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: Juliet Hougland <juliet@cloudera.com>
      Author: Juliet Hougland <not@myemail.com>
      
      Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
  13. Nov 03, 2016
    • [SPARK-18138][DOCS] Document that Java 7, Python 2.6, Scala 2.10, Hadoop < 2.6... · dc4c6009
      Sean Owen authored
      [SPARK-18138][DOCS] Document that Java 7, Python 2.6, Scala 2.10, Hadoop < 2.6 are deprecated in Spark 2.1.0
      
      ## What changes were proposed in this pull request?
      
      Document that Java 7, Python 2.6, Scala 2.10, Hadoop < 2.6 are deprecated in Spark 2.1.0. This does not actually implement any of the changes in SPARK-18138; it just peppers the documentation with notices about them.
      
      ## How was this patch tested?
      
      Doc build
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #15733 from srowen/SPARK-18138.
  14. Sep 14, 2016
  15. Jul 15, 2016
    • [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide · 5ffd5d38
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Made DataFrame-based API primary
      * Spark doc menu bar and other places now link to ml-guide.html, not mllib-guide.html
      * mllib-guide.html keeps RDD-specific list of features, with a link at the top redirecting people to ml-guide.html
      * ml-guide.html includes a "maintenance mode" announcement about the RDD-based API
        * **Reviewers: please check this carefully**
      * (minor) Titles for DF API no longer include "- spark.ml" suffix.  Titles for RDD API have "- RDD-based API" suffix
      * Moved migration guide to ml-guide from mllib-guide
        * Also moved past guides from mllib-migration-guides to ml-migration-guides, with a redirect link on mllib-migration-guides
        * **Reviewers**: I did not change any of the content of the migration guides.
      
      Reorganized DataFrame-based guide:
      * ml-guide.html mimics the old mllib-guide.html page in terms of content: overview, migration guide, etc.
      * Moved Pipeline description into ml-pipeline.html and moved tuning into ml-tuning.html
        * **Reviewers**: I did not change the content of these guides, except some intro text.
      * Sidebar remains the same, but with pipeline and tuning sections added
      
      Other:
      * ml-classification-regression.html: Moved text about linear methods to new section in page
      
      ## How was this patch tested?
      
      Generated docs locally
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #14213 from jkbradley/ml-guide-2.0.
  16. May 11, 2016
  17. Mar 18, 2016
    • [MINOR][DOCS] Update build descriptions and commands · c11ea2e4
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR updates Scala and Hadoop versions in the build description and commands in `Building Spark` documents.
      
      ## How was this patch tested?
      
      N/A
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11838 from dongjoon-hyun/fix_doc_building_spark.
  18. Jan 09, 2016
  19. Dec 08, 2015
  20. Nov 01, 2015
  21. Sep 13, 2015
  22. Jun 09, 2015
    • [SPARK-6511] [DOCUMENTATION] Explain how to use Hadoop provided builds · 6e4fb0c9
      Patrick Wendell authored
      This provides preliminary documentation pointing out how to use the
      Hadoop free builds. I am hoping over time this list can grow to
      include most of the popular Hadoop distributions.
      
      Getting more people using these builds will help us long term reduce
      the number of binaries we build.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #6729 from pwendell/hadoop-provided and squashes the following commits:
      
      1113b76 [Patrick Wendell] [SPARK-6511] [Documentation] Explain how to use Hadoop provided builds
  23. Jun 07, 2015
    • [SPARK-7733] [CORE] [BUILD] Update build, code to use Java 7 for 1.5.0+ · e84815dc
      Sean Owen authored
      Update build to use Java 7, and remove some comments and special-case support for Java 6.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6265 from srowen/SPARK-7733 and squashes the following commits:
      
      59bda4e [Sean Owen] Update build to use Java 7, and remove some comments and special-case support for Java 6
  24. May 29, 2015
    • [SPARK-6806] [SPARKR] [DOCS] Add a new SparkR programming guide · 5f48e5c3
      Shivaram Venkataraman authored
      This PR adds a new SparkR programming guide at the top-level. This will be useful for R users as our APIs don't directly match the Scala/Python APIs and as we need to explain SparkR without using RDDs as examples etc.
      
      cc rxin davies pwendell
      
      cc cafreeman -- Would be great if you could also take a look at this!
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6490 from shivaram/sparkr-guide and squashes the following commits:
      
      d5ff360 [Shivaram Venkataraman] Add a section on HiveContext, HQL queries
      408dce5 [Shivaram Venkataraman] Fix link
      dbb86e3 [Shivaram Venkataraman] Fix minor typo
      9aff5e0 [Shivaram Venkataraman] Address comments, use dplyr-like syntax in example
      d09703c [Shivaram Venkataraman] Fix default argument in read.df
      ea816a1 [Shivaram Venkataraman] Add a new SparkR programming guide Also update write.df, read.df to handle defaults better
  25. May 23, 2015
    • [SPARK-6806] [SPARKR] [DOCS] Fill in SparkR examples in programming guide · 7af3818c
      Davies Liu authored
      sqlCtx -> sqlContext
      
      You can check the docs by:
      
      ```
      $ cd docs
      $ SKIP_SCALADOC=1 jekyll serve
      ```
      cc shivaram
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #5442 from davies/r_docs and squashes the following commits:
      
      7a12ec6 [Davies Liu] remove rdd in R docs
      8496b26 [Davies Liu] remove the docs related to RDD
      e23b9d6 [Davies Liu] delete R docs for RDD API
      222e4ff [Davies Liu] Merge branch 'master' into r_docs
      89684ce [Davies Liu] Merge branch 'r_docs' of github.com:davies/spark into r_docs
      f0a10e1 [Davies Liu] address comments from @shivaram
      f61de71 [Davies Liu] Update pairRDD.R
      3ef7cf3 [Davies Liu] use + instead of function(a,b) a+b
      2f10a77 [Davies Liu] address comments from @cafreeman
      9c2a062 [Davies Liu] mention R api together with Python API
      23f751a [Davies Liu] Fill in SparkR examples in programming guide
  26. Mar 09, 2015
  27. Mar 02, 2015
    • SPARK-5390 [DOCS] Encourage users to post on Stack Overflow in Community Docs · 0b472f60
      Sean Owen authored
      Point "Community" to main Spark Community page; mention SO tag apache-spark.
      
      Separately, the Apache site can be updated to mention, under Mailing Lists:
      "StackOverflow also has an apache-spark tag for Spark Q&A." or similar.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4843 from srowen/SPARK-5390 and squashes the following commits:
      
      3508ac6 [Sean Owen] Point "Community" to main Spark Community page; mention SO tag apache-spark
  28. Feb 05, 2015
    • [SPARK-5608] Improve SEO of Spark documentation pages · 4d74f060
      Matei Zaharia authored
      - Add meta description tags on some of the most important doc pages
      - Shorten the titles of some pages to have more relevant keywords; for
        example there's no reason to have "Spark SQL Programming Guide - Spark
        1.2.0 documentation", we can just say "Spark SQL - Spark 1.2.0
        documentation".
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #4381 from mateiz/docs-seo and squashes the following commits:
      
      4940563 [Matei Zaharia] [SPARK-5608] Improve SEO of Spark documentation pages
  29. Nov 09, 2014
    • SPARK-971 [DOCS] Link to Confluence wiki from project website / documentation · 8c99a47a
      Sean Owen authored
      This is a trivial change to add links to the wiki from `README.md` and the main docs page. It is already linked to from spark.apache.org.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3169 from srowen/SPARK-971 and squashes the following commits:
      
      dcb84d0 [Sean Owen] Add link to wiki from README, docs home page
  30. Sep 16, 2014
    • Add a Community Projects page · a6e1712f
      Evan Chan authored
      This adds a new page to the docs listing community projects -- those created outside of Apache Spark that are of interest to the community of Spark users. Anybody can add to it just by submitting a PR.
      
      There was a discussion thread about alternatives:
      * Creating a GitHub organization for Spark projects - we could not find any sponsors for this, and it would be difficult to organize since many folks just create repos in their company organization or personal accounts
      * Apache has some place for storing community projects, but it was deemed difficult to work with, and again there would be some permissions issues -- not everyone could update it.
      
      Author: Evan Chan <velvia@gmail.com>
      
      Closes #2219 from velvia/community-projects-page and squashes the following commits:
      
      7316822 [Evan Chan] Point to Spark wiki: supplemental projects page
      613b021 [Evan Chan] Add a few more projects
      a85eaaf [Evan Chan] Add a Community Projects page
    • SPARK-3069 [DOCS] Build instructions in README are outdated · 61e21fe7
      Sean Owen authored
      Here's my crack at Bertrand's suggestion. The Github `README.md` contains build info that's outdated. It should just point to the current online docs, and reflect that Maven is the primary build now.
      
      (Incidentally, the stanza at the end about contributions of original work should go in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark too. It won't hurt to be crystal clear about the agreement to license, given that ICLAs are not required of anyone here.)
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2014 from srowen/SPARK-3069 and squashes the following commits:
      
      501507e [Sean Owen] Note that Zinc is for Maven builds too
      db2bd97 [Sean Owen] sbt -> sbt/sbt and add note about zinc
      be82027 [Sean Owen] Fix additional occurrences of building-with-maven -> building-spark
      91c921f [Sean Owen] Move building-with-maven to building-spark and create a redirect. Update doc links to building-spark.html Add jekyll-redirect-from plugin and make associated config changes (including fixing pygments deprecation). Add example of SBT to README.md
      999544e [Sean Owen] Change "Building Spark with Maven" title to "Building Spark"; reinstate tl;dr info about dev/run-tests in README.md; add brief note about building with SBT
      c18d140 [Sean Owen] Optionally, remove the copy of contributing text from main README.md
      8e83934 [Sean Owen] Add CONTRIBUTING.md to trigger notice on new pull request page
      b1c04a1 [Sean Owen] Refer to current online documentation for building, and remove slightly outdated copy in README.md
  31. Sep 07, 2014
    • [SPARK-938][doc] Add OpenStack Swift support · eddfedda
      Reynold Xin authored
      See compiled doc at
      http://people.apache.org/~rxin/tmp/openstack-swift/_site/storage-openstack-swift.html
      
      This is based on #1010. Closes #1010.
      
      Author: Reynold Xin <rxin@apache.org>
      Author: Gil Vernik <gilv@il.ibm.com>
      
      Closes #2298 from rxin/openstack-swift and squashes the following commits:
      
      ff4e394 [Reynold Xin] Two minor comments from Patrick.
      279f6de [Reynold Xin] core-sites -> core-site
      dfb8fea [Reynold Xin] Updated based on Gil's suggestion.
      846f5cb [Reynold Xin] Added a link from overview page.
      0447c9f [Reynold Xin] Removed sample code.
      e9c3761 [Reynold Xin] Merge pull request #1010 from gilv/master
      9233fef [Gil Vernik] Fixed typos
      6994827 [Gil Vernik] Merge pull request #1 from rxin/openstack
      ac0679e [Reynold Xin] Fixed an unclosed tr.
      47ce99d [Reynold Xin] Merge branch 'master' into openstack
      cca7192 [Gil Vernik] Removed white spases from pom.xml
      99f095d [Reynold Xin] Pending openstack changes.
      eb22295 [Reynold Xin] Merge pull request #1010 from gilv/master
      39a9737 [Gil Vernik] Spark integration with Openstack Swift
      c977658 [Gil Vernik] Merge branch 'master' of https://github.com/gilv/spark
      2aba763 [Gil Vernik] Fix to docs/openstack-integration.md
      9b625b5 [Gil Vernik] Merge branch 'master' of https://github.com/gilv/spark
      eff538d [Gil Vernik] SPARK-938 - Openstack Swift object storage support
      ce483d7 [Gil Vernik] SPARK-938 - Openstack Swift object storage support
      b6c37ef [Gil Vernik] Openstack Swift support
  32. Jun 25, 2014
  33. May 30, 2014
    • [SPARK-1566] consolidate programming guide, and general doc updates · c8bf4131
      Matei Zaharia authored
      This is a fairly large PR to clean up and update the docs for 1.0. The major changes are:
      
      * A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs
      * New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark
      * Spark-submit guide moved to a separate page and expanded slightly
      * Various cleanups of the menu system, security docs, and others
      * Updated look of title bar to differentiate the docs from previous Spark versions
      
      You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #896 from mateiz/1.0-docs and squashes the following commits:
      
      03e6853 [Matei Zaharia] Some tweaks to configuration and YARN docs
      0779508 [Matei Zaharia] tweak
      ef671d4 [Matei Zaharia] Keep frames in JavaDoc links, and other small tweaks
      1bf4112 [Matei Zaharia] Review comments
      4414f88 [Matei Zaharia] tweaks
      d04e979 [Matei Zaharia] Fix some old links to Java guide
      a34ed33 [Matei Zaharia] tweak
      541bb3b [Matei Zaharia] miscellaneous changes
      fcefdec [Matei Zaharia] Moved submitting apps to separate doc
      61d72b4 [Matei Zaharia] stuff
      181f217 [Matei Zaharia] migration guide, remove old language guides
      e11a0da [Matei Zaharia] Add more API functions
      6a030a9 [Matei Zaharia] tweaks
      8db0ae3 [Matei Zaharia] Added key-value pairs section
      318d2c9 [Matei Zaharia] tweaks
      1c81477 [Matei Zaharia] New section on basics and function syntax
      e38f559 [Matei Zaharia] Actually added programming guide to Git
      a33d6fe [Matei Zaharia] First pass at updating programming guide to support all languages, plus other tweaks throughout
      3b6a876 [Matei Zaharia] More CSS tweaks
      01ec8bf [Matei Zaharia] More CSS tweaks
      e6d252e [Matei Zaharia] Change color of doc title bar to differentiate from 0.9.0
  34. May 17, 2014
    • [SPARK-1824] Remove <master> from Python examples · cf6cbe9f
      Andrew Or authored
      A recent PR (#552) fixed this for all Scala / Java examples. We need to do it for Python too.
      
      Note that this blocks on #799, which makes `bin/pyspark` go through Spark submit. With only the changes in this PR, the only way to run these examples is through Spark submit. Once #799 goes in, you can use `bin/pyspark` to run them too. For example,
      
      ```
      bin/pyspark examples/src/main/python/pi.py 100 --master local-cluster[4,1,512]
      ```
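
The `--master local-cluster[4,1,512]` value in the command above packs three numbers into one string. As I understand Spark's master URL format, they are the number of workers, the cores per worker, and the memory per worker in MB. The Python sketch below unpacks such a string; the regular expression and function name are illustrative and are not Spark's own parser.

```python
import re

# Pattern for "local-cluster[workers,cores,memoryMB]"; whitespace around the
# numbers is tolerated here as a convenience of this sketch.
LOCAL_CLUSTER = re.compile(
    r"local-cluster\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def parse_local_cluster(master):
    """Return the three settings as a dict, or None if the string differs."""
    match = LOCAL_CLUSTER.fullmatch(master)
    if match is None:
        return None
    workers, cores, memory_mb = (int(group) for group in match.groups())
    return {
        "workers": workers,
        "cores_per_worker": cores,
        "memory_mb_per_worker": memory_mb,
    }


if __name__ == "__main__":
    print(parse_local_cluster("local-cluster[4,1,512]"))
```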
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #802 from andrewor14/python-examples and squashes the following commits:
      
      cf50b9f [Andrew Or] De-indent python comments (minor)
      50f80b1 [Andrew Or] Remove pyFiles from SparkContext construction
      c362f69 [Andrew Or] Update docs to use spark-submit for python applications
      7072c6a [Andrew Or] Merge branch 'master' of github.com:apache/spark into python-examples
      427a5f0 [Andrew Or] Update docs
      d32072c [Andrew Or] Remove <master> from examples + update usages
  35. May 12, 2014
    • [SPARK-1753 / 1773 / 1814] Update outdated docs for spark-submit, YARN, standalone etc. · 2ffd1eaf
      Andrew Or authored
      YARN
      - SparkPi was updated to not take in master as an argument; we should update the docs to reflect that.
      - The default YARN build guide should be in maven, not sbt.
      - This PR also adds a paragraph on steps to debug a YARN application.
      
      Standalone
      - Emphasize spark-submit more. Right now it's one small paragraph preceding the legacy way of launching through `org.apache.spark.deploy.Client`.
      - The way we set configurations / environment variables according to the old docs is outdated. This needs to reflect changes introduced by the Spark configuration changes we made.
      
      In general, this PR also adds a little more documentation on the new spark-shell, spark-submit, spark-defaults.conf etc here and there.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #701 from andrewor14/yarn-docs and squashes the following commits:
      
      e2c2312 [Andrew Or] Merge in changes in #752 (SPARK-1814)
      25cfe7b [Andrew Or] Merge in the warning from SPARK-1753
      a8c39c5 [Andrew Or] Minor changes
      336bbd9 [Andrew Or] Tabs -> spaces
      4d9d8f7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      041017a [Andrew Or] Abstract Spark submit documentation to cluster-overview.html
      3cc0649 [Andrew Or] Detail how to set configurations + remove legacy instructions
      5b7140a [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      85a51fc [Andrew Or] Update run-example, spark-shell, configuration etc.
      c10e8c7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      381fe32 [Andrew Or] Update docs for standalone mode
      757c184 [Andrew Or] Add a note about the requirements for the debugging trick
      f8ca990 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      924f04c [Andrew Or] Revert addition of --deploy-mode
      d5fe17b [Andrew Or] Update the YARN docs
  36. May 06, 2014
    • SPARK-1637: Clean up examples for 1.0 · a000b5c3
      Sandeep authored
      - [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
      - [x] Move Python examples into examples/src/main/python
      - [x] Update docs to reflect these changes
      
      Author: Sandeep <sandeep@techaddict.me>
      
      This patch had conflicts when merged, resolved by
      Committer: Matei Zaharia <matei@databricks.com>
      
      Closes #571 from techaddict/SPARK-1637 and squashes the following commits:
      
      47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples
      8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples
      5f96121 [Sandeep] Move Python examples into examples/src/main/python
      0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
    • Fix two download suggestions in the docs: · 7b978c1a
      Patrick Wendell authored
      1) On the quick start page provide a direct link to the downloads (suggested by @pbailis).
      2) On the index page, don't suggest users always have to build Spark, since many won't.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #662 from pwendell/quick-start and squashes the following commits:
      
      0622f27 [Patrick Wendell] Fix two download suggestions in the docs:
  37. May 05, 2014
    • [SPARK-1504], [SPARK-1505], [SPARK-1558] Updated Spark Streaming guide · a975a19f
      Tathagata Das authored
      - SPARK-1558: Updated custom receiver guide to match it with the new API
      - SPARK-1504: Added deployment and monitoring subsection to streaming
      - SPARK-1505: Added migration guide for migrating from 0.9.x and below to Spark 1.0
      - Updated various Java streaming examples to use JavaReceiverInputDStream to highlight the API change.
      - Removed the requirement for cleaner ttl from streaming guide
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #652 from tdas/doc-fix and squashes the following commits:
      
      cb4f4b7 [Tathagata Das] Possible fix for flaky graceful shutdown test.
      ab71f7f [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into doc-fix
      8d6ff9b [Tathagata Das] Addded migration guide to Spark Streaming.
      7d171df [Tathagata Das] Added reference to JavaReceiverInputStream in examples and streaming guide.
      49edd7c [Tathagata Das] Change java doc links to use Java docs.
      11528d7 [Tathagata Das] Updated links on index page.
      ff80970 [Tathagata Das] More updates to streaming guide.
      4dc42e9 [Tathagata Das] Added monitoring and other documentation in the streaming guide.
      14c6564 [Tathagata Das] Updated custom receiver guide.
  38. Apr 21, 2014
    • [SPARK-1439, SPARK-1440] Generate unified Scaladoc across projects and Javadocs · fc783847
      Matei Zaharia authored
      I used the sbt-unidoc plugin (https://github.com/sbt/sbt-unidoc) to create a unified Scaladoc of our public packages, and to generate Javadocs as well. One limitation is that I haven't found an easy way to exclude packages in the Javadoc; there is an SBT task that identifies Java sources to run javadoc on, but it's been very difficult to modify it from outside to change what is set in the unidoc package. Some SBT-savvy people should help with this. The Javadoc site also lacks package-level descriptions and things like that, so we may want to look into that. We may decide not to post these right now if it's too limited compared to the Scala one.
      
      Example of the built doc site: http://people.csail.mit.edu/matei/spark-unified-docs/
      
      Author: Matei Zaharia <matei@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Patrick Wendell <pwendell@gmail.com>
      
      Closes #457 from mateiz/better-docs and squashes the following commits:
      
      a63d4a3 [Matei Zaharia] Skip Java/Scala API docs for Python package
      5ea1f43 [Matei Zaharia] Fix links to Java classes in Java guide, fix some JS for scrolling to anchors on page load
      f05abc0 [Matei Zaharia] Don't include java.lang package names
      995e992 [Matei Zaharia] Skip internal packages and class names with $ in JavaDoc
      a14a93c [Matei Zaharia] typo
      76ce64d [Matei Zaharia] Add groups to Javadoc index page, and a first package-info.java
      ed6f994 [Matei Zaharia] Generate JavaDoc as well, add titles, update doc site to use unified docs
      acb993d [Matei Zaharia] Add Unidoc plugin for the projects we want Unidoced