  1. Feb 28, 2017
    • [SPARK-19769][DOCS] Update quickstart instructions · d887f758
      Michael McCune authored
      
      ## What changes were proposed in this pull request?
      
      This change addresses the renaming of the `simple.sbt` build file to
      `build.sbt`. Newer versions of the sbt tool do not find the older
      file name and look for `build.sbt` instead. The quickstart
      instructions for self-contained applications are updated to reflect
      this change.
      
      ## How was this patch tested?
      
      As this is a relatively minor change of a few words, the markdown was checked for syntax and spelling. Site was built with `SKIP_API=1 jekyll serve` for testing purposes.
      
      Author: Michael McCune <msm@redhat.com>
      
      Closes #17101 from elmiko/spark-19769.
      
      (cherry picked from commit bf5987cb)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
  2. Nov 30, 2016
  3. Oct 07, 2016
    • [SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished · cff56075
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      This expands calls to Jetty's simple `ServerConnector` constructor to explicitly specify a `ScheduledExecutorScheduler` that makes daemon threads. It should otherwise result in exactly the same configuration, because the other args are copied from the constructor that is currently called.
      
      (I'm not sure we should change the Hive Thriftserver impl, but I did anyway.)
      
      This also adds `sc.stop()` to the quick start guide example.
      
      ## How was this patch tested?
      
      Existing tests; _pending_ at least manual verification of the fix.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #15381 from srowen/SPARK-17707.
  4. Aug 30, 2016
    • [MINOR][DOCS] Fix minor typos in python example code · d4eee993
      Dmitriy Sokolov authored
      ## What changes were proposed in this pull request?
      
      Fix minor typos in the Python example code in the streaming programming guide.
      
      ## How was this patch tested?
      
      N/A
      
      Author: Dmitriy Sokolov <silentsokolov@gmail.com>
      
      Closes #14805 from silentsokolov/fix-typos.
  5. Aug 16, 2016
    • [MINOR][DOC] Correct code snippet results in quick start documentation · 6f0988b1
      linbojin authored
      ## What changes were proposed in this pull request?
      
      As the README.md file is updated over time, some code snippet outputs are no longer correct for the new README.md file. For example:
      ```
      scala> textFile.count()
      res0: Long = 126
      ```
      should be
      ```
      scala> textFile.count()
      res0: Long = 99
      ```
      This PR adds comments pointing out this problem so that new Spark learners have a correct reference.
      It also fixes a small bug: in the current documentation, the outputs of linesWithSpark.count() without and with cache differ (one is 15 and the other is 19):
      ```
      scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
      linesWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:27
      
      scala> textFile.filter(line => line.contains("Spark")).count() // How many lines contain "Spark"?
      res3: Long = 15
      
      ...
      
      scala> linesWithSpark.cache()
      res7: linesWithSpark.type = MapPartitionsRDD[2] at filter at <console>:27
      
      scala> linesWithSpark.count()
      res8: Long = 19
      ```
      
      ## How was this patch tested?
      
      manual test:  run `$ SKIP_API=1 jekyll serve --watch`
      
      Author: linbojin <linbojin203@gmail.com>
      
      Closes #14645 from linbojin/quick-start-documentation.
  6. Jun 08, 2016
    • [DOCUMENTATION] Fixed target JAR path · ca70ab27
      prabs authored
      ## What changes were proposed in this pull request?
      
      The Scala version mentioned in the sbt configuration file is 2.11, so the path of the target JAR should be `/target/scala-2.11/simple-project_2.11-1.0.jar`.
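
      The link between the sbt settings and the JAR path can be sketched with a minimal `build.sbt`. This is an illustrative configuration, not necessarily the exact file from the guide: the Scala patch version and the Spark version below are assumptions chosen for the example.

      ```scala
      // build.sbt (illustrative sketch; version numbers are assumptions)

      // sbt normalizes the name to "simple-project" for the artifact file name
      name := "Simple Project"

      // Becomes the "-1.0" suffix of the JAR
      version := "1.0"

      // Assumed patch version; the binary version 2.11 selects target/scala-2.11/
      // and the "_2.11" suffix of the artifact
      scalaVersion := "2.11.8"

      // Assumed Spark version; %% appends the Scala binary version to the artifact ID
      libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
      ```

      With settings like these, `sbt package` writes the artifact to `target/scala-2.11/simple-project_2.11-1.0.jar`, which combines the normalized project name, the Scala binary version, and the project version.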
      
      ## How was this patch tested?
      
      n/a
      
      Author: prabs <prabsmails@gmail.com>
      Author: Prabeesh K <prabsmails@gmail.com>
      
      Closes #13554 from prabeesh/master.
  7. May 03, 2016
  8. Sep 08, 2015
  9. Jul 31, 2015
    • [SPARK-9490] [DOCS] [MLLIB] MLlib evaluation metrics guide example python code uses deprecated print statement · 873ab0f9
      Sean Owen authored
      
      Use print(x) not print x for Python 3 in eval examples
      CC sethah mengxr -- just wanted to close this out before 1.5
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #7822 from srowen/SPARK-9490 and squashes the following commits:
      
      01abeba [Sean Owen] Change "print x" to "print(x)" in the rest of the docs too
      bd7f7fb [Sean Owen] Use print(x) not print x for Python 3 in eval examples
  10. May 23, 2015
    • [SPARK-6806] [SPARKR] [DOCS] Fill in SparkR examples in programming guide · 7af3818c
      Davies Liu authored
      sqlCtx -> sqlContext
      
      You can check the docs by:
      
      ```
      $ cd docs
      $ SKIP_SCALADOC=1 jekyll serve
      ```
      cc shivaram
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #5442 from davies/r_docs and squashes the following commits:
      
      7a12ec6 [Davies Liu] remove rdd in R docs
      8496b26 [Davies Liu] remove the docs related to RDD
      e23b9d6 [Davies Liu] delete R docs for RDD API
      222e4ff [Davies Liu] Merge branch 'master' into r_docs
      89684ce [Davies Liu] Merge branch 'r_docs' of github.com:davies/spark into r_docs
      f0a10e1 [Davies Liu] address comments from @shivaram
      f61de71 [Davies Liu] Update pairRDD.R
      3ef7cf3 [Davies Liu] use + instead of function(a,b) a+b
      2f10a77 [Davies Liu] address comments from @cafreeman
      9c2a062 [Davies Liu] mention R api together with Python API
      23f751a [Davies Liu] Fill in SparkR examples in programming guide
  11. Feb 05, 2015
    • [SPARK-5608] Improve SEO of Spark documentation pages · 4d74f060
      Matei Zaharia authored
      - Add meta description tags on some of the most important doc pages
      - Shorten the titles of some pages to have more relevant keywords; for
        example there's no reason to have "Spark SQL Programming Guide - Spark
        1.2.0 documentation", we can just say "Spark SQL - Spark 1.2.0
        documentation".
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #4381 from mateiz/docs-seo and squashes the following commits:
      
      4940563 [Matei Zaharia] [SPARK-5608] Improve SEO of Spark documentation pages
  12. Nov 27, 2014
    • SPARK-4170 [CORE] Closure problems when running Scala app that "extends App" · 5d7fe178
      Sean Owen authored
      Warn against subclassing scala.App, and remove one instance of this in examples
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3497 from srowen/SPARK-4170 and squashes the following commits:
      
      4a6131f [Sean Owen] Restore multiline string formatting
      a8ca895 [Sean Owen] Warn against subclassing scala.App, and remove one instance of this in examples
  13. Oct 14, 2014
    • SPARK-1307 [DOCS] Don't use term 'standalone' to refer to a Spark Application · 18ab6bd7
      Sean Owen authored
      HT to Diana, just proposing an implementation of her suggestion, which I rather agreed with. Is there a second/third for the motion?
      
      Refer to "self-contained" rather than "standalone" apps to avoid confusion with standalone deployment mode. And fix placement of reference to this in MLlib docs.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2787 from srowen/SPARK-1307 and squashes the following commits:
      
      b5b82e2 [Sean Owen] Refer to "self-contained" rather than "standalone" apps to avoid confusion with standalone deployment mode. And fix placement of reference to this in MLlib docs.
  14. Jun 22, 2014
    • SPARK-1996. Remove use of special Maven repo for Akka · 1db9cbc3
      Sean Owen authored
      Just following up Matei's suggestion to remove the Akka repo references. Builds and the audit-release script appear OK.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #1170 from srowen/SPARK-1996 and squashes the following commits:
      
      5ca2930 [Sean Owen] Remove outdated Akka repository references
  15. May 30, 2014
    • [SPARK-1566] consolidate programming guide, and general doc updates · c8bf4131
      Matei Zaharia authored
      This is a fairly large PR to clean up and update the docs for 1.0. The major changes are:
      
      * A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs
      * New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark
      * Spark-submit guide moved to a separate page and expanded slightly
      * Various cleanups of the menu system, security docs, and others
      * Updated look of title bar to differentiate the docs from previous Spark versions
      
      You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #896 from mateiz/1.0-docs and squashes the following commits:
      
      03e6853 [Matei Zaharia] Some tweaks to configuration and YARN docs
      0779508 [Matei Zaharia] tweak
      ef671d4 [Matei Zaharia] Keep frames in JavaDoc links, and other small tweaks
      1bf4112 [Matei Zaharia] Review comments
      4414f88 [Matei Zaharia] tweaks
      d04e979 [Matei Zaharia] Fix some old links to Java guide
      a34ed33 [Matei Zaharia] tweak
      541bb3b [Matei Zaharia] miscellaneous changes
      fcefdec [Matei Zaharia] Moved submitting apps to separate doc
      61d72b4 [Matei Zaharia] stuff
      181f217 [Matei Zaharia] migration guide, remove old language guides
      e11a0da [Matei Zaharia] Add more API functions
      6a030a9 [Matei Zaharia] tweaks
      8db0ae3 [Matei Zaharia] Added key-value pairs section
      318d2c9 [Matei Zaharia] tweaks
      1c81477 [Matei Zaharia] New section on basics and function syntax
      e38f559 [Matei Zaharia] Actually added programming guide to Git
      a33d6fe [Matei Zaharia] First pass at updating programming guide to support all languages, plus other tweaks throughout
      3b6a876 [Matei Zaharia] More CSS tweaks
      01ec8bf [Matei Zaharia] More CSS tweaks
      e6d252e [Matei Zaharia] Change color of doc title bar to differentiate from 0.9.0
  16. May 28, 2014
    • Organize configuration docs · 7801d44f
      Patrick Wendell authored
      This PR improves and organizes the config option page
      and makes a few other changes to config docs. See a preview here:
      http://people.apache.org/~pwendell/config-improvements/configuration.html
      
      The biggest changes are:
      1. The configs for the standalone master/workers were moved to the
      standalone page and out of the general config doc.
      2. SPARK_LOCAL_DIRS was missing from the standalone docs.
      3. Expanded discussion of injecting configs with spark-submit, including an
      example.
      4. Config options were organized into the following categories:
      - Runtime Environment
      - Shuffle Behavior
      - Spark UI
      - Compression and Serialization
      - Execution Behavior
      - Networking
      - Scheduling
      - Security
      - Spark Streaming
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #880 from pwendell/config-cleanup and squashes the following commits:
      
      93f56c3 [Patrick Wendell] Feedback from Matei
      6f66efc [Patrick Wendell] More feedback
      16ae776 [Patrick Wendell] Adding back header section
      d9c264f [Patrick Wendell] Small fix
      e0c1728 [Patrick Wendell] Response to Matei's review
      27d57db [Patrick Wendell] Reverting changes to index.html (covered in #896)
      e230ef9 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
      a374369 [Patrick Wendell] Line wrapping fixes
      fdff7fc [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
      3289ea4 [Patrick Wendell] Pulling in changes from #856
      106ee31 [Patrick Wendell] Small link fix
      f7e79bc [Patrick Wendell] Re-organizing config options.
      54b184d [Patrick Wendell] Adding standalone configs to the standalone page
      592e94a [Patrick Wendell] Stash
      29b5446 [Patrick Wendell] Better discussion of spark-submit in configuration docs
      2d719ef [Patrick Wendell] Small fix
      4af9e07 [Patrick Wendell] Adding SPARK_LOCAL_DIRS docs
      204b248 [Patrick Wendell] Small fixes
  17. May 14, 2014
    • Add language tabs and Python version to interactive part of quick-start · f10de042
      Matei Zaharia authored
      This is an addition of some stuff that was missed in https://issues.apache.org/jira/browse/SPARK-1567. I've also updated the doc to show submitting the Python application with spark-submit.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #782 from mateiz/spark-1567-extra and squashes the following commits:
      
      6f8f2aa [Matei Zaharia] tweaks
      9ed9874 [Matei Zaharia] tweaks
      ae67c3e [Matei Zaharia] tweak
      b303ba3 [Matei Zaharia] tweak
      1433a4d [Matei Zaharia] Add language tabs and Python version to interactive part of quick-start guide
  18. May 12, 2014
    • [SPARK-1753 / 1773 / 1814] Update outdated docs for spark-submit, YARN, standalone etc. · 2ffd1eaf
      Andrew Or authored
      YARN
      - SparkPi was updated to not take in master as an argument; we should update the docs to reflect that.
      - The default YARN build guide should be in maven, not sbt.
      - This PR also adds a paragraph on steps to debug a YARN application.
      
      Standalone
      - Emphasize spark-submit more. Right now it's one small paragraph preceding the legacy way of launching through `org.apache.spark.deploy.Client`.
      - The way we set configurations / environment variables according to the old docs is outdated. This needs to reflect changes introduced by the Spark configuration changes we made.
      
      In general, this PR also adds a little more documentation on the new spark-shell, spark-submit, spark-defaults.conf etc here and there.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #701 from andrewor14/yarn-docs and squashes the following commits:
      
      e2c2312 [Andrew Or] Merge in changes in #752 (SPARK-1814)
      25cfe7b [Andrew Or] Merge in the warning from SPARK-1753
      a8c39c5 [Andrew Or] Minor changes
      336bbd9 [Andrew Or] Tabs -> spaces
      4d9d8f7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      041017a [Andrew Or] Abstract Spark submit documentation to cluster-overview.html
      3cc0649 [Andrew Or] Detail how to set configurations + remove legacy instructions
      5b7140a [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      85a51fc [Andrew Or] Update run-example, spark-shell, configuration etc.
      c10e8c7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      381fe32 [Andrew Or] Update docs for standalone mode
      757c184 [Andrew Or] Add a note about the requirements for the debugging trick
      f8ca990 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      924f04c [Andrew Or] Revert addition of --deploy-mode
      d5fe17b [Andrew Or] Update the YARN docs
  19. May 06, 2014
    • Fix two download suggestions in the docs: · 7b978c1a
      Patrick Wendell authored
      1) On the quick start page provide a direct link to the downloads (suggested by @pbailis).
      2) On the index page, don't suggest users always have to build Spark, since many won't.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #662 from pwendell/quick-start and squashes the following commits:
      
      0622f27 [Patrick Wendell] Fix two download suggestions in the docs:
  20. Apr 26, 2014
    • SPARK-1606: Infer user application arguments instead of requiring --arg. · aa9a7f5d
      Patrick Wendell authored
      This modifies spark-submit to do something more like the Hadoop `jar`
      command. Now we have the following syntax:
      
      ./bin/spark-submit [options] user.jar [user options]
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #563 from pwendell/spark-submit and squashes the following commits:
      
      32241fc [Patrick Wendell] Review feedback
      3adfb69 [Patrick Wendell] Small fix
      bc48139 [Patrick Wendell] SPARK-1606: Infer user application arguments instead of requiring --arg.
  21. Apr 21, 2014
    • [SPARK-1439, SPARK-1440] Generate unified Scaladoc across projects and Javadocs · fc783847
      Matei Zaharia authored
      I used the sbt-unidoc plugin (https://github.com/sbt/sbt-unidoc) to create a unified Scaladoc of our public packages, and generate Javadocs as well. One limitation is that I haven't found an easy way to exclude packages in the Javadoc; there is an SBT task that identifies Java sources to run javadoc on, but it's been very difficult to modify it from outside to change what is set in the unidoc package. Some SBT-savvy people should help with this. The Javadoc site also lacks package-level descriptions and things like that, so we may want to look into that. We may decide not to post these right now if it's too limited compared to the Scala one.
      
      Example of the built doc site: http://people.csail.mit.edu/matei/spark-unified-docs/
      
      Author: Matei Zaharia <matei@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Patrick Wendell <pwendell@gmail.com>
      
      Closes #457 from mateiz/better-docs and squashes the following commits:
      
      a63d4a3 [Matei Zaharia] Skip Java/Scala API docs for Python package
      5ea1f43 [Matei Zaharia] Fix links to Java classes in Java guide, fix some JS for scrolling to anchors on page load
      f05abc0 [Matei Zaharia] Don't include java.lang package names
      995e992 [Matei Zaharia] Skip internal packages and class names with $ in JavaDoc
      a14a93c [Matei Zaharia] typo
      76ce64d [Matei Zaharia] Add groups to Javadoc index page, and a first package-info.java
      ed6f994 [Matei Zaharia] Generate JavaDoc as well, add titles, update doc site to use unified docs
      acb993d [Matei Zaharia] Add Unidoc plugin for the projects we want Unidoced
    • Clean up and simplify Spark configuration · fb98488f
      Patrick Wendell authored
      Over time, as we've added more deployment modes, the user-facing configuration options in Spark have gotten a bit unwieldy. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch but it makes the following improvements:
      
      1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file.
      2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
      3. Adds ability to set these same variables for the driver using `spark-submit`.
      4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`.
      5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #299 from pwendell/config-cleanup and squashes the following commits:
      
      127f301 [Patrick Wendell] Improvements to testing
      a006464 [Patrick Wendell] Moving properties file template.
      b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf
      0086939 [Patrick Wendell] Minor style fixes
      af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs
      b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide
      af0adf7 [Patrick Wendell] Automatically add user jar
      a56b125 [Patrick Wendell] Responses to Tom's review
      d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
      a762901 [Patrick Wendell] Fixing test failures
      ffa00fe [Patrick Wendell] Review feedback
      fda0301 [Patrick Wendell] Note
      308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN
      e83cd8f [Patrick Wendell] Changes to allow re-use of test applications
      be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set
      c2a2909 [Patrick Wendell] Test compile fixes
      4ee6f9d [Patrick Wendell] Making YARN doc changes consistent
      afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors.
      b08893b [Patrick Wendell] Additional improvements.
      ace4ead [Patrick Wendell] Responses to review feedback.
      b72d183 [Patrick Wendell] Review feedback for spark env file
      46555c1 [Patrick Wendell] Review feedback and import clean-ups
      437aed1 [Patrick Wendell] Small fix
      761ebcd [Patrick Wendell] Library path and classpath for drivers
      7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script
      5b0ba8e [Patrick Wendell] Don't ship executor envs
      84cc5e5 [Patrick Wendell] Small clean-up
      1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings
      4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH
      6eaf7d0 [Patrick Wendell] executorJavaOpts
      0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN
      ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
  22. Apr 04, 2014
    • small fix ( proogram -> program ) · 0acc7a02
      Prabeesh K authored
      Author: Prabeesh K <prabsmails@gmail.com>
      
      Closes #331 from prabeesh/patch-3 and squashes the following commits:
      
      9399eb5 [Prabeesh K] small fix(proogram -> program)
  23. Feb 19, 2014
  24. Jan 06, 2014
  25. Jan 02, 2014
  26. Dec 30, 2013
  27. Sep 08, 2013
  28. Sep 01, 2013
  29. Aug 31, 2013
  30. Aug 30, 2013
  31. Aug 29, 2013
  32. Apr 25, 2013
  33. Apr 12, 2013