- Feb 01, 2017
Zheng RuiFeng authored
## What changes were proposed in this pull request?

Fix broken links in ml-pipeline and ml-tuning: `<div data-lang="scala">` -> `<div data-lang="scala" markdown="1">`.

## How was this patch tested?

Manual tests.

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #16754 from zhengruifeng/doc_api_fix. (cherry picked from commit 04ee8cf6)

Signed-off-by: Sean Owen <sowen@cloudera.com>
- Jan 30, 2017
gatorsmile authored
### What changes were proposed in this pull request?

JDBC options are case-insensitive after PR https://github.com/apache/spark/pull/15884 was merged into Spark 2.1; the docs should say so.

### How was this patch tested?

N/A

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16734 from gatorsmile/fixDocCaseInsensitive. (cherry picked from commit c0eda7e8)

Signed-off-by: gatorsmile <gatorsmile@gmail.com>
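For reference, a minimal sketch of what case-insensitive option handling means in practice; the connection details and table name are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("JdbcOptions").getOrCreate()

// Since the change above, JDBC option keys are matched case-insensitively,
// so "DBTABLE" and "dbtable" name the same option.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/test") // hypothetical URL
  .option("DBTABLE", "my_table")                     // same option as "dbtable"
  .option("user", "test_user")
  .load()
```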
- Jan 25, 2017
aokolnychyi authored
## What changes were proposed in this pull request?

- A separate subsection for Aggregations under "Getting Started" in the Spark SQL programming guide. It mentions which aggregate functions are predefined and how users can create their own.
- Examples of using the `UserDefinedAggregateFunction` abstract class for untyped aggregations in Java and Scala.
- Examples of using the `Aggregator` abstract class for type-safe aggregations in Java and Scala.
- Python is not covered.
- The PR might not resolve the ticket since I do not know what exactly was planned by the author.

In total, there are four new standalone examples that can be executed via `spark-submit` or `run-example`. The updated Spark SQL programming guide refers to these examples and does not contain hard-coded snippets.

## How was this patch tested?

The patch was tested locally by building the docs. The examples were run as well.

Author: aokolnychyi <okolnychyyanton@gmail.com>

Closes #16329 from aokolnychyi/SPARK-16046. (cherry picked from commit 3fdce814)

Signed-off-by: gatorsmile <gatorsmile@gmail.com>
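The type-safe `Aggregator` pattern documented by this PR looks roughly like the following sketch, reconstructed from the guide (names may differ from the committed example):

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

case class Employee(name: String, salary: Long)
case class Average(var sum: Long, var count: Long)

object MyAverage extends Aggregator[Employee, Average, Double] {
  // A zero value for this aggregation; should satisfy: anything + zero = anything
  def zero: Average = Average(0L, 0L)
  // Fold one input record into the running buffer
  def reduce(buffer: Average, employee: Employee): Average = {
    buffer.sum += employee.salary
    buffer.count += 1
    buffer
  }
  // Merge two intermediate buffers
  def merge(b1: Average, b2: Average): Average = {
    b1.sum += b2.sum
    b1.count += b2.count
    b1
  }
  // Transform the final buffer into the output value
  def finish(reduction: Average): Double = reduction.sum.toDouble / reduction.count
  def bufferEncoder: Encoder[Average] = Encoders.product
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// Usage, given a Dataset[Employee] named ds:
//   ds.select(MyAverage.toColumn.name("average_salary"))
```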
- Jan 10, 2017
Shixiong Zhu authored
## What changes were proposed in this pull request?

This PR allows update mode for non-aggregation streaming queries. Update mode behaves the same as append mode when a query has no aggregations.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #16520 from zsxwing/update-without-agg. (cherry picked from commit bc6c56e9)

Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
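A minimal sketch of what the change enables, assuming a SparkSession `spark` and a socket source chosen purely for illustration:

```scala
// A streaming query with no aggregations; before this change Update mode was
// rejected for such queries, and with it Update behaves exactly like Append.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

val query = lines.writeStream
  .format("console")
  .outputMode("update")
  .start()
```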
- Jan 07, 2017
Dongjoon Hyun authored
## What changes were proposed in this pull request?

This PR adds a new behavior-change description for `CREATE TABLE ... LOCATION` to `sql-programming-guide.md`, clearly under "Upgrading From Spark SQL 1.6 to 2.0". The change was introduced in Apache Spark 2.0.0 as [SPARK-15276](https://issues.apache.org/jira/browse/SPARK-15276).

## How was this patch tested?

```
SKIP_API=1 jekyll build
```

**Newly Added Description** (screenshot omitted)

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16400 from dongjoon-hyun/SPARK-18941. (cherry picked from commit 923e5948)

Signed-off-by: gatorsmile <gatorsmile@gmail.com>
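A sketch of the documented behavior change; the table name and path are hypothetical, and `spark` is an assumed SparkSession:

```scala
// Since Spark 2.0, CREATE TABLE ... LOCATION is treated like
// CREATE EXTERNAL TABLE ... LOCATION, guarding against accidental data loss.
spark.sql("""
  CREATE TABLE users (id INT, name STRING)
  LOCATION '/data/users'
""")

// Removes only the metadata; the files under /data/users remain.
spark.sql("DROP TABLE users")
```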
Sean Owen authored
configuration.html section headings were not specified correctly in markdown and so weren't rendering or being recognized correctly. Removed extra `<p>` tags and pulled level-4 titles up to level 3, since level 3 had been skipped; this improves the TOC.

Tested via doc build and manual check.

Author: Sean Owen <sowen@cloudera.com>

Closes #16490 from srowen/SPARK-19106. (cherry picked from commit 54138f6e)

Signed-off-by: Sean Owen <sowen@cloudera.com>
- Jan 06, 2017
Tathagata Das authored
[SPARK-19074][SS][DOCS] Updated Structured Streaming Programming Guide for update mode and source/sink options

## What changes were proposed in this pull request?

Updates:
- Updated the Late Data Handling section by adding a figure for Update Mode. It's more intuitive to explain late data handling with Update Mode, so I added the new figure before the Append Mode figure.
- Updated the Output Modes section with Update mode.
- Added options for all the sources and sinks.

(Screenshots omitted.)

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #16468 from tdas/SPARK-19074. (cherry picked from commit b59cddab)

Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
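A sketch of the kind of source and sink options the updated guide lists, using the file source and file sink; the paths and schema are illustrative, and `spark` is an assumed SparkSession:

```scala
import org.apache.spark.sql.types._

val jsonSchema = new StructType()
  .add("device", StringType)
  .add("time", TimestampType)

// File source option: process at most one new file per trigger.
val input = spark.readStream
  .format("json")
  .schema(jsonSchema)
  .option("maxFilesPerTrigger", 1)
  .load("/data/input")

// File sink options: output path plus a checkpoint location.
val query = input.writeStream
  .format("parquet")
  .option("path", "/data/output")
  .option("checkpointLocation", "/data/checkpoints")
  .start()
```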
jerryshao authored
## What changes were proposed in this pull request?

Currently the HistoryServer's ACLs are derived from the application event log, which means newly changed ACLs cannot be applied to old data. This becomes a problem when a newly added admin cannot access old application history UIs; only new applications are affected. So here we propose to add admin ACLs for the history server: any configured user/group gets view access to all applications, while the view ACLs derived from application run time still take effect (see the config sketch below).

## How was this patch tested?

Unit test added.

Author: jerryshao <sshao@hortonworks.com>

Closes #16470 from jerryshao/SPARK-19033. (cherry picked from commit 4a4c3dc9)

Signed-off-by: Tom Graves <tgraves@yahoo-inc.com>
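A sketch of the resulting `spark-defaults.conf` entries, assuming the property names introduced by this change (`spark.history.ui.admin.acls` and its `.groups` variant); the user and group names are hypothetical:

```
spark.history.ui.admin.acls         admin1,admin2
spark.history.ui.admin.acls.groups  ops-team
```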
- Jan 02, 2017
Liwei Lin authored
## What changes were proposed in this pull request?

Currently some code snippets in the programming guide just do not compile. We should fix them.

## How was this patch tested?

```
SKIP_API=1 jekyll build
```

## Screenshot from part of the change

(Screenshot omitted.)

Author: Liwei Lin <lwlin7@gmail.com>

Closes #16442 from lw-lin/ss-pro-guide-.
Liang-Chi Hsieh authored
## What changes were proposed in this pull request?

The configuration `spark.yarn.security.tokens.{service}.enabled` is deprecated. Now we should use `spark.yarn.security.credentials.{service}.enabled`. Some places in the doc were not updated yet.

## How was this patch tested?

N/A. Just a doc change.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #16444 from viirya/minor-credential-provider-doc. (cherry picked from commit 0ac2f1e7)

Signed-off-by: Sean Owen <sowen@cloudera.com>
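To illustrate the rename with a concrete service (`hive` here is only an example of the `{service}` placeholder):

```
# Deprecated:
spark.yarn.security.tokens.hive.enabled       true
# Preferred:
spark.yarn.security.credentials.hive.enabled  true
```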
- Dec 30, 2016
Cheng Lian authored
This PR documents the scalable partition handling feature in the body of the programming guide; before this PR, we only mentioned it in the migration guide. It was not very clear that external datasource tables now require an extra `MSCK REPAIR TABLE` command to have per-partition information persisted since 2.1.

Tested: N/A.

Author: Cheng Lian <lian@databricks.com>

Closes #16424 from liancheng/scalable-partition-handling-doc. (cherry picked from commit 871f6114)

Signed-off-by: Cheng Lian <lian@databricks.com>
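A sketch of the now-documented workflow for a partitioned external datasource table; the table name and path are hypothetical, and `spark` is an assumed SparkSession:

```scala
// Create an external datasource table over existing partitioned data.
spark.sql("""
  CREATE TABLE logs (event STRING, dt STRING)
  USING parquet
  OPTIONS (path '/data/logs')
  PARTITIONED BY (dt)
""")

// Since 2.1, per-partition information must be persisted in the catalog
// before the partitions become visible to queries.
spark.sql("MSCK REPAIR TABLE logs")
```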
- Dec 29, 2016
adesharatushar authored
[SPARK-19003][DOCS] Add Java example in Spark Streaming Guide, section Design Patterns for using foreachRDD

## What changes were proposed in this pull request?

Added the missing Java example under section "Design Patterns for using foreachRDD". Now this section has examples in all 3 languages, improving the consistency of the documentation.

## How was this patch tested?

Manual. Generated docs using the command "SKIP_API=1 jekyll build" and verified the generated HTML page manually. The syntax of the example has been tested for correctness using sample code on Java 1.7 and Spark 2.2.0-SNAPSHOT.

Author: adesharatushar <tushar_adeshara@persistent.com>

Closes #16408 from adesharatushar/streaming-doc-fix. (cherry picked from commit dba81e1d)

Signed-off-by: Sean Owen <sowen@cloudera.com>
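For reference, the Scala version of this design pattern from the guide is roughly the following; `dstream` is an assumed DStream and `ConnectionPool` is a placeholder for a static, lazily initialized pool of connections:

```scala
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // Create one connection per partition rather than per record.
    val connection = ConnectionPool.getConnection() // hypothetical pool
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection)     // return to the pool for reuse
  }
}
```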
- Dec 28, 2016
Tathagata Das authored
[SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming regarding watermarking and status

## What changes were proposed in this pull request?

- Extended the Window operation section with a code snippet and an explanation of watermarking
- Extended the Output Mode section with a table showing the compatibility between query type and output mode
- Rewrote the Monitoring section with updated JSON generated by StreamingQuery.progress/status
- Updated API changes in the StreamingQueryListener example

## How was this patch tested?

N/A

(Screenshots of the Windowed Aggregation with Event Time, Output Modes, and Monitoring sections omitted.)

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #16294 from tdas/SPARK-18669. (cherry picked from commit 092c6725)

Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
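The watermarking snippet added to the Window operation section is along these lines, assuming a SparkSession `spark` and a streaming DataFrame `words` with `word` and `timestamp` columns:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// Late data arriving within 10 minutes of the latest event time still updates
// its window; anything older is dropped.
val windowedCounts = words
  .withWatermark("timestamp", "10 minutes")
  .groupBy(
    window($"timestamp", "10 minutes", "5 minutes"),
    $"word")
  .count()
```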
- Dec 20, 2016
Josh Rosen authored
## What changes were proposed in this pull request?

Spark's current task cancellation / task killing mechanism is "best effort" because some tasks may not be interruptible or may not respond to their "killed" flags being set. If a significant fraction of a cluster's task slots are occupied by tasks that have been marked as killed but remain running, this can lead to a situation where new jobs and tasks are starved of the resources being used by these zombie tasks.

This patch aims to address this problem by adding a "task reaper" mechanism to executors. At a high level, task killing now launches a new thread which attempts to kill the task and then watches the task and periodically checks whether it has been killed. The TaskReaper will periodically re-attempt to call `TaskRunner.kill()` and will log warnings if the task keeps running. I modified TaskRunner to rename its thread at the start of the task, allowing TaskReaper to take a thread dump and filter it in order to log stacktraces from the exact task thread that we are waiting to finish. If the task has not stopped after a configurable timeout, the TaskReaper will throw an exception to trigger executor JVM death, thereby forcibly freeing any resources consumed by the zombie tasks.

This feature is flagged off by default and is controlled by four new configurations under the `spark.task.reaper.*` namespace. See the updated `configuration.md` doc for details, and the sketch below.

## How was this patch tested?

Tested via a new test case in `JobCancellationSuite`, plus manual testing.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #16189 from JoshRosen/cancellation.
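A sketch of enabling the feature in `spark-defaults.conf`; the four property names follow the `spark.task.reaper.*` namespace described above (as documented in `configuration.md`), and the values are illustrative:

```
spark.task.reaper.enabled          true
# How often the reaper polls the task and re-attempts the kill:
spark.task.reaper.pollingInterval  10s
# Log a thread dump of the stuck task thread while waiting:
spark.task.reaper.threadDump       true
# After this timeout, kill the executor JVM to free the zombie task's resources:
spark.task.reaper.killTimeout      120s
```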
- Dec 18, 2016
gatorsmile authored
### What changes were proposed in this pull request?

The configuration page looks messy now, as shown in the nightly build: https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/configuration.html (the screenshot showing where the breakage starts is omitted).

### How was this patch tested?

Attached is a screenshot generated on my local computer after the fix: [Configuration - Spark 2.2.0 Documentation.pdf](https://github.com/apache/spark/files/659315/Configuration.-.Spark.2.2.0.Documentation.pdf)

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16327 from gatorsmile/docFix. (cherry picked from commit c0c9e1d2)

Signed-off-by: Sean Owen <sowen@cloudera.com>
- Dec 17, 2016
Felix Cheung authored
## What changes were proposed in this pull request?

Reorganizing content (copy/paste).

## How was this patch tested?

https://felixcheung.github.io/sparkr-vignettes.html

Previous: https://felixcheung.github.io/sparkr-vignettes_old.html

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16301 from felixcheung/rvignettespass2. (cherry picked from commit 38fd163d)

Signed-off-by: Felix Cheung <felixcheung@apache.org>
- Dec 15, 2016
Patrick Wendell authored
Patrick Wendell authored
Patrick Wendell authored
Patrick Wendell authored
Patrick Wendell authored
- Dec 14, 2016
Dongjoon Hyun authored
## What changes were proposed in this pull request?

Since Apache Spark 1.4.0, the R API document page has had a broken link to the `DESCRIPTION file` because the Jekyll plugin script doesn't copy the file. This PR aims to fix that.

- Official latest website: http://spark.apache.org/docs/latest/api/R/index.html
- Apache Spark 2.1.0-rc2: http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-docs/api/R/index.html

## How was this patch tested?

Manual.

```bash
cd docs
SKIP_SCALADOC=1 jekyll build
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16292 from dongjoon-hyun/SPARK-18875. (cherry picked from commit ec0eae48)

Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
- Dec 12, 2016
Bill Chambers authored
## What changes were proposed in this pull request?

This PR clarifies where accumulators will be displayed.

## How was this patch tested?

No testing.

Author: Bill Chambers <bill@databricks.com>
Author: anabranch <wac.chambers@gmail.com>
Author: Bill Chambers <wchambers@ischool.berkeley.edu>

Closes #16180 from anabranch/improve-acc-docs. (cherry picked from commit 70ffff21)

Signed-off-by: Sean Owen <sowen@cloudera.com>
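For reference, the behavior being clarified: an accumulator created with a name appears in the web UI for the stage that modifies it, while an unnamed one does not. A minimal sketch, assuming a SparkContext `sc`:

```scala
// Named, so it shows up in the web UI.
val accum = sc.longAccumulator("My Accumulator")
sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum.add(x))
println(accum.value) // 10
```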
- Dec 10, 2016
Dongjoon Hyun authored
## What changes were proposed in this pull request?

According to the notice on the following Wiki front page, we can safely remove the obsolete wiki pointers from `README.md` and `docs/index.md`, too. These two lines are the last occurrences of those links.

```
All current wiki content has been merged into pages at http://spark.apache.org as of November 2016. Each page links to the new location of its information on the Spark web site. Obsolete wiki content is still hosted here, but carries a notice that it is no longer current.
```

## How was this patch tested?

Manual.

- `README.md`: https://github.com/dongjoon-hyun/spark/tree/remove_wiki_from_readme
- `docs/index.md`:

```
cd docs
SKIP_API=1 jekyll build
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16239 from dongjoon-hyun/remove_wiki_from_readme. (cherry picked from commit f3a3fed7)

Signed-off-by: Sean Owen <sowen@cloudera.com>
- Dec 09, 2016
Xiangrui Meng authored
## What changes were proposed in this pull request?

There has been some confusion around "Spark ML" vs. "MLlib". This PR adds some FAQ-like entries to the MLlib user guide to explain "Spark ML" and reduce the confusion. I checked the [Spark FAQ page](http://spark.apache.org/faq.html), which seems too high-level for the content here, so I added it to the MLlib user guide instead.

cc: mateiz

Author: Xiangrui Meng <meng@databricks.com>

Closes #16241 from mengxr/SPARK-18812. (cherry picked from commit d2493a20)

Signed-off-by: Xiangrui Meng <meng@databricks.com>
Jacek Laskowski authored
## What changes were proposed in this pull request?

Typo fixes.

## How was this patch tested?

Local build. Awaiting the official build.

Author: Jacek Laskowski <jacek@japila.pl>

Closes #16144 from jaceklaskowski/typo-fixes. (cherry picked from commit b162cc0c)

Signed-off-by: Sean Owen <sowen@cloudera.com>
- Dec 08, 2016
Yanbo Liang authored
## What changes were proposed in this pull request?

- Add all R examples for ML wrappers which were added during the 2.1 release cycle.
- Split the whole `ml.R` example file into an individual example for each algorithm, which will be convenient for users to rerun.
- Add corresponding examples to the ML user guide.
- Update the ML section of the SparkR user guide.

Note: MLlib Scala/Java/Python examples will be consistent; however, SparkR examples may differ from them, since R users may use the algorithms in a different way, for example, using an R `formula` to specify `featuresCol` and `labelCol`.

## How was this patch tested?

Ran all examples manually.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16148 from yanboliang/spark-18325. (cherry picked from commit 9bf8f3cd)

Signed-off-by: Yanbo Liang <ybliang8@gmail.com>
Patrick Wendell authored
Patrick Wendell authored
- Dec 07, 2016
sethah authored
## What changes were proposed in this pull request?

WeightedLeastSquares now supports L1 and elastic-net penalties and has an additional solver option, QuasiNewton. The docs are updated to reflect this change.

## How was this patch tested?

Docs only. Generated documentation to make sure the LaTeX looks OK.

Author: sethah <seth.hendrickson16@gmail.com>

Closes #16139 from sethah/SPARK-18705. (cherry picked from commit 82253617)

Signed-off-by: Yanbo Liang <ybliang8@gmail.com>
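The user-facing effect is, on my reading, that an L1/elastic-net penalty no longer forces a different solver; a sketch under that assumption:

```scala
import org.apache.spark.ml.regression.LinearRegression

// With the new QuasiNewton path in WeightedLeastSquares, the "normal" solver
// can be combined with an elastic-net penalty (previously L1 required L-BFGS).
val lr = new LinearRegression()
  .setSolver("normal")
  .setRegParam(0.1)
  .setElasticNetParam(1.0) // 1.0 = pure L1 penalty
```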
wm624@hotmail.com authored
## What changes were proposed in this pull request?

The logistic regression summary was added to the Python API; we need to add an example and documentation for the summary. The newly added example is consistent with the Scala and Java examples.

## How was this patch tested?

Manual tests: ran the example with spark-submit; copied and pasted the code into pyspark; built the documentation and checked it.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16064 from wangmiao1981/py. (cherry picked from commit aad11209)

Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
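The new Python example mirrors the existing Scala one; for reference, the Scala side of the summary API is roughly as follows, with `training` an assumed DataFrame of labeled data:

```scala
import org.apache.spark.ml.classification.{BinaryLogisticRegressionSummary, LogisticRegression}

val lrModel = new LogisticRegression().fit(training)
val trainingSummary = lrModel.summary

// Loss at each training iteration.
trainingSummary.objectiveHistory.foreach(println)

// For binary classification, downcast to reach ROC metrics.
val binarySummary = trainingSummary.asInstanceOf[BinaryLogisticRegressionSummary]
println(binarySummary.areaUnderROC)
```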
- Dec 05, 2016
Nicholas Chammas authored
Looking at the distributions provided on spark.apache.org, I see that the Spark YARN shuffle jar is under `yarn/` and not `lib/`. This change is so minor I'm not sure it needs a JIRA. But let me know if so and I'll create one.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #16130 from nchammas/yarn-doc-fix. (cherry picked from commit 5a92dc76)

Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
Dongjoon Hyun authored
## What changes were proposed in this pull request?

In the SQL Programming Guide, this PR uses `TRUE` instead of `True` in SparkR and adds the default value of `nullable` for `StructField` in Scala/Python/R (i.e., "Note: The default value of nullable is true."). In the Java API, `nullable` is not optional.

**BEFORE**
- Spark 2.1.0 RC1: http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc1-docs/sql-programming-guide.html#data-types

**AFTER**
(Screenshots of the updated R, Scala, and Python sections omitted.)

## How was this patch tested?

Manual.

```
cd docs
SKIP_API=1 jekyll build
open _site/index.html
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16141 from dongjoon-hyun/SPARK-SQL-GUIDE. (cherry picked from commit 410b7898)

Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
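The note being added is easy to show in Scala (the other languages behave analogously):

```scala
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("name", StringType),                  // nullable defaults to true
  StructField("age", IntegerType, nullable = false) // must be set explicitly
))
```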
Yanbo Liang authored
## What changes were proposed in this pull request?

Add R examples to the ML programming guide for the following algorithms as a POC:
- spark.glm
- spark.survreg
- spark.naiveBayes
- spark.kmeans

The four algorithms were added to SparkR in 2.0.0; more docs for algorithms added during the 2.1 release cycle will be addressed in a separate follow-up PR.

## How was this patch tested?

Screenshots of the generated ML programming guide for `GeneralizedLinearRegression` (omitted).

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16136 from yanboliang/spark-18279. (cherry picked from commit eb8dd681)

Signed-off-by: Yanbo Liang <ybliang8@gmail.com>
- Dec 04, 2016
Felix Cheung authored
## What changes were proposed in this pull request?

If SparkR is running as a package and it has previously downloaded the Spark jar, it should be able to run as before without having to set SPARK_HOME. Basically, with this bug the auto-install of Spark will only work in the first session. This seems to be a regression from the earlier behavior.

The fix is to always try to install or check for the cached Spark if running in an interactive session. As discussed before, we should probably only install Spark iff running in an interactive session (R shell, RStudio, etc.).

## How was this patch tested?

Manually.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16077 from felixcheung/rsessioninteractive. (cherry picked from commit b019b3a8)

Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
- Dec 03, 2016
Yunni authored
## What changes were proposed in this pull request?

The user guide for LSH is added to ml-features.md, with several Scala/Java examples in spark-examples.

## How was this patch tested?

Docs have been generated through Jekyll and checked through manual inspection.

Author: Yunni <Euler57721@gmail.com>
Author: Yun Ni <yunn@uber.com>
Author: Joseph K. Bradley <joseph@databricks.com>
Author: Yun Ni <Euler57721@gmail.com>

Closes #15795 from Yunni/SPARK-18081-lsh-guide. (cherry picked from commit 34777184)

Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
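A condensed sketch in the spirit of the new guide's examples, using `MinHashLSH`; the data and parameters are illustrative, and `spark` is an assumed SparkSession:

```scala
import org.apache.spark.ml.feature.MinHashLSH
import org.apache.spark.ml.linalg.Vectors

val df = spark.createDataFrame(Seq(
  (0, Vectors.sparse(6, Seq((0, 1.0), (1, 1.0), (2, 1.0)))),
  (1, Vectors.sparse(6, Seq((2, 1.0), (3, 1.0), (4, 1.0)))),
  (2, Vectors.sparse(6, Seq((0, 1.0), (2, 1.0), (4, 1.0))))
)).toDF("id", "features")

val mh = new MinHashLSH()
  .setNumHashTables(3)
  .setInputCol("features")
  .setOutputCol("hashes")

val model = mh.fit(df)

// Approximate nearest-neighbor search against a query vector.
val key = Vectors.sparse(6, Seq((1, 1.0), (3, 1.0)))
model.approxNearestNeighbors(df, key, 2).show()
```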
- Dec 02, 2016
Yanbo Liang authored
## What changes were proposed in this pull request?

Update the ML programming and migration guide for the 2.1 release.

## How was this patch tested?

Doc change, no tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16076 from yanboliang/spark-18324. (cherry picked from commit 2dc0d7ef)

Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
- Nov 30, 2016
Yanbo Liang authored
## What changes were proposed in this pull request?

API review for 2.1, except the `LSH`-related classes, which are still under development.

## How was this patch tested?

Only doc changes, no new tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16009 from yanboliang/spark-18318. (cherry picked from commit 60022bfd)

Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
manishAtGit authored
## What changes were proposed in this pull request?

Added a missing semicolon in the quick-start guide's Java example code, which wasn't compiling before.

## How was this patch tested?

Locally, by running and generating the site for the docs. (Snapshot showing the last line now ending with ";" omitted.)

Author: manishAtGit <manish@knoldus.com>

Closes #16081 from manishatGit/fixed-quick-start-guide. (cherry picked from commit bc95ea0b)

Signed-off-by: Andrew Or <andrewor14@gmail.com>