- Feb 01, 2017
Zheng RuiFeng authored
## What changes were proposed in this pull request?

Fix broken links in ml-pipeline and ml-tuning: `<div data-lang="scala">` -> `<div data-lang="scala" markdown="1">`.

## How was this patch tested?

Manual tests.

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #16754 from zhengruifeng/doc_api_fix. (cherry picked from commit 04ee8cf6)

Signed-off-by: Sean Owen <sowen@cloudera.com>
- Jan 30, 2017
gatorsmile authored
### What changes were proposed in this pull request?

JDBC options are case-insensitive after PR https://github.com/apache/spark/pull/15884 was merged into Spark 2.1; the docs should say so.

### How was this patch tested?

N/A

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16734 from gatorsmile/fixDocCaseInsensitive. (cherry picked from commit c0eda7e8)

Signed-off-by: gatorsmile <gatorsmile@gmail.com>
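For reference, a minimal sketch of what case-insensitive option handling means in practice; the connection details and table name are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("JdbcOptions").getOrCreate()

// Since the change above, JDBC option keys are matched case-insensitively,
// so "DBTABLE" and "dbtable" name the same option.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/test") // hypothetical URL
  .option("DBTABLE", "my_table")                     // same option as "dbtable"
  .option("user", "test_user")
  .load()
```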
- Jan 25, 2017
aokolnychyi authored
## What changes were proposed in this pull request?

- A separate subsection for Aggregations under "Getting Started" in the Spark SQL programming guide. It mentions which aggregate functions are predefined and how users can create their own.
- Examples of using the `UserDefinedAggregateFunction` abstract class for untyped aggregations in Java and Scala.
- Examples of using the `Aggregator` abstract class for type-safe aggregations in Java and Scala.
- Python is not covered.
- The PR might not resolve the ticket since I do not know what exactly was planned by the author.

In total, there are four new standalone examples that can be executed via `spark-submit` or `run-example`. The updated Spark SQL programming guide refers to these examples and does not contain hard-coded snippets.

## How was this patch tested?

The patch was tested locally by building the docs. The examples were run as well.

Author: aokolnychyi <okolnychyyanton@gmail.com>

Closes #16329 from aokolnychyi/SPARK-16046. (cherry picked from commit 3fdce814)

Signed-off-by: gatorsmile <gatorsmile@gmail.com>
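The type-safe `Aggregator` pattern documented by this PR looks roughly like the following sketch, reconstructed from the guide (names may differ from the committed example):

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

case class Employee(name: String, salary: Long)
case class Average(var sum: Long, var count: Long)

object MyAverage extends Aggregator[Employee, Average, Double] {
  // A zero value for this aggregation; should satisfy: anything + zero = anything
  def zero: Average = Average(0L, 0L)
  // Fold one input record into the running buffer
  def reduce(buffer: Average, employee: Employee): Average = {
    buffer.sum += employee.salary
    buffer.count += 1
    buffer
  }
  // Merge two intermediate buffers
  def merge(b1: Average, b2: Average): Average = {
    b1.sum += b2.sum
    b1.count += b2.count
    b1
  }
  // Transform the final buffer into the output value
  def finish(reduction: Average): Double = reduction.sum.toDouble / reduction.count
  def bufferEncoder: Encoder[Average] = Encoders.product
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// Usage, given a Dataset[Employee] named ds:
//   ds.select(MyAverage.toColumn.name("average_salary"))
```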
- Jan 10, 2017
Shixiong Zhu authored
## What changes were proposed in this pull request?

This PR allows update mode for non-aggregation streaming queries. Update mode behaves the same as append mode when a query has no aggregations.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #16520 from zsxwing/update-without-agg. (cherry picked from commit bc6c56e9)

Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
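A minimal sketch of what the change enables, assuming a SparkSession `spark` and a socket source chosen purely for illustration:

```scala
// A streaming query with no aggregations; before this change Update mode was
// rejected for such queries, and with it Update behaves exactly like Append.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

val query = lines.writeStream
  .format("console")
  .outputMode("update")
  .start()
```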
- Jan 07, 2017
Dongjoon Hyun authored
## What changes were proposed in this pull request?

This PR adds a new behavior-change description for `CREATE TABLE ... LOCATION` to `sql-programming-guide.md`, clearly under "Upgrading From Spark SQL 1.6 to 2.0". The change was introduced in Apache Spark 2.0.0 as [SPARK-15276](https://issues.apache.org/jira/browse/SPARK-15276).

## How was this patch tested?

```
SKIP_API=1 jekyll build
```

**Newly Added Description** (screenshot omitted)

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16400 from dongjoon-hyun/SPARK-18941. (cherry picked from commit 923e5948)

Signed-off-by: gatorsmile <gatorsmile@gmail.com>
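A sketch of the documented behavior change; the table name and path are hypothetical, and `spark` is an assumed SparkSession:

```scala
// Since Spark 2.0, CREATE TABLE ... LOCATION is treated like
// CREATE EXTERNAL TABLE ... LOCATION, guarding against accidental data loss.
spark.sql("""
  CREATE TABLE users (id INT, name STRING)
  LOCATION '/data/users'
""")

// Removes only the metadata; the files under /data/users remain.
spark.sql("DROP TABLE users")
```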
Sean Owen authored
configuration.html section headings were not specified correctly in markdown and so weren't rendering or being recognized correctly. Removed extra `<p>` tags and pulled level-4 titles up to level 3, since level 3 had been skipped; this improves the TOC.

Tested via doc build and manual check.

Author: Sean Owen <sowen@cloudera.com>

Closes #16490 from srowen/SPARK-19106. (cherry picked from commit 54138f6e)

Signed-off-by: Sean Owen <sowen@cloudera.com>
- Jan 06, 2017
Tathagata Das authored
[SPARK-19074][SS][DOCS] Updated Structured Streaming Programming Guide for update mode and source/sink options

## What changes were proposed in this pull request?

Updates:
- Updated the Late Data Handling section by adding a figure for Update Mode. It's more intuitive to explain late data handling with Update Mode, so I added the new figure before the Append Mode figure.
- Updated the Output Modes section with Update mode.
- Added options for all the sources and sinks.

(Screenshots omitted.)

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #16468 from tdas/SPARK-19074. (cherry picked from commit b59cddab)

Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
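A sketch of the kind of source and sink options the updated guide lists, using the file source and file sink; the paths and schema are illustrative, and `spark` is an assumed SparkSession:

```scala
import org.apache.spark.sql.types._

val jsonSchema = new StructType()
  .add("device", StringType)
  .add("time", TimestampType)

// File source option: process at most one new file per trigger.
val input = spark.readStream
  .format("json")
  .schema(jsonSchema)
  .option("maxFilesPerTrigger", 1)
  .load("/data/input")

// File sink options: output path plus a checkpoint location.
val query = input.writeStream
  .format("parquet")
  .option("path", "/data/output")
  .option("checkpointLocation", "/data/checkpoints")
  .start()
```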
jerryshao authored
## What changes were proposed in this pull request?

Currently the HistoryServer's ACLs are derived from the application event log, which means newly changed ACLs cannot be applied to old data. This becomes a problem when a newly added admin cannot access old application history UIs; only new applications are affected. So here we propose to add admin ACLs for the history server: any configured user/group gets view access to all applications, while the view ACLs derived from application run time still take effect (see the config sketch below).

## How was this patch tested?

Unit test added.

Author: jerryshao <sshao@hortonworks.com>

Closes #16470 from jerryshao/SPARK-19033. (cherry picked from commit 4a4c3dc9)

Signed-off-by: Tom Graves <tgraves@yahoo-inc.com>
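A sketch of the resulting `spark-defaults.conf` entries, assuming the property names introduced by this change (`spark.history.ui.admin.acls` and its `.groups` variant); the user and group names are hypothetical:

```
spark.history.ui.admin.acls         admin1,admin2
spark.history.ui.admin.acls.groups  ops-team
```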
- Jan 02, 2017
Liwei Lin authored
## What changes were proposed in this pull request?

Currently some code snippets in the programming guide just do not compile. We should fix them.

## How was this patch tested?

```
SKIP_API=1 jekyll build
```

## Screenshot from part of the change

(Screenshot omitted.)

Author: Liwei Lin <lwlin7@gmail.com>

Closes #16442 from lw-lin/ss-pro-guide-.
Liang-Chi Hsieh authored
## What changes were proposed in this pull request?

The configuration `spark.yarn.security.tokens.{service}.enabled` is deprecated. Now we should use `spark.yarn.security.credentials.{service}.enabled`. Some places in the doc were not updated yet.

## How was this patch tested?

N/A. Just a doc change.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #16444 from viirya/minor-credential-provider-doc. (cherry picked from commit 0ac2f1e7)

Signed-off-by: Sean Owen <sowen@cloudera.com>
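To illustrate the rename with a concrete service (`hive` here is only an example of the `{service}` placeholder):

```
# Deprecated:
spark.yarn.security.tokens.hive.enabled       true
# Preferred:
spark.yarn.security.credentials.hive.enabled  true
```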
- Dec 30, 2016
Cheng Lian authored
This PR documents the scalable partition handling feature in the body of the programming guide; before this PR, we only mentioned it in the migration guide. It was not very clear that external datasource tables now require an extra `MSCK REPAIR TABLE` command to have per-partition information persisted since 2.1.

Tested: N/A.

Author: Cheng Lian <lian@databricks.com>

Closes #16424 from liancheng/scalable-partition-handling-doc. (cherry picked from commit 871f6114)

Signed-off-by: Cheng Lian <lian@databricks.com>
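A sketch of the now-documented workflow for a partitioned external datasource table; the table name and path are hypothetical, and `spark` is an assumed SparkSession:

```scala
// Create an external datasource table over existing partitioned data.
spark.sql("""
  CREATE TABLE logs (event STRING, dt STRING)
  USING parquet
  OPTIONS (path '/data/logs')
  PARTITIONED BY (dt)
""")

// Since 2.1, per-partition information must be persisted in the catalog
// before the partitions become visible to queries.
spark.sql("MSCK REPAIR TABLE logs")
```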
- Dec 29, 2016
adesharatushar authored
[SPARK-19003][DOCS] Add Java example in Spark Streaming Guide, section Design Patterns for using foreachRDD

## What changes were proposed in this pull request?

Added the missing Java example under section "Design Patterns for using foreachRDD". Now this section has examples in all 3 languages, improving the consistency of the documentation.

## How was this patch tested?

Manual. Generated docs using the command "SKIP_API=1 jekyll build" and verified the generated HTML page manually. The syntax of the example has been tested for correctness using sample code on Java 1.7 and Spark 2.2.0-SNAPSHOT.

Author: adesharatushar <tushar_adeshara@persistent.com>

Closes #16408 from adesharatushar/streaming-doc-fix. (cherry picked from commit dba81e1d)

Signed-off-by: Sean Owen <sowen@cloudera.com>
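For reference, the Scala version of this design pattern from the guide is roughly the following; `dstream` is an assumed DStream and `ConnectionPool` is a placeholder for a static, lazily initialized pool of connections:

```scala
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // Create one connection per partition rather than per record.
    val connection = ConnectionPool.getConnection() // hypothetical pool
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection)     // return to the pool for reuse
  }
}
```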
- Dec 28, 2016
Tathagata Das authored
[SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming regarding watermarking and status

## What changes were proposed in this pull request?

- Extended the Window operation section with a code snippet and an explanation of watermarking
- Extended the Output Mode section with a table showing the compatibility between query type and output mode
- Rewrote the Monitoring section with updated JSON generated by StreamingQuery.progress/status
- Updated API changes in the StreamingQueryListener example

## How was this patch tested?

N/A

(Screenshots of the Windowed Aggregation with Event Time, Output Modes, and Monitoring sections omitted.)

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #16294 from tdas/SPARK-18669. (cherry picked from commit 092c6725)

Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
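The watermarking snippet added to the Window operation section is along these lines, assuming a SparkSession `spark` and a streaming DataFrame `words` with `word` and `timestamp` columns:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// Late data arriving within 10 minutes of the latest event time still updates
// its window; anything older is dropped.
val windowedCounts = words
  .withWatermark("timestamp", "10 minutes")
  .groupBy(
    window($"timestamp", "10 minutes", "5 minutes"),
    $"word")
  .count()
```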
- Dec 20, 2016
Josh Rosen authored
## What changes were proposed in this pull request?

Spark's current task cancellation / task killing mechanism is "best effort" because some tasks may not be interruptible or may not respond to their "killed" flags being set. If a significant fraction of a cluster's task slots are occupied by tasks that have been marked as killed but remain running, this can lead to a situation where new jobs and tasks are starved of the resources being used by these zombie tasks.

This patch aims to address this problem by adding a "task reaper" mechanism to executors. At a high level, task killing now launches a new thread which attempts to kill the task and then watches the task and periodically checks whether it has been killed. The TaskReaper will periodically re-attempt to call `TaskRunner.kill()` and will log warnings if the task keeps running. I modified TaskRunner to rename its thread at the start of the task, allowing TaskReaper to take a thread dump and filter it in order to log stacktraces from the exact task thread that we are waiting to finish. If the task has not stopped after a configurable timeout, the TaskReaper will throw an exception to trigger executor JVM death, thereby forcibly freeing any resources consumed by the zombie tasks.

This feature is flagged off by default and is controlled by four new configurations under the `spark.task.reaper.*` namespace. See the updated `configuration.md` doc for details, and the sketch below.

## How was this patch tested?

Tested via a new test case in `JobCancellationSuite`, plus manual testing.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #16189 from JoshRosen/cancellation.
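A sketch of enabling the feature in `spark-defaults.conf`; the four property names follow the `spark.task.reaper.*` namespace described above (as documented in `configuration.md`), and the values are illustrative:

```
spark.task.reaper.enabled          true
# How often the reaper polls the task and re-attempts the kill:
spark.task.reaper.pollingInterval  10s
# Log a thread dump of the stuck task thread while waiting:
spark.task.reaper.threadDump       true
# After this timeout, kill the executor JVM to free the zombie task's resources:
spark.task.reaper.killTimeout      120s
```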
- Dec 18, 2016
gatorsmile authored
### What changes were proposed in this pull request?

The configuration page looks messy now, as shown in the nightly build: https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/configuration.html (the screenshot showing where the breakage starts is omitted).

### How was this patch tested?

Attached is a screenshot generated on my local computer after the fix: [Configuration - Spark 2.2.0 Documentation.pdf](https://github.com/apache/spark/files/659315/Configuration.-.Spark.2.2.0.Documentation.pdf)

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16327 from gatorsmile/docFix. (cherry picked from commit c0c9e1d2)

Signed-off-by: Sean Owen <sowen@cloudera.com>
- Dec 17, 2016
Felix Cheung authored
## What changes were proposed in this pull request?

Reorganizing content (copy/paste).

## How was this patch tested?

https://felixcheung.github.io/sparkr-vignettes.html

Previous: https://felixcheung.github.io/sparkr-vignettes_old.html

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16301 from felixcheung/rvignettespass2. (cherry picked from commit 38fd163d)

Signed-off-by: Felix Cheung <felixcheung@apache.org>
- Dec 15, 2016
Patrick Wendell authored
Patrick Wendell authored
Patrick Wendell authored
Patrick Wendell authored
Patrick Wendell authored
- Dec 14, 2016
Dongjoon Hyun authored
## What changes were proposed in this pull request?

Since Apache Spark 1.4.0, the R API document page has had a broken link to the `DESCRIPTION file` because the Jekyll plugin script doesn't copy the file. This PR aims to fix that.

- Official latest website: http://spark.apache.org/docs/latest/api/R/index.html
- Apache Spark 2.1.0-rc2: http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-docs/api/R/index.html

## How was this patch tested?

Manual.

```bash
cd docs
SKIP_SCALADOC=1 jekyll build
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16292 from dongjoon-hyun/SPARK-18875. (cherry picked from commit ec0eae48)

Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
- Dec 12, 2016
Bill Chambers authored
## What changes were proposed in this pull request?

This PR clarifies where accumulators will be displayed.

## How was this patch tested?

No testing.

Author: Bill Chambers <bill@databricks.com>
Author: anabranch <wac.chambers@gmail.com>
Author: Bill Chambers <wchambers@ischool.berkeley.edu>

Closes #16180 from anabranch/improve-acc-docs. (cherry picked from commit 70ffff21)

Signed-off-by: Sean Owen <sowen@cloudera.com>
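For reference, the behavior being clarified: an accumulator created with a name appears in the web UI for the stage that modifies it, while an unnamed one does not. A minimal sketch, assuming a SparkContext `sc`:

```scala
// Named, so it shows up in the web UI.
val accum = sc.longAccumulator("My Accumulator")
sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum.add(x))
println(accum.value) // 10
```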
- Dec 10, 2016
Dongjoon Hyun authored
## What changes were proposed in this pull request?

According to the notice on the following Wiki front page, we can safely remove the obsolete wiki pointers from `README.md` and `docs/index.md`, too. These two lines are the last occurrences of those links.

```
All current wiki content has been merged into pages at http://spark.apache.org as of November 2016. Each page links to the new location of its information on the Spark web site. Obsolete wiki content is still hosted here, but carries a notice that it is no longer current.
```

## How was this patch tested?

Manual.

- `README.md`: https://github.com/dongjoon-hyun/spark/tree/remove_wiki_from_readme
- `docs/index.md`:

```
cd docs
SKIP_API=1 jekyll build
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16239 from dongjoon-hyun/remove_wiki_from_readme. (cherry picked from commit f3a3fed7)

Signed-off-by: Sean Owen <sowen@cloudera.com>
- Dec 09, 2016
Xiangrui Meng authored
## What changes were proposed in this pull request?

There has been some confusion around "Spark ML" vs. "MLlib". This PR adds some FAQ-like entries to the MLlib user guide to explain "Spark ML" and reduce the confusion. I checked the [Spark FAQ page](http://spark.apache.org/faq.html), which seems too high-level for the content here, so I added it to the MLlib user guide instead.

cc: mateiz

Author: Xiangrui Meng <meng@databricks.com>

Closes #16241 from mengxr/SPARK-18812. (cherry picked from commit d2493a20)

Signed-off-by: Xiangrui Meng <meng@databricks.com>
Jacek Laskowski authored
## What changes were proposed in this pull request?

Typo fixes.

## How was this patch tested?

Local build. Awaiting the official build.

Author: Jacek Laskowski <jacek@japila.pl>

Closes #16144 from jaceklaskowski/typo-fixes. (cherry picked from commit b162cc0c)

Signed-off-by: Sean Owen <sowen@cloudera.com>
- Dec 08, 2016
Yanbo Liang authored
## What changes were proposed in this pull request?

- Add all R examples for ML wrappers which were added during the 2.1 release cycle.
- Split the whole `ml.R` example file into an individual example for each algorithm, which will be convenient for users to rerun.
- Add corresponding examples to the ML user guide.
- Update the ML section of the SparkR user guide.

Note: MLlib Scala/Java/Python examples will be consistent; however, SparkR examples may differ from them, since R users may use the algorithms in a different way, for example, using an R `formula` to specify `featuresCol` and `labelCol`.

## How was this patch tested?

Ran all examples manually.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16148 from yanboliang/spark-18325. (cherry picked from commit 9bf8f3cd)

Signed-off-by: Yanbo Liang <ybliang8@gmail.com>
Patrick Wendell authored
Patrick Wendell authored
- Dec 07, 2016
sethah authored
## What changes were proposed in this pull request?

WeightedLeastSquares now supports L1 and elastic-net penalties and has an additional solver option, QuasiNewton. The docs are updated to reflect this change.

## How was this patch tested?

Docs only. Generated documentation to make sure the LaTeX looks OK.

Author: sethah <seth.hendrickson16@gmail.com>

Closes #16139 from sethah/SPARK-18705. (cherry picked from commit 82253617)

Signed-off-by: Yanbo Liang <ybliang8@gmail.com>
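The user-facing effect is, on my reading, that an L1/elastic-net penalty no longer forces a different solver; a sketch under that assumption:

```scala
import org.apache.spark.ml.regression.LinearRegression

// With the new QuasiNewton path in WeightedLeastSquares, the "normal" solver
// can be combined with an elastic-net penalty (previously L1 required L-BFGS).
val lr = new LinearRegression()
  .setSolver("normal")
  .setRegParam(0.1)
  .setElasticNetParam(1.0) // 1.0 = pure L1 penalty
```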
wm624@hotmail.com authored
## What changes were proposed in this pull request?

The logistic regression summary was added to the Python API; we need to add an example and documentation for the summary. The newly added example is consistent with the Scala and Java examples.

## How was this patch tested?

Manual tests: ran the example with spark-submit; copied and pasted the code into pyspark; built the documentation and checked it.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16064 from wangmiao1981/py. (cherry picked from commit aad11209)

Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
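The new Python example mirrors the existing Scala one; for reference, the Scala side of the summary API is roughly as follows, with `training` an assumed DataFrame of labeled data:

```scala
import org.apache.spark.ml.classification.{BinaryLogisticRegressionSummary, LogisticRegression}

val lrModel = new LogisticRegression().fit(training)
val trainingSummary = lrModel.summary

// Loss at each training iteration.
trainingSummary.objectiveHistory.foreach(println)

// For binary classification, downcast to reach ROC metrics.
val binarySummary = trainingSummary.asInstanceOf[BinaryLogisticRegressionSummary]
println(binarySummary.areaUnderROC)
```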
- Dec 05, 2016
Nicholas Chammas authored
Looking at the distributions provided on spark.apache.org, I see that the Spark YARN shuffle jar is under `yarn/` and not `lib/`. This change is so minor I'm not sure it needs a JIRA. But let me know if so and I'll create one.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #16130 from nchammas/yarn-doc-fix. (cherry picked from commit 5a92dc76)

Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
Dongjoon Hyun authored
## What changes were proposed in this pull request?

In the SQL Programming Guide, this PR uses `TRUE` instead of `True` in SparkR and adds the default value of `nullable` for `StructField` in Scala/Python/R (i.e., "Note: The default value of nullable is true."). In the Java API, `nullable` is not optional.

**BEFORE**
- Spark 2.1.0 RC1: http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc1-docs/sql-programming-guide.html#data-types

**AFTER**
(Screenshots of the updated R, Scala, and Python sections omitted.)

## How was this patch tested?

Manual.

```
cd docs
SKIP_API=1 jekyll build
open _site/index.html
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16141 from dongjoon-hyun/SPARK-SQL-GUIDE. (cherry picked from commit 410b7898)

Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
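The note being added is easy to show in Scala (the other languages behave analogously):

```scala
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("name", StringType),                  // nullable defaults to true
  StructField("age", IntegerType, nullable = false) // must be set explicitly
))
```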
Yanbo Liang authored
## What changes were proposed in this pull request?

Add R examples to the ML programming guide for the following algorithms as a POC:
- spark.glm
- spark.survreg
- spark.naiveBayes
- spark.kmeans

The four algorithms were added to SparkR in 2.0.0; more docs for algorithms added during the 2.1 release cycle will be addressed in a separate follow-up PR.

## How was this patch tested?

Screenshots of the generated ML programming guide for `GeneralizedLinearRegression` (omitted).

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16136 from yanboliang/spark-18279. (cherry picked from commit eb8dd681)

Signed-off-by: Yanbo Liang <ybliang8@gmail.com>
- Dec 04, 2016
Felix Cheung authored
## What changes were proposed in this pull request?

If SparkR is running as a package and it has previously downloaded the Spark jar, it should be able to run as before without having to set SPARK_HOME. Basically, with this bug the auto-install of Spark will only work in the first session. This seems to be a regression from the earlier behavior.

The fix is to always try to install or check for the cached Spark if running in an interactive session. As discussed before, we should probably only install Spark iff running in an interactive session (R shell, RStudio, etc.).

## How was this patch tested?

Manually.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16077 from felixcheung/rsessioninteractive. (cherry picked from commit b019b3a8)

Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
- Dec 03, 2016
Yunni authored
## What changes were proposed in this pull request?

The user guide for LSH is added to ml-features.md, with several Scala/Java examples in spark-examples.

## How was this patch tested?

Docs have been generated through Jekyll and checked through manual inspection.

Author: Yunni <Euler57721@gmail.com>
Author: Yun Ni <yunn@uber.com>
Author: Joseph K. Bradley <joseph@databricks.com>
Author: Yun Ni <Euler57721@gmail.com>

Closes #15795 from Yunni/SPARK-18081-lsh-guide. (cherry picked from commit 34777184)

Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
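A condensed sketch in the spirit of the new guide's examples, using `MinHashLSH`; the data and parameters are illustrative, and `spark` is an assumed SparkSession:

```scala
import org.apache.spark.ml.feature.MinHashLSH
import org.apache.spark.ml.linalg.Vectors

val df = spark.createDataFrame(Seq(
  (0, Vectors.sparse(6, Seq((0, 1.0), (1, 1.0), (2, 1.0)))),
  (1, Vectors.sparse(6, Seq((2, 1.0), (3, 1.0), (4, 1.0)))),
  (2, Vectors.sparse(6, Seq((0, 1.0), (2, 1.0), (4, 1.0))))
)).toDF("id", "features")

val mh = new MinHashLSH()
  .setNumHashTables(3)
  .setInputCol("features")
  .setOutputCol("hashes")

val model = mh.fit(df)

// Approximate nearest-neighbor search against a query vector.
val key = Vectors.sparse(6, Seq((1, 1.0), (3, 1.0)))
model.approxNearestNeighbors(df, key, 2).show()
```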
- Dec 02, 2016
Yanbo Liang authored
## What changes were proposed in this pull request?

Update the ML programming and migration guide for the 2.1 release.

## How was this patch tested?

Doc change, no tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16076 from yanboliang/spark-18324. (cherry picked from commit 2dc0d7ef)

Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
- Nov 30, 2016
Yanbo Liang authored
## What changes were proposed in this pull request?

API review for 2.1, except the `LSH`-related classes, which are still under development.

## How was this patch tested?

Only doc changes, no new tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16009 from yanboliang/spark-18318. (cherry picked from commit 60022bfd)

Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
manishAtGit authored
## What changes were proposed in this pull request?

Added a missing semicolon in the quick-start guide's Java example code, which wasn't compiling before.

## How was this patch tested?

Locally, by running and generating the site for the docs. (Snapshot showing the last line now ending with ";" omitted.)

Author: manishAtGit <manish@knoldus.com>

Closes #16081 from manishatGit/fixed-quick-start-guide. (cherry picked from commit bc95ea0b)

Signed-off-by: Andrew Or <andrewor14@gmail.com>