Commits · 8ef005931a242d087f4879805571be0660aefaf9 · cs525-sp18-g07 / spark

Dec 12, 2016

[DOCS][MINOR] Clarify Where AccumulatorV2s are Displayed · 35011608

Bill Chambers authored 8 years ago

## What changes were proposed in this pull request?

This PR clarifies where accumulators will be displayed.

## How was this patch tested?

No testing.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Bill Chambers <bill@databricks.com>
Author: anabranch <wac.chambers@gmail.com>
Author: Bill Chambers <wchambers@ischool.berkeley.edu>

Closes #16180 from anabranch/improve-acc-docs.

(cherry picked from commit 70ffff21)
Signed-off-by: Sean Owen <sowen@cloudera.com>


Dec 10, 2016

[MINOR][DOCS] Remove Apache Spark Wiki address · 83822df0

Dongjoon Hyun authored 8 years ago

## What changes were proposed in this pull request?

According to the notice on the following Wiki front page, we can safely remove the obsolete wiki pointer in `README.md` and `docs/index.md`, too. These two lines are the last occurrences of those links.

```
All current wiki content has been merged into pages at http://spark.apache.org as of November 2016.
Each page links to the new location of its information on the Spark web site.
Obsolete wiki content is still hosted here, but carries a notice that it is no longer current.
```

## How was this patch tested?

Manual.

- `README.md`: https://github.com/dongjoon-hyun/spark/tree/remove_wiki_from_readme
- `docs/index.md`:
```
cd docs
SKIP_API=1 jekyll build
```
![screen shot 2016-12-09 at 2 53 29 pm](https://cloud.githubusercontent.com/assets/9700541/21067323/517252e2-be1f-11e6-85b1-2a4471131c5d.png)

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16239 from dongjoon-hyun/remove_wiki_from_readme.

(cherry picked from commit f3a3fed7)
Signed-off-by: Sean Owen <sowen@cloudera.com>


Dec 09, 2016

[SPARK-18812][MLLIB] explain "Spark ML" · e45345d9

Xiangrui Meng authored 8 years ago

## What changes were proposed in this pull request?

There has been some confusion around "Spark ML" vs. "MLlib". This PR adds some FAQ-like entries to the MLlib user guide to explain "Spark ML" and reduce the confusion.

I checked the [Spark FAQ page](http://spark.apache.org/faq.html), which seems too high-level for the content here. So I added it to the MLlib user guide instead.

cc: mateiz

Author: Xiangrui Meng <meng@databricks.com>

Closes #16241 from mengxr/SPARK-18812.

(cherry picked from commit d2493a20)
Signed-off-by: Xiangrui Meng <meng@databricks.com>


[MINOR][CORE][SQL][DOCS] Typo fixes · b226f10e

Jacek Laskowski authored 8 years ago


## What changes were proposed in this pull request?

Typo fixes

## How was this patch tested?

Local build. Awaiting the official build.

Author: Jacek Laskowski <jacek@japila.pl>

Closes #16144 from jaceklaskowski/typo-fixes.

(cherry picked from commit b162cc0c)
Signed-off-by: Sean Owen <sowen@cloudera.com>


Dec 08, 2016

[SPARK-18325][SPARKR][ML] SparkR ML wrappers example code and user guide · 9095c152

Yanbo Liang authored 8 years ago


## What changes were proposed in this pull request?
* Add all R examples for ML wrappers which were added during the 2.1 release cycle.
* Split the whole ```ml.R``` example file into an individual example for each algorithm, which makes it convenient for users to rerun them.
* Add corresponding examples to ML user guide.
* Update ML section of SparkR user guide.

Note: MLlib Scala/Java/Python examples will be consistent; however, SparkR examples may differ from them, since R users may use the algorithms in a different way, for example, using R ```formula``` to specify ```featuresCol``` and ```labelCol```.

## How was this patch tested?
Run all examples manually.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16148 from yanboliang/spark-18325.

(cherry picked from commit 9bf8f3cd)
Signed-off-by: Yanbo Liang <ybliang8@gmail.com>


Preparing development version 2.1.1-SNAPSHOT · 48aa6775
Patrick Wendell authored 8 years ago

Preparing Spark release v2.1.0-rc2 · 08071749
Patrick Wendell authored 8 years ago


Dec 07, 2016

[SPARK-18705][ML][DOC] Update user guide to reflect one pass solver for L1 and elastic-net · ab865cfd

sethah authored 8 years ago


## What changes were proposed in this pull request?

WeightedLeastSquares now supports L1 and elastic net penalties and has an additional solver option: QuasiNewton. The docs are updated to reflect this change.
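
For context, the elastic-net penalty is usually written as below. This is a sketch in notation assumed here, not copied from the updated guide: $w$ is the coefficient vector, $\lambda \ge 0$ the regularization strength, and $\alpha \in [0, 1]$ the mixing parameter.

```latex
R(w) = \lambda \left( \alpha \, \lVert w \rVert_1 + \frac{1 - \alpha}{2} \, \lVert w \rVert_2^2 \right)
```

Setting $\alpha = 1$ gives a pure L1 (lasso) penalty and $\alpha = 0$ a pure L2 (ridge) penalty.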

## How was this patch tested?

Docs only. Generated the documentation to make sure the LaTeX looks OK.

Author: sethah <seth.hendrickson16@gmail.com>

Closes #16139 from sethah/SPARK-18705.

(cherry picked from commit 82253617)
Signed-off-by: Yanbo Liang <ybliang8@gmail.com>


[SPARK-18633][ML][EXAMPLE] Add multiclass logistic regression summary python example and document · 839c2eb9

wm624@hotmail.com authored 8 years ago


## What changes were proposed in this pull request?
Logistic Regression summary has been added to the Python API. We need to add an example and documentation for the summary.

The newly added example is consistent with Scala and Java examples.

## How was this patch tested?

Manual tests: ran the example with spark-submit; copied and pasted the code into pyspark; built the documentation and checked it.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16064 from wangmiao1981/py.

(cherry picked from commit aad11209)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>


Dec 05, 2016

[DOCS][MINOR] Update location of Spark YARN shuffle jar · 39759ff0

Nicholas Chammas authored 8 years ago


Looking at the distributions provided on spark.apache.org, I see that the Spark YARN shuffle jar is under `yarn/` and not `lib/`.

This change is so minor I'm not sure it needs a JIRA. But let me know if so and I'll create one.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #16130 from nchammas/yarn-doc-fix.

(cherry picked from commit 5a92dc76)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>


[MINOR][DOC] Use SparkR `TRUE` value and add default values for `StructField` in SQL Guide. · afd2321b

Dongjoon Hyun authored 8 years ago

## What changes were proposed in this pull request?

In `SQL Programming Guide`, this PR uses `TRUE` instead of `True` in SparkR and adds default values of `nullable` for `StructField` in Scala/Python/R (i.e., "Note: The default value of nullable is true."). In Java API, `nullable` is not optional.

**BEFORE**
* SPARK 2.1.0 RC1
http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc1-docs/sql-programming-guide.html#data-types

**AFTER**

* R
<img width="916" alt="screen shot 2016-12-04 at 11 58 19 pm" src="https://cloud.githubusercontent.com/assets/9700541/20877443/abba19a6-ba7d-11e6-8984-afbe00333fb0.png">

* Scala
<img width="914" alt="screen shot 2016-12-04 at 11 57 37 pm" src="https://cloud.githubusercontent.com/assets/9700541/20877433/99ce734a-ba7d-11e6-8bb5-e8619041b09b.png">

* Python
<img width="914" alt="screen shot 2016-12-04 at 11 58 04 pm" src="https://cloud.githubusercontent.com/assets/9700541/20877440/a5c89338-ba7d-11e6-8f92-6c0ae9388d7e.png">

## How was this patch tested?

Manual.

```
cd docs
SKIP_API=1 jekyll build
open _site/index.html
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16141 from dongjoon-hyun/SPARK-SQL-GUIDE.

(cherry picked from commit 410b7898)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>


[SPARK-18279][DOC][ML][SPARKR] Add R examples to ML programming guide. · 1821cbea

Yanbo Liang authored 8 years ago

## What changes were proposed in this pull request?
Add R examples to ML programming guide for the following algorithms as POC:
* spark.glm
* spark.survreg
* spark.naiveBayes
* spark.kmeans

These four algorithms have been in SparkR since 2.0.0; docs for algorithms added during the 2.1 release cycle will be addressed in a separate follow-up PR.

## How was this patch tested?
This is a screenshot of the generated ML programming guide for ```GeneralizedLinearRegression```:
![image](https://cloud.githubusercontent.com/assets/1962026/20866403/babad856-b9e1-11e6-9984-62747801e8c4.png)

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16136 from yanboliang/spark-18279.

(cherry picked from commit eb8dd681)
Signed-off-by: Yanbo Liang <ybliang8@gmail.com>


Dec 04, 2016

[SPARK-18643][SPARKR] SparkR hangs at session start when installed as a package without Spark · c13c2939

Felix Cheung authored 8 years ago


## What changes were proposed in this pull request?

If SparkR is running as a package and has previously downloaded the Spark JAR, it should be able to run as before without having to set SPARK_HOME. With this bug, the automatic Spark install only works in the first session.

This seems to be a regression from the earlier behavior.

The fix is to always try to install, or check for, the cached Spark when running in an interactive session.
As discussed before, we should probably install Spark if and only if running in an interactive session (R shell, RStudio, etc.).

## How was this patch tested?

Manually

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16077 from felixcheung/rsessioninteractive.

(cherry picked from commit b019b3a8)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>


Dec 03, 2016

[SPARK-18081][ML][DOCS] Add user guide for Locality Sensitive Hashing(LSH) · 28f698b4

Yunni authored 8 years ago


## What changes were proposed in this pull request?
The user guide for LSH is added to ml-features.md, with several Scala/Java examples in spark-examples.

## How was this patch tested?
Doc has been generated through Jekyll, and checked through manual inspection.

Author: Yunni <Euler57721@gmail.com>
Author: Yun Ni <yunn@uber.com>
Author: Joseph K. Bradley <joseph@databricks.com>
Author: Yun Ni <Euler57721@gmail.com>

Closes #15795 from Yunni/SPARK-18081-lsh-guide.

(cherry picked from commit 34777184)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>


Dec 02, 2016

[SPARK-18324][ML][DOC] Update ML programming and migration guide for 2.1 release · 839d4e9c

Yanbo Liang authored 8 years ago


## What changes were proposed in this pull request?
Update ML programming and migration guide for 2.1 release.

## How was this patch tested?
Doc change, no test.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16076 from yanboliang/spark-18324.

(cherry picked from commit 2dc0d7ef)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>


Nov 30, 2016

[SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scala APIs, docs · f542df31

Yanbo Liang authored 8 years ago


## What changes were proposed in this pull request?
API review for 2.1, except ```LSH``` related classes which are still under development.

## How was this patch tested?
Only doc changes, no new tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16009 from yanboliang/spark-18318.

(cherry picked from commit 60022bfd)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>


[SPARK][EXAMPLE] Added missing semicolon in quick-start-guide example · eae85da3

manishAtGit authored 8 years ago

## What changes were proposed in this pull request?

Added a missing semicolon in the quick-start-guide Java example code, which wasn't compiling before.

## How was this patch tested?
Locally, by running and generating the site for the docs. You can see that the last line contains ";" in the screenshot below.
![image](https://cloud.githubusercontent.com/assets/10628224/20751760/9a7e0402-b723-11e6-9aa8-3b6ca2d92ebf.png)

Author: manishAtGit <manish@knoldus.com>

Closes #16081 from manishatGit/fixed-quick-start-guide.

(cherry picked from commit bc95ea0b)
Signed-off-by: Andrew Or <andrewor14@gmail.com>


Nov 29, 2016

[SPARK-18145] Update documentation for hive partition management in 2.1 · 55b1142b

Eric Liang authored 8 years ago


## What changes were proposed in this pull request?

This documents the partition handling changes for Spark 2.1 and how to migrate existing tables.

## How was this patch tested?

Built docs locally.

rxin

Author: Eric Liang <ekl@databricks.com>

Closes #16074 from ericl/spark-18145.

(cherry picked from commit 489845f3)
Signed-off-by: Reynold Xin <rxin@databricks.com>


[MINOR][DOCS] Updates to the Accumulator example in the programming guide.... · 124944ab

aokolnychyi authored 8 years ago

[MINOR][DOCS] Updates to the Accumulator example in the programming guide. Fixed typos, AccumulatorV2 in Java

## What changes were proposed in this pull request?

This pull request contains updates to Scala and Java Accumulator code snippets in the programming guide.

- For Scala, the pull request fixes the signature of the 'add()' method in the custom Accumulator, which contained two params (as the old AccumulatorParam) instead of one (as in AccumulatorV2).

- The Java example was updated to use the AccumulatorV2 class since AccumulatorParam is marked as deprecated.

- Scala and Java examples are more consistent now.
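
The signature change described above can be illustrated with a small stand-in. This is a pure-Python sketch, not Spark's API: `SumAccumulator` and `old_style_add` are hypothetical names, and the class only mirrors the shape of an AccumulatorV2-style `add` (one parameter) and `merge`.

```python
class SumAccumulator:
    """Hypothetical stand-in mirroring the shape of an AccumulatorV2-style API."""

    def __init__(self):
        self._sum = 0

    def add(self, v):
        # One parameter: the value to fold into this accumulator's own state.
        self._sum += v

    def merge(self, other):
        # Combine a partial result produced by another task.
        self._sum += other._sum

    @property
    def value(self):
        return self._sum


def old_style_add(current, v):
    # Two-parameter style, for contrast: both operands are passed in
    # and the combined value is returned instead of mutating state.
    return current + v


left, right = SumAccumulator(), SumAccumulator()
left.add(3)
left.add(4)
right.add(5)
left.merge(right)
print(left.value)  # 12
```

The one-parameter form works because the accumulator object carries its own running state, so only the new value needs to be passed in.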

## How was this patch tested?

This patch was tested manually by building the docs locally.

![image](https://cloud.githubusercontent.com/assets/6235869/20652099/77d98d18-b4f3-11e6-8565-a995fe8cf8e5.png)

Author: aokolnychyi <okolnychyyanton@gmail.com>

Closes #16024 from aokolnychyi/fixed_accumulator_example.

(cherry picked from commit f045d9da)
Signed-off-by: Sean Owen <sowen@cloudera.com>


Nov 28, 2016

[SPARK-18547][CORE] Propagate I/O encryption key when executors register. · c4cbdc86

Marcelo Vanzin authored 8 years ago


This change modifies the method used to propagate encryption keys used during
shuffle. Instead of relying on YARN's UserGroupInformation credential propagation,
this change explicitly distributes the key using the messages exchanged between
driver and executor during registration. When RPC encryption is enabled, this means
key propagation is also secure.

This allows shuffle encryption to work in non-YARN mode, which means that it's
easier to write unit tests for areas of the code that are affected by the feature.

The key is stored in the SecurityManager; because there are many instances of
that class used in the code, the key is only guaranteed to exist in the instance
managed by the SparkEnv. This path was chosen to avoid storing the key in the
SparkConf, which would risk having the key being written to disk as part of the
configuration (as, for example, is done when starting YARN applications).
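
The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Spark code: the `Driver`, `Executor` and `SecurityManager` classes, the `handle_register_executor` method and the reply layout are all hypothetical stand-ins written in Python.

```python
import secrets


class SecurityManager:
    """Hypothetical holder for the I/O encryption key (kept out of the config)."""

    def __init__(self, io_encryption_key=None):
        self.io_encryption_key = io_encryption_key


class Driver:
    def __init__(self, conf):
        self.conf = dict(conf)
        # The key lives only in the security manager, never in the conf dict,
        # so writing the configuration to disk cannot leak it.
        self.security_manager = SecurityManager(secrets.token_bytes(16))

    def handle_register_executor(self, executor_id):
        # The registration reply carries the conf plus the key as a separate field.
        return {
            "executor_id": executor_id,
            "conf": dict(self.conf),
            "io_encryption_key": self.security_manager.io_encryption_key,
        }


class Executor:
    def __init__(self, executor_id, driver):
        reply = driver.handle_register_executor(executor_id)
        self.conf = reply["conf"]
        self.security_manager = SecurityManager(reply["io_encryption_key"])


driver = Driver({"spark.io.encryption.enabled": "true"})
executor = Executor("exec-1", driver)
```

In this sketch the key is only as safe as the channel that carries the registration reply, which matches the note above that propagation is secure when RPC encryption is enabled.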

Tested by new and existing unit tests (which were moved from the YARN module to
core), and by running apps with shuffle encryption enabled.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #15981 from vanzin/SPARK-18547.

(cherry picked from commit 8b325b17)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>


Preparing development version 2.1.1-SNAPSHOT · 75d73d13
Patrick Wendell authored 8 years ago

Preparing Spark release v2.1.0-rc1 · 80aabc0b
Patrick Wendell authored 8 years ago


Nov 26, 2016

[WIP][SQL][DOC] Fix incorrect `code` tag · ff699332

Weiqing Yang authored 8 years ago


## What changes were proposed in this pull request?
This PR fixes an incorrect `code` tag in `sql-programming-guide.md`.

## How was this patch tested?
Manually.

Author: Weiqing Yang <yangweiqing001@gmail.com>

Closes #15941 from weiqingy/fixtag.

(cherry picked from commit f4a98e42)
Signed-off-by: Sean Owen <sowen@cloudera.com>


Nov 23, 2016

[SPARK-18073][DOCS][WIP] Migrate wiki to spark.apache.org web site · 5f198d20

Sean Owen authored 8 years ago


## What changes were proposed in this pull request?

Updates links to the wiki to links to the new location of content on spark.apache.org.

## How was this patch tested?

Doc builds

Author: Sean Owen <sowen@cloudera.com>

Closes #15967 from srowen/SPARK-18073.1.

(cherry picked from commit 7e0cd1d9)
Signed-off-by: Sean Owen <sowen@cloudera.com>


Nov 19, 2016

[SPARK-18353][CORE] spark.rpc.askTimeout default value is not 120s · 30a6fbbb

Sean Owen authored 8 years ago


## What changes were proposed in this pull request?

Avoid hard-coding spark.rpc.askTimeout to non-default in Client; fix doc about spark.rpc.askTimeout default
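
The documented default can be read as a fallback chain. The sketch below assumes, from a reading of the Spark configuration docs and not from this commit message, that `spark.rpc.askTimeout` falls back to `spark.network.timeout`, which itself defaults to 120s; `resolve_ask_timeout` is a hypothetical helper, not a Spark API.

```python
DEFAULT_NETWORK_TIMEOUT = "120s"  # assumed default of spark.network.timeout


def resolve_ask_timeout(conf):
    """Return the first timeout that is set, else the assumed 120s default."""
    for key in ("spark.rpc.askTimeout", "spark.network.timeout"):
        if key in conf:
            return conf[key]
    return DEFAULT_NETWORK_TIMEOUT


print(resolve_ask_timeout({}))                                 # 120s
print(resolve_ask_timeout({"spark.network.timeout": "300s"}))  # 300s
print(resolve_ask_timeout({"spark.rpc.askTimeout": "10s"}))    # 10s
```

Hard-coding `spark.rpc.askTimeout` in a client would short-circuit this chain, so a user-supplied `spark.network.timeout` would be ignored.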

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #15833 from srowen/SPARK-18353.

(cherry picked from commit 8b1e1088)
Signed-off-by: Sean Owen <sowen@cloudera.com>


[SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note... · 4b396a65

hyukjinkwon authored 8 years ago

[SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note that`/`'''Note:'''` across Scala/Java API documentation

It seems that in Scala/Java the following forms are all in use:

- `Note:`
- `NOTE:`
- `Note that`
- `'''Note:'''`
- `note`

This PR proposes to change them all to `note` for consistency.

**Before**

- Scala
  ![2016-11-17 6 16 39](https://cloud.githubusercontent.com/assets/6477701/20383180/1a7aed8c-acf2-11e6-9611-5eaf6d52c2e0.png)

- Java
  ![2016-11-17 6 14 41](https://cloud.githubusercontent.com/assets/6477701/20383096/c8ffc680-acf1-11e6-914a-33460bf1401d.png)

**After**

- Scala
  ![2016-11-17 6 16 44](https://cloud.githubusercontent.com/assets/6477701/20383167/09940490-acf2-11e6-937a-0d5e1dc2cadf.png)

- Java
  ![2016-11-17 6 13 39](https://cloud.githubusercontent.com/assets/6477701/20383132/e7c2a57e-acf1-11e6-9c47-b849674d4d88.png)

The notes were found via

```bash
grep -r "NOTE: " . |  # Note:|NOTE:|Note that|'''Note:'''
grep -v "// NOTE: " |  # starting with // does not appear in API documentation.
grep -E '.scala|.java' |  # java/scala files
grep -v Suite |  # exclude tests
grep -v Test |  # exclude tests
# packages that appear in API documentation; note that each pattern is a regular
# expression, so actual matches were mostly `org/apache/spark/api/java/functions ...`
grep -e 'org.apache.spark.api.java' \
-e 'org.apache.spark.api.java.function' \
-e 'org.apache.spark.api.r' \
...
```

```bash
grep -r "Note that " . |  # Note:|NOTE:|Note that|'''Note:'''
grep -v "// Note that " |  # starting with // does not appear in API documentation.
grep -E '.scala|.java' |  # java/scala files
grep -v Suite |  # exclude tests
grep -v Test |  # exclude tests
# packages that appear in API documentation
grep -e 'org.apache.spark.api.java' \
-e 'org.apache.spark.api.java.function' \
-e 'org.apache.spark.api.r' \
...
```

```bash
grep -r "Note: " . |  # Note:|NOTE:|Note that|'''Note:'''
grep -v "// Note: " |  # starting with // does not appear in API documentation.
grep -E '.scala|.java' |  # java/scala files
grep -v Suite |  # exclude tests
grep -v Test |  # exclude tests
# packages that appear in API documentation
grep -e 'org.apache.spark.api.java' \
-e 'org.apache.spark.api.java.function' \
-e 'org.apache.spark.api.r' \
...
```

```bash
grep -r "'''Note:'''" . |  # Note:|NOTE:|Note that|'''Note:'''
grep -v "// '''Note:''' " |  # starting with // does not appear in API documentation.
grep -E '.scala|.java' |  # java/scala files
grep -v Suite |  # exclude tests
grep -v Test |  # exclude tests
# packages that appear in API documentation
grep -e 'org.apache.spark.api.java' \
-e 'org.apache.spark.api.java.function' \
-e 'org.apache.spark.api.r' \
...
```

And then fixed one by one comparing with API documentation/access modifiers.

After that, manually tested via `jekyll build`.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #15889 from HyukjinKwon/SPARK-18437.

(cherry picked from commit d5b1d5fc)
Signed-off-by: Sean Owen <sowen@cloudera.com>


Nov 17, 2016

[SPARK-18480][DOCS] Fix wrong links for ML guide docs · 536a2159

Zheng RuiFeng authored 8 years ago


## What changes were proposed in this pull request?
1. There are two `[Graph.partitionBy]` in `graphx-programming-guide.md`; the first one had no effect.
2. `DataFrame`, `Transformer`, `Pipeline` and `Parameter` in `ml-pipeline.md` were linked to `ml-guide.html` by mistake.
3. `PythonMLLibAPI` in `mllib-linear-methods.md` was not accessible, because class `PythonMLLibAPI` is private.
4. Other link updates.

## How was this patch tested?
Manual tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #15912 from zhengruifeng/md_fix.

(cherry picked from commit cdaf4ce9)
Signed-off-by: Sean Owen <sowen@cloudera.com>


[YARN][DOC] Remove non-Yarn specific configurations from running-on-yarn.md · 2ee4fc88

Weiqing Yang authored 8 years ago


## What changes were proposed in this pull request?

Remove `spark.driver.memory`, `spark.executor.memory`, `spark.driver.cores`, and `spark.executor.cores` from `running-on-yarn.md` as they are not Yarn-specific, and they are also defined in `configuration.md`.

## How was this patch tested?
Build passed and manually checked.

Author: Weiqing Yang <yangweiqing001@gmail.com>

Closes #15869 from weiqingy/yarnDoc.

(cherry picked from commit a3cac7bd)
Signed-off-by: Sean Owen <sowen@cloudera.com>


Nov 16, 2016

[SPARK-1267][SPARK-18129] Allow PySpark to be pip installed · 6a3cbbc0

Holden Karau authored 8 years ago

## What changes were proposed in this pull request?

This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI but that is the natural follow up (SPARK-18129).

Done:
- pip installable on conda [manual tested]
- setup.py installed on a non-pip managed system (RHEL) with YARN [manual tested]
- Automated testing of this (virtualenv)
- packaging and signing with release-build*

Possible follow up work:
- release-build update to publish to PyPI (SPARK-18128)
- figure out who owns the pyspark package name on prod PyPI (is it someone within the project, or should we ask PyPI, or should we choose a different name to publish with, like ApachePySpark?)
- Windows support and/or testing (SPARK-18136)
- investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test
- consider how we want to number our dev/snapshot versions

Explicitly out of scope:
- Using pip installed PySpark to start a standalone cluster
- Using pip installed PySpark for non-Python Spark programs

*I've done some work to test release-build locally but as a non-committer I've just done local testing.

## How was this patch tested?

Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.

release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites)

Author: Holden Karau <holden@us.ibm.com>
Author: Juliet Hougland <juliet@cloudera.com>
Author: Juliet Hougland <not@myemail.com>

Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.


[YARN][DOC] Increasing NodeManager's heap size with External Shuffle Service · 523abfe1

Artur Sukhenko authored 8 years ago

## What changes were proposed in this pull request?

Suggest that users increase the `NodeManager`'s heap size if the `External Shuffle Service` is enabled, as the
`NM` can spend a lot of time doing GC, which makes shuffle operations a bottleneck because `Shuffle Read blocked time` goes up.
Also, because of GC, the `NodeManager` can use an enormous amount of CPU, and cluster performance will suffer.
I have seen the NodeManager using 5-13G of RAM and up to 2700% CPU with the `spark_shuffle` service on.

## How was this patch tested?

#### Added step 5:
![shuffle_service](https://cloud.githubusercontent.com/assets/15244468/20355499/2fec0fde-ac2a-11e6-8f8b-1c80daf71be1.png)

Author: Artur Sukhenko <artur.sukhenko@gmail.com>

Closes #15906 from Devian-ua/nmHeapSize.

(cherry picked from commit 55589987)
Signed-off-by: Reynold Xin <rxin@databricks.com>


[SPARK-18461][DOCS][STRUCTUREDSTREAMING] Added more information about monitoring streaming queries · 3d4756d5

Tathagata Das authored 8 years ago

## What changes were proposed in this pull request?
<img width="941" alt="screen shot 2016-11-15 at 6 27 32 pm" src="https://cloud.githubusercontent.com/assets/663212/20332521/4190b858-ab61-11e6-93a6-4bdc05105ed9.png">
<img width="940" alt="screen shot 2016-11-15 at 6 27 45 pm" src="https://cloud.githubusercontent.com/assets/663212/20332525/44a0d01e-ab61-11e6-8668-47f925490d4f.png">

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #15897 from tdas/SPARK-18461.

(cherry picked from commit bb6cdfd9)
Signed-off-by: Michael Armbrust <michael@databricks.com>


[SPARK-18446][ML][DOCS] Add links to API docs for ML algos · 416bc3dd

Zheng RuiFeng authored 8 years ago


## What changes were proposed in this pull request?
Add links to API docs for ML algos
## How was this patch tested?
Manual checking for the API links

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #15890 from zhengruifeng/algo_link.

(cherry picked from commit a75e3fe9)
Signed-off-by: Sean Owen <sowen@cloudera.com>


[MINOR][DOC] Fix typos in the 'configuration', 'monitoring' and... · 82084700

Weiqing Yang authored 8 years ago

[MINOR][DOC] Fix typos in the 'configuration', 'monitoring' and 'sql-programming-guide' documentation

## What changes were proposed in this pull request?

Fix typos in the 'configuration', 'monitoring' and 'sql-programming-guide' documentation.

## How was this patch tested?
Manually.

Author: Weiqing Yang <yangweiqing001@gmail.com>

Closes #15886 from weiqingy/fixTypo.

(cherry picked from commit 241e04bc)
Signed-off-by: Sean Owen <sowen@cloudera.com>


[DOC][MINOR] Kafka doc: breakup into lines · 4567db9d

Liwei Lin authored 8 years ago

## Before

![before](https://cloud.githubusercontent.com/assets/15843379/20340231/99b039fe-ac1b-11e6-9ba9-b44582427459.png)

## After

![after](https://cloud.githubusercontent.com/assets/15843379/20340236/9d5796e2-ac1b-11e6-92bb-6da40ba1a383.png)

Author: Liwei Lin <lwlin7@gmail.com>

Closes #15903 from lw-lin/kafka-doc-lines.

(cherry picked from commit 3e01f128)
Signed-off-by: Sean Owen <sowen@cloudera.com>


Nov 15, 2016

[SPARK-18427][DOC] Update docs of mllib.KMeans · 0762c0ce

Zheng RuiFeng authored 8 years ago


## What changes were proposed in this pull request?
1. Remove `runs` from docs of mllib.KMeans
2. Add notes for `k` according to comments in sources

## How was this patch tested?
Existing tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #15873 from zhengruifeng/update_doc_mllib_kmeans.

(cherry picked from commit 33be4da5)
Signed-off-by: Sean Owen <sowen@cloudera.com>


Nov 14, 2016

[SPARK-18428][DOC] Update docs for GraphX · 649c15fa

Zheng RuiFeng authored 8 years ago


## What changes were proposed in this pull request?
1. Add links to `VertexRDD` and `EdgeRDD`
2. Note in `Vertex and Edge RDDs` that not all methods are listed
3. `VertexID` -> `VertexId`

## How was this patch tested?
No tests; only docs are modified

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #15875 from zhengruifeng/update_graphop_doc.

(cherry picked from commit c31def1d)
Signed-off-by: Reynold Xin <rxin@databricks.com>


[SPARK-18432][DOC] Changed HDFS default block size from 64MB to 128MB · c07fe1c5

Noritaka Sekiyama authored 8 years ago

Changed HDFS default block size from 64MB to 128MB.
https://issues.apache.org/jira/browse/SPARK-18432
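
As a quick illustration of why the documented block size matters: Spark's programming guide describes one partition per HDFS block by default when reading a file, so the block size sets the default partition count. The helper below is a hypothetical sketch of that arithmetic, not a Spark or HDFS API.

```python
import math

MB = 1024 * 1024


def num_blocks(file_size_bytes, block_size_bytes):
    """Number of HDFS blocks needed to hold a file of the given size."""
    return math.ceil(file_size_bytes / block_size_bytes)


one_gib = 1024 * MB
print(num_blocks(one_gib, 64 * MB))   # 16
print(num_blocks(one_gib, 128 * MB))  # 8
```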



Author: Noritaka Sekiyama <moomindani@gmail.com>

Closes #15879 from moomindani/SPARK-18432.

(cherry picked from commit 9d07ceee)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>


Nov 13, 2016

[SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured... · 0c69224e

Denny Lee authored 8 years ago

[SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide

## What changes were proposed in this pull request?

Update the Python section of the Structured Streaming Guide from .builder() to .builder

## How was this patch tested?

Validated documentation and successfully running the test example.

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.

A 'Builder' object is not callable, hence .builder() was changed to .builder.
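
The error can be reproduced without Spark. The sketch below uses hypothetical stand-in classes (`Session`, `Builder`, `app_name`), not the PySpark API; it only shows why an attribute holding an instance cannot be called like a method.

```python
class Builder:
    """Hypothetical stand-in for a session builder; it defines no __call__."""

    def __init__(self):
        self.options = {}

    def app_name(self, name):
        self.options["app.name"] = name
        return self  # allow chaining


class Session:
    # `builder` is an attribute holding a Builder instance, not a method.
    builder = Builder()


Session.builder.app_name("demo")  # works: plain attribute access

try:
    Session.builder()  # fails: an instance without __call__ is not callable
except TypeError as err:
    message = str(err)
    print(message)
```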

Author: Denny Lee <dennylee@gallifrey.local>

Closes #15872 from dennyglee/master.

(cherry picked from commit b91a51bb)
Signed-off-by: Reynold Xin <rxin@databricks.com>


Nov 08, 2016

[SPARK-13770][DOCUMENTATION][ML] Document the ML feature Interaction · ef6b6d3d

chie8842 authored 8 years ago


I created Scala and Java examples and added documentation.

Author: chie8842 <hayashidac@nttdata.co.jp>

Closes #15658 from hayashidac/SPARK-13770.

(cherry picked from commit ee2e741a)
Signed-off-by: Sean Owen <sowen@cloudera.com>


Nov 07, 2016

[SPARK-16575][CORE] partition calculation mismatch with sc.binaryFiles · c8879bf1

fidato authored 8 years ago


## What changes were proposed in this pull request?

This pull request contains the changes for the critical bug SPARK-16575. It fixes the BinaryFileRDD partition calculation: an RDD created with sc.binaryFiles always consisted of just two partitions.

## How was this patch tested?

The original issue, i.e., getNumPartitions on a binary files RDD always reporting two partitions, was first reproduced and then retested with the changes applied. The unit tests have also been run and pass.

This contribution is my original work and I license the work to the project under the project's open source license.

srowen hvanhovell rxin vanzin skyluc kmader zsxwing datafarmer Please have a look.

Author: fidato <fidato.july13@gmail.com>

Closes #15327 from fidato13/SPARK-16575.

(cherry picked from commit 6f369713)
Signed-off-by: Reynold Xin <rxin@databricks.com>
