Commits · 4eace4d384f0e12b4934019d8654b5e3886ddaef · cs525-sp18-g07 / spark

Mar 11, 2016

[SPARK-13577][YARN] Allow Spark jar to be multiple jars, archive. · 07f1c544

Marcelo Vanzin authored 9 years ago

In preparation for the demise of assemblies, this change allows the
YARN backend to use multiple jars and globs as the "Spark jar". The
config option has been renamed to "spark.yarn.jars" to reflect that.

A second option "spark.yarn.archive" was also added; if set, this
takes precedence and uploads an archive expected to contain the jar
files with the Spark code and its dependencies.

Existing deployments should keep working, mostly. This change drops
support for the "SPARK_JAR" environment variable, and also does not
fall back to using "jarOfClass" if no configuration is set, falling
back to finding files under SPARK_HOME instead. This should be fine
since "jarOfClass" probably wouldn't work unless you were using
spark-submit anyway.
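The upload precedence described above can be sketched as follows. This is a minimal, hypothetical illustration rather than Spark's YARN client code. The two config keys come from the commit message. The class name, the comma-splitting, and the use of Java's `PathMatcher` for globs are assumptions, and Java glob semantics may differ from how Spark or Hadoop actually expands globs. An empty result stands in for "fall back to finding files under SPARK_HOME".

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper that mirrors the precedence described above; it is not Spark's code.
final class YarnJarSource {

    // Returns what would be uploaded: the archive when "spark.yarn.archive" is set,
    // otherwise the comma-separated entries of "spark.yarn.jars". An empty list stands
    // for "neither is set, so fall back to locating files under SPARK_HOME".
    static List<String> choose(Map<String, String> conf) {
        String archive = conf.get("spark.yarn.archive");
        if (archive != null && !archive.trim().isEmpty()) {
            return Collections.singletonList(archive.trim());
        }
        List<String> entries = new ArrayList<>();
        String jars = conf.get("spark.yarn.jars");
        if (jars != null) {
            for (String entry : jars.split(",")) {
                if (!entry.trim().isEmpty()) {
                    entries.add(entry.trim());
                }
            }
        }
        return entries;
    }

    // Checks whether a single jar file name matches one glob entry such as "spark-*.jar".
    static boolean matchesGlob(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }
}
```

The archive check comes first so that, as the message says, `spark.yarn.archive` takes precedence whenever both options are set.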

Tested with the unit tests, and by trying the different config options
on a YARN cluster.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #11500 from vanzin/SPARK-13577.

[SPARK-13512][ML] add example and doc for MaxAbsScaler · 0b713e04

Yuhao Yang authored 9 years ago

## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-13512
Add example and doc for ml.feature.MaxAbsScaler.
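Spark's documentation describes MaxAbsScaler as follows: each feature is rescaled to [-1, 1] by dividing it by that feature's maximum absolute value, and the data is not shifted or centered. The sketch below shows that computation on dense Java arrays. It is a hypothetical illustration, not Spark's implementation. Mapping an all-zero feature to zero, rather than dividing by zero, is an assumption of this sketch.

```java
// Hypothetical, dense-array sketch of max-abs scaling; not Spark's MaxAbsScaler code.
final class MaxAbsScaling {

    // Divides every feature (column) by its largest absolute value, so each feature
    // lands in [-1, 1]. Nothing is subtracted, so zero entries stay zero (sparsity kept).
    static double[][] scale(double[][] rows) {
        if (rows.length == 0) {
            return new double[0][];
        }
        int numFeatures = rows[0].length;
        double[] maxAbs = new double[numFeatures];
        for (double[] row : rows) {
            for (int j = 0; j < numFeatures; j++) {
                maxAbs[j] = Math.max(maxAbs[j], Math.abs(row[j]));
            }
        }
        double[][] out = new double[rows.length][numFeatures];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < numFeatures; j++) {
                // Assumption: an all-zero feature is left at 0 instead of dividing by 0.
                out[i][j] = maxAbs[j] == 0.0 ? 0.0 : rows[i][j] / maxAbs[j];
            }
        }
        return out;
    }
}
```

For the rows `{1, -8, 0}`, `{2, 4, 0}`, and `{-4, 2, 0}`, the per-feature maxima are 4, 8, and 0. The first row therefore becomes `{0.25, -1.0, 0.0}`.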

## How was this patch tested?
Unit tests.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #11392 from hhbyyh/maxabsdoc.

[SPARK-13672][ML] Add python examples of BisectingKMeans in ML and MLLIB · d18276cb

Zheng RuiFeng authored 9 years ago

JIRA: https://issues.apache.org/jira/browse/SPARK-13672

## What changes were proposed in this pull request?

Add two Python examples of BisectingKMeans, one for ml and one for mllib.

## How was this patch tested?

manual tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #11515 from zhengruifeng/mllib_bkm_pe.

Mar 10, 2016

[MINOR][DOC] Fix supported hive version in doc · 88fa8666

Dongjoon Hyun authored 9 years ago

## What changes were proposed in this pull request?

Today, Spark 1.6.1 and its updated docs were released. Unfortunately, the docs contain obsolete Hive version information: [Building Spark](http://spark.apache.org/docs/latest/building-spark.html#building-with-hive-and-jdbc-support). This PR fixes the following two lines.
```
-By default Spark will build with Hive 0.13.1 bindings.
+By default Spark will build with Hive 1.2.1 bindings.
-# Apache Hadoop 2.4.X with Hive 13 support
+# Apache Hadoop 2.4.X with Hive 1.2.1 support
```
The `sql/README.md` file also describes

## How was this patch tested?

Manual.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11639 from dongjoon-hyun/fix_doc_hive_version.

[SPARK-13706][ML] Add Python Example for Train Validation Split · 3e3c3d58

JeremyNixon authored 9 years ago

## What changes were proposed in this pull request?

This pull request adds a python example for train validation split.
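Train-validation split evaluates each candidate parameter setting once, on a single random split of the data. This differs from cross-validation, which repeats the evaluation over k folds. Spark's `TrainValidationSplit` calls the split fraction `trainRatio`. The Java sketch below shows the idea in isolation. The class name, the `Scorer` interface, and the seeded shuffle are hypothetical stand-ins for "fit on the training part, evaluate on the validation part", not Spark's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the train-validation-split idea; names are illustrative, not Spark's API.
final class TrainValidationSketch {

    // Stands in for "fit a model with param on train, then evaluate it on validation".
    interface Scorer<T, P> {
        double score(P param, List<T> train, List<T> validation);
    }

    // One seeded shuffle, one split: the first trainRatio fraction trains, the rest validates.
    static <T> List<List<T>> split(List<T> data, double trainRatio, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.floor(shuffled.size() * trainRatio);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    // Each candidate is evaluated exactly once on the same split; the highest score wins.
    static <T, P> P selectBest(List<T> train, List<T> validation, List<P> candidates, Scorer<T, P> scorer) {
        P best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (P candidate : candidates) {
            double s = scorer.score(candidate, train, validation);
            if (s > bestScore) {
                bestScore = s;
                best = candidate;
            }
        }
        return best;
    }
}
```

With 20 records and a ratio of 0.75, `split` produces 15 training and 5 validation records. Every candidate is then scored against that same pair of lists.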

## How was this patch tested?

This was style tested through lint-python, generally tested with ./dev/run-tests, and run in notebook and shell environments. It was viewed in docs locally with jekyll serve.

This contribution is my original work and I license it to Spark under its open source license.

Author: JeremyNixon <jnixon2@gmail.com>

Closes #11547 from JeremyNixon/tvs_example.

Mar 09, 2016

[SPARK-13492][MESOS] Configurable Mesos framework webui URL. · a4a0addc

Sergiusz Urbaniak authored 9 years ago

## What changes were proposed in this pull request?

Previously, the Mesos framework webui URL was derived only from the Spark UI address, leaving no way to configure it. This commit makes it configurable. If unset, it falls back to the previous behavior.

Motivation:
This change is necessary to install Spark on DCOS and give it a custom service link. In the DCOS environment, the configured `webui_url` points to a reverse proxy.
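The fallback rule described above is "use the configured URL if present, otherwise keep deriving it from the Spark UI address". A minimal Java sketch follows. The config key `spark.mesos.driver.webui.url` is an assumption about how the setting is named; the commit text only calls it `webui_url`. The class and method are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the fallback; the key "spark.mesos.driver.webui.url" is assumed.
final class MesosWebUiUrl {

    // Prefers an explicitly configured URL (e.g. a DCOS reverse proxy); otherwise falls
    // back to the address derived from the Spark UI, which may be absent if the UI is off.
    static Optional<String> resolve(Map<String, String> conf, Optional<String> sparkUiAddress) {
        String configured = conf.get("spark.mesos.driver.webui.url");
        if (configured != null && !configured.isEmpty()) {
            return Optional.of(configured);
        }
        return sparkUiAddress;
    }
}
```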

## How was this patch tested?

Locally using unit tests, and on DCOS testing and stable revisions.

Author: Sergiusz Urbaniak <sur@mesosphere.io>

Closes #11369 from s-urbaniak/sur-webui-url.

[SPARK-13595][BUILD] Move docker, extras modules into external · 256704c7

Sean Owen authored 9 years ago

## What changes were proposed in this pull request?

Move `docker` dirs out of top level into `external/`; move `extras/*` into `external/`

## How was this patch tested?

This is tested with Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #11523 from srowen/SPARK-13595.

[SPARK-13702][CORE][SQL][MLLIB] Use diamond operator for generic instance creation in Java code. · c3689bc2

Dongjoon Hyun authored 9 years ago

## What changes were proposed in this pull request?

To make `docs/examples` (and other related code) simpler, more readable, and more user-friendly, this PR replaces existing code like the following with the `diamond` operator.

```
-    final ArrayList<Product2<Object, Object>> dataToWrite =
-      new ArrayList<Product2<Object, Object>>();
+    final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>();
```

Java 7 and later support the **diamond** operator, which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, Spark's Java code uses it inconsistently.
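The following self-contained example shows the before and after forms side by side. The class and method names are illustrative only. Both methods build identical maps; the only difference is whether the constructor calls spell out their type arguments or let the compiler infer them from the declared variable type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: both methods build identical maps. The second uses the
// diamond operator, so the compiler infers the type arguments from the declaration.
final class DiamondExample {

    static Map<String, List<Integer>> explicitTypeArguments() {
        Map<String, List<Integer>> byKey = new HashMap<String, List<Integer>>();
        List<Integer> values = new ArrayList<Integer>();
        values.add(1);
        byKey.put("a", values);
        return byKey;
    }

    static Map<String, List<Integer>> diamond() {
        Map<String, List<Integer>> byKey = new HashMap<>();   // inferred: HashMap<String, List<Integer>>
        List<Integer> values = new ArrayList<>();              // inferred: ArrayList<Integer>
        values.add(1);
        byKey.put("a", values);
        return byKey;
    }
}
```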

## How was this patch tested?

Manual.
Pass the existing tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11541 from dongjoon-hyun/SPARK-13702.

Mar 08, 2016

[SPARK-13715][MLLIB] Remove last usages of jblas in tests · 54040f8d

Sean Owen authored 9 years ago

## What changes were proposed in this pull request?

Remove last usage of jblas, in tests

## How was this patch tested?

Jenkins tests -- the same ones that are being modified.

Author: Sean Owen <sowen@cloudera.com>

Closes #11560 from srowen/SPARK-13715.

Mar 07, 2016

[SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs · 0eea12a3

Sean Owen authored 9 years ago

## What changes were proposed in this pull request?

Move many top-level files into dev/ or another appropriate directory. In particular, put `make-distribution.sh` in `dev` and update the docs accordingly. Remove the deprecated `sbt/sbt`.

I was (so far) unable to figure out how to move `tox.ini`. `scalastyle-config.xml` should be movable, but edits to the project's `.sbt` files didn't work; the config file location can be updated for the compile scope but not the test scope.

## How was this patch tested?

`./dev/run-tests` to verify RAT and checkstyle work. Jenkins tests for the rest.

Author: Sean Owen <sowen@cloudera.com>

Closes #11522 from srowen/SPARK-13596.

[MINOR][DOC] improve the doc for "spark.memory.offHeap.size" · a3ec50a4

CodingCat authored 9 years ago

The description of "spark.memory.offHeap.size" in the current document does not clearly state that the value is measured in bytes.
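Because the value is a plain byte count, the arithmetic matters when sizing the off-heap region. The small helper below is illustrative only and is not part of Spark. It converts binary units into bytes and uses `Math.multiplyExact` so that an overflow fails loudly instead of wrapping around.

```java
// Illustrative helper: converts binary units into the plain byte count that,
// per the commit above, "spark.memory.offHeap.size" is expressed in.
final class OffHeapBytes {

    static long fromMiB(long mib) {
        return Math.multiplyExact(mib, 1L << 20);   // 1 MiB = 1,048,576 bytes
    }

    static long fromGiB(long gib) {
        return Math.multiplyExact(gib, 1L << 30);   // 1 GiB = 1,073,741,824 bytes
    }
}
```

For example, `fromGiB(2)` returns 2147483648 and `fromMiB(512)` returns 536870912. So a 2 GiB region would be written as `spark.memory.offHeap.size=2147483648`.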

This PR contains a small fix for this tiny issue

document fix

Author: CodingCat <zhunansjtu@gmail.com>

Closes #11561 from CodingCat/master.

[SPARK-13705][DOCS] UpdateStateByKey Operation documentation incorrectly refers to StatefulNetworkWordCount · 4b13896e

rmishra authored 9 years ago

## What changes were proposed in this pull request?
The reference to StatefulNetworkWordCount.scala in the updateStateByKey documentation should be removed until there is an example for updateStateByKey.

## How was this patch tested?
Tested the new documentation with `jekyll build`.

Author: rmishra <rmishra@pivotal.io>

Closes #11545 from rishitesh/SPARK-13705.

Mar 03, 2016

[SPARK-13013][DOCS] Replace example code in mllib-clustering.md using include_example · 70f6f964

Xin Ren authored 9 years ago

Replace example code in mllib-clustering.md using include_example
https://issues.apache.org/jira/browse/SPARK-13013

The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6.

Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example.
`{% include_example scala/org/apache/spark/examples/mllib/KMeansExample.scala %}`
Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/KMeansExample.scala`, pick the code blocks marked "example", and use them in place of the `{% highlight %}` code block in the markdown.
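The core of such an `include_example` tag is a line filter: it keeps only the regions of a source file that are marked as example code. The Java sketch below is a hypothetical re-implementation of that idea, not the actual Jekyll plugin (which is Ruby). The marker strings `$example on$` and `$example off$` are assumptions about how the regions are marked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an include_example-style filter; the marker strings are assumed.
final class ExampleExtractor {
    static final String ON = "$example on$";
    static final String OFF = "$example off$";

    // Keeps only the lines between "on" and "off" markers, dropping the markers themselves.
    static List<String> extract(List<String> sourceLines) {
        List<String> out = new ArrayList<>();
        boolean inside = false;
        for (String line : sourceLines) {
            String trimmed = line.trim();
            if (trimmed.endsWith(ON)) {
                inside = true;
                continue;
            }
            if (trimmed.endsWith(OFF)) {
                inside = false;
                continue;
            }
            if (inside) {
                out.add(line);
            }
        }
        return out;
    }
}
```

Because the markers are ordinary comments, the example file still compiles and can be tested in Jenkins. Only the marked lines end up in the rendered guide.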

See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337

Author: Xin Ren <iamshrek@126.com>

Closes #11116 from keypointt/SPARK-13013.

Feb 28, 2016

[SPARK-13529][BUILD] Move network/* modules into common/network-* · 9e01dcc6

Reynold Xin authored 9 years ago

## What changes were proposed in this pull request?
As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top level, non-user-facing folder.

## How was this patch tested?
Compilation and existing tests. We should run both SBT and Maven.

Author: Reynold Xin <rxin@databricks.com>

Closes #11409 from rxin/SPARK-13529.

Feb 27, 2016

[SPARK-13521][BUILD] Remove reference to Tachyon in cluster & release scripts · 59e3e10b

Reynold Xin authored 9 years ago

## What changes were proposed in this pull request?
We provide a very limited set of cluster management scripts in Spark for Tachyon, although Tachyon itself provides a much better version of them. Given that Spark users can now simply use Tachyon as a normal file system without extensive configuration, we can remove these management capabilities to simplify Spark's bash scripts.

Note that this also reduces coupling between a third-party external system and Spark's release scripts, and eliminates the possibility of failures such as Tachyon being renamed or the tarballs being relocated.

## How was this patch tested?
N/A

Author: Reynold Xin <rxin@databricks.com>

Closes #11400 from rxin/release-script.

Feb 26, 2016

[SPARK-11381][DOCS] Replace example code in mllib-linear-methods.md using include_example · 7af0de07

Dongjoon Hyun authored 9 years ago

## What changes were proposed in this pull request?

This PR replaces the example code in `mllib-linear-methods.md` using `include_example`
by doing the following:
  * Extracts the example code (Scala, Java, Python) into files in the `example` module.
  * Merges some dialog-style examples into a single file.
  * Hides redundant code in the HTML for consistency with other docs.

## How was this patch tested?

manual test.
This PR can be tested by document generations, `SKIP_API=1 jekyll build`.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11320 from dongjoon-hyun/SPARK-11381.

[SPARK-12634][PYSPARK][DOC] PySpark tree parameter desc to consistent format · b33261f9

Bryan Cutler authored 9 years ago

Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent. This is for the tree module.

closes #10601

Author: Bryan Cutler <cutlerb@gmail.com>
Author: vijaykiran <mail@vijaykiran.com>

Closes #11353 from BryanCutler/param-desc-consistent-tree-SPARK-12634.

Feb 25, 2016

[SPARK-13439][MESOS] Document that spark.mesos.uris is comma-separated · c98a93de

Michael Gummelt authored 9 years ago

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #11311 from mgummelt/document_csv.

Feb 23, 2016

[SPARK-10759][ML] update cross validator with include_example · 230bbeaa

JeremyNixon authored 9 years ago

This pull request uses `{% include_example %}` to add an example for the Python cross validator to ml-guide.

Author: JeremyNixon <jnixon2@gmail.com>

Closes #11240 from JeremyNixon/pipeline_include_example.

[SPARK-7729][UI] Executor which has been killed should also be displayed on Executor Tab · 9f426339

Lianhui Wang authored 9 years ago

andrewor14 squito: Dead executors should also be displayed on the Executor tab, as follows:
![image](https://cloud.githubusercontent.com/assets/545478/11492707/ae55d7f6-982b-11e5-919a-b62cd84684b2.png)

Author: Lianhui Wang <lianhuiwang09@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Andrew Or <andrew@databricks.com>

Closes #10058 from lianhuiwang/SPARK-7729.

[SPARK-13220][CORE] deprecate yarn-client and yarn-cluster mode · e99d0170

jerryshao authored 9 years ago

Author: jerryshao <sshao@hortonworks.com>

Closes #11229 from jerryshao/SPARK-13220.

Feb 22, 2016

[SPARK-13012][DOCUMENTATION] Replace example code in ml-guide.md using include_example · 02b1feff

Devaraj K authored 9 years ago

Replaced example code in ml-guide.md using include_example

Author: Devaraj K <devaraj@apache.org>

Closes #11053 from devaraj-kavali/SPARK-13012.

[SPARK-13016][DOCUMENTATION] Replace example code in mllib-dimensionality-reduction.md using include_example · 9f410871

Devaraj K authored 9 years ago

Replaced example code in mllib-dimensionality-reduction.md using include_example

Author: Devaraj K <devaraj@apache.org>

Closes #11132 from devaraj-kavali/SPARK-13016.

[SPARK-12632][PYSPARK][DOC] PySpark fpm and als parameter desc to consistent format · e298ac91

Bryan Cutler authored 9 years ago

Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent. This is for the fpm and recommendation modules.

Closes #10602
Closes #10897

Author: Bryan Cutler <cutlerb@gmail.com>
Author: somideshmukh <somilde@us.ibm.com>

Closes #11186 from BryanCutler/param-desc-consistent-fpmrecc-SPARK-12632.

[MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns in other comments · 024482bf

Dongjoon Hyun authored 9 years ago

## What changes were proposed in this pull request?

This PR tries to fix all typos in all markdown files under `docs` module,
and fixes similar typos in other comments, too.

## How was this patch tested?

manual tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11300 from dongjoon-hyun/minor_fix_typos.

Feb 21, 2016

[MINOR][DOCS] Fix typos in `configuration.md` and `hardware-provisioning.md` · 03e62aa3

Dongjoon Hyun authored 9 years ago

## What changes were proposed in this pull request?

This PR fixes some typos in the following documentation files.
 * `NOTICE`, `configuration.md`, and `hardware-provisioning.md`.

## How was this patch tested?

manual tests

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11289 from dongjoon-hyun/minor_fix_typos_notice_and_confdoc.

Feb 19, 2016

[MINOR][DOCS][MESOS] Clarify that Mesos version is a lower bound. · 6915cc23

Iulian Dragos authored 9 years ago

## What changes were proposed in this pull request?

Clarify that 0.21 is only a **minimum** requirement.

## How was this patch tested?

It's a doc change, so no tests.

Author: Iulian Dragos <jaguarul@gmail.com>

Closes #11271 from dragos/patch-1.

[SPARK-13339][DOCS] Clarify commutative / associative operator requirements for reduce, fold · fb7e2179

Sean Owen authored 9 years ago

Clarify that reduce functions need to be commutative, and fold functions do not
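The reason `reduce` needs a commutative and associative function can be shown with a toy model. Each partition is reduced locally, and the partial results are then merged in whatever order the tasks happen to finish. The Java sketch below is a hypothetical model of that behavior, not Spark's `reduce` implementation. `arrivalOrder` stands in for task completion order.

```java
import java.util.List;
import java.util.function.BinaryOperator;

// Toy model, not Spark code: each "partition" is reduced locally, and the partial results
// are then merged in the order given by arrivalOrder (for example, task completion order).
final class ReduceOrder {

    static <T> T reduce(List<List<T>> partitions, int[] arrivalOrder, BinaryOperator<T> op) {
        T result = null;
        for (int index : arrivalOrder) {
            T partial = partitions.get(index).stream()
                    .reduce(op)
                    .orElseThrow(() -> new IllegalArgumentException("empty partition"));
            result = (result == null) ? partial : op.apply(result, partial);
        }
        return result;
    }
}
```

With partitions `[1, 2]` and `[3, 4]`, `Integer::sum` yields 10 in either arrival order. With partitions `["a", "b"]` and `["c", "d"]`, `String::concat` yields `"abcd"` or `"cdab"` depending on which partition arrives first. Because concatenation is not commutative, the result is nondeterministic.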

See https://github.com/apache/spark/pull/11091

Author: Sean Owen <sowen@cloudera.com>

Closes #11217 from srowen/SPARK-13339.

Feb 17, 2016

[SPARK-13324][CORE][BUILD] Update plugin, test, example dependencies for 2.x · b8440486

Sean Owen authored 9 years ago

Phase 1: update plugin versions, test dependencies, some example and third-party versions.

Author: Sean Owen <sowen@cloudera.com>

Closes #11206 from srowen/SPARK-13324.

[SPARK-13350][DOCS] Config doc updated to state that PYSPARK_PYTHON's default is "python2.7" · a7c74d75

Christopher C. Aycock authored 9 years ago

Author: Christopher C. Aycock <chris@chrisaycock.com>

Closes #11239 from chrisaycock/master.

Feb 16, 2016

[SPARK-11627] Add initial input rate limit for spark streaming backpressure mechanism. · 7218c0eb

junhao authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-11627

Spark Streaming's backpressure mechanism has no initial input rate limit, which can cause an OOM exception.
In the first batch, receivers receive data as fast as they can, which can exhaust executor memory. Adding an initial input rate limit ensures that the streaming job's first batch completes successfully; the backpressure mechanism can then adjust the receiving rate adaptively.
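The per-batch decision described above can be sketched as: use the backpressure estimate once one exists, otherwise use the initial rate, and optionally apply a hard cap. The Java sketch below is hypothetical and is not Spark's rate controller. The setting is documented in Spark's configuration reference as `spark.streaming.backpressure.initialRate`. Treating values less than or equal to 0 as "unlimited", and the separate `maxRate` cap, are assumptions of the sketch.

```java
import java.util.OptionalLong;

// Hypothetical sketch of picking a receiver's rate limit (records/sec) for a batch.
// Assumption: rates <= 0 mean "unlimited", represented here as Long.MAX_VALUE.
final class IngestionRate {

    static long limitFor(OptionalLong estimatedRate, long initialRate, long maxRate) {
        // Before the first estimate exists (the first batch), the initial rate applies.
        long rate = estimatedRate.orElse(initialRate > 0 ? initialRate : Long.MAX_VALUE);
        // An optional hard cap still bounds whatever rate was chosen.
        return maxRate > 0 ? Math.min(rate, maxRate) : rate;
    }
}
```

Without an initial rate, the first batch in this model is unbounded. That unbounded first batch is the OOM risk the commit describes.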

Author: junhao <junhao@mogujie.com>

Closes #9593 from junhaoMg/junhao-dev.

[SPARK-12247][ML][DOC] Documentation for spark.ml's ALS and collaborative filtering in general · 00c72d27

BenFradet authored 9 years ago

This documents the implementation of ALS in `spark.ml`, with example code in Scala, Java, and Python.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10411 from BenFradet/SPARK-12247.

Feb 15, 2016

[SPARK-13018][DOCS] Replace example code in mllib-pmml-model-export.md using include_example · e4675c24

Xin Ren authored 9 years ago

Replace example code in mllib-pmml-model-export.md using include_example
https://issues.apache.org/jira/browse/SPARK-13018

The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6.

Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example.
`{% include_example scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala %}`
Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala`, pick the code blocks marked "example", and use them in place of the `{% highlight %}` code block in the markdown.

See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337

Author: Xin Ren <iamshrek@126.com>

Closes #11126 from keypointt/SPARK-13018.

[SPARK-13312][MLLIB] Update java train-validation-split example in ml-guide · adb54836

JeremyNixon authored 9 years ago

Response to JIRA https://issues.apache.org/jira/browse/SPARK-13312.

This contribution is my original work and I license the work to this project.

Author: JeremyNixon <jnixon2@gmail.com>

Closes #11199 from JeremyNixon/update_train_val_split_example.

Feb 14, 2016

[SPARK-13300][DOCUMENTATION] Added pygments.rb dependency · 331293c3

Amit Dev authored 9 years ago

It looks like the pygments.rb gem is also required for the Jekyll build to work. At least on Ubuntu/RHEL, I could not build without this dependency, so I added it to the steps.

Author: Amit Dev <amitdev@gmail.com>

Closes #11180 from amitdev/master.

Feb 12, 2016

[SPARK-6166] Limit number of in flight outbound requests · 894921d8

Sanket authored 9 years ago

This JIRA is related to https://github.com/apache/spark/pull/5852.
I had to do some minor rework and testing to make sure it works with the current version of Spark.
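The commit message does not describe the mechanism. One common way to cap in-flight outbound requests is a counting semaphore: a permit is taken before a request is sent and returned when that request completes or fails. The Java sketch below shows that technique under those assumptions. It is a hypothetical limiter, not the code from this PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical limiter, not Spark's implementation: at most maxInFlight requests may be
// outstanding; a permit is taken before sending and returned when the request finishes.
final class InFlightLimiter {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the caller while the limit is reached, then issues the request.
    CompletableFuture<Void> send(Supplier<CompletableFuture<Void>> request) {
        permits.acquireUninterruptibly();
        peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        CompletableFuture<Void> response;
        try {
            response = request.get();
        } catch (RuntimeException e) {
            finish();            // the request never went out, so give the permit back
            throw e;
        }
        // Success or failure, the request is no longer in flight once it completes.
        return response.whenComplete((ignored, error) -> finish());
    }

    private void finish() {
        inFlight.decrementAndGet();
        permits.release();
    }

    int currentInFlight() { return inFlight.get(); }
    int peakInFlight() { return peak.get(); }
}
```

Releasing the permit in the completion callback, rather than after `send` returns, is what keeps the limit tied to outstanding requests instead of to calls.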

Author: Sanket <schintap@untilservice-lm>

Closes #10838 from redsanket/limit-outbound-connections.

Feb 11, 2016

[SPARK-7889][WEBUI] HistoryServer updates UI for incomplete apps · a2c7dcf6

Steve Loughran authored 9 years ago

When the HistoryServer is showing an incomplete app, it needs to check if there is a newer version of the app available. It does this by checking if a version of the app has been loaded with a larger *filesize*. If so, it detaches the current UI, attaches the new one, and redirects back to the same URL to show the new UI.
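The refresh rule described above reduces to one comparison: rebuild an incomplete app's UI only when its event log has grown past the size the current UI was built from. A minimal Java sketch follows. The class and method names are hypothetical, and the detach, attach, and redirect steps are represented only by the boolean result.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "larger file size means newer version" check; not Spark's code.
final class IncompleteAppCache {
    private final Map<String, Long> loadedSizes = new HashMap<>();

    // Records the event-log size the UI for appId was built from.
    void attach(String appId, long logSize) {
        loadedSizes.put(appId, logSize);
    }

    // Returns true when the UI should be detached, rebuilt, and the request redirected
    // back to the same URL, i.e. when the log has grown since the UI was built.
    boolean refreshIfGrown(String appId, long currentLogSize) {
        Long loaded = loadedSizes.get(appId);
        if (loaded == null || currentLogSize <= loaded) {
            return false;
        }
        attach(appId, currentLogSize);
        return true;
    }
}
```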

https://issues.apache.org/jira/browse/SPARK-7889

Author: Steve Loughran <stevel@hortonworks.com>
Author: Imran Rashid <irashid@cloudera.com>

Closes #11118 from squito/SPARK-7889-alternate.

[SPARK-13264][DOC] Removed multi-byte characters in spark-env.sh.template · c2f21d88

Sasaki Toru authored 9 years ago

spark-env.sh.template contains multi-byte characters; this PR removes them.

Author: Sasaki Toru <sasakitoa@nttdata.co.jp>

Closes #11149 from sasakitoa/remove_multibyte_in_sparkenv.

Feb 10, 2016

[SPARK-12414][CORE] Remove closure serializer · 29c54730

Sean Owen authored 9 years ago

Remove spark.closure.serializer option and use JavaSerializer always

CC andrewor14 rxin I see there's a discussion in the JIRA but just thought I'd offer this for a look at what the change would be.

Author: Sean Owen <sowen@cloudera.com>

Closes #11150 from srowen/SPARK-12414.

[SPARK-5095][MESOS] Support launching multiple mesos executors in coarse grained mesos mode. · 80cb963a

Michael Gummelt authored 9 years ago

This is the next iteration of tnachen's previous PR: https://github.com/apache/spark/pull/4027

In that PR, we agreed with andrewor14 and pwendell that the Mesos scheduler should support `spark.executor.cores` consistently with YARN and Standalone. This PR implements that resolution.

This PR implements two high-level features. Because they are co-dependent, both are implemented here:
- Mesos support for spark.executor.cores
- Multiple executors per slave
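Once executor size is fixed by `spark.executor.cores`, placing multiple executors on a slave becomes a question of how many fixed-size executors fit into one resource offer. The Java sketch below is a deliberately simplified, hypothetical rule, not the Mesos scheduler's code. It ignores memory overhead and other resources. It also assumes an application-wide core limit, in the style of `spark.cores.max`, expressed as `remainingAppCores`.

```java
// Simplified, hypothetical sizing rule (not the Mesos scheduler's code): how many
// fixed-size executors one resource offer can host, bounded by the offer's CPUs,
// its memory, and the cores the application may still acquire.
final class OfferSizing {

    static int executorsThatFit(double offerCpus, long offerMemMb,
                                int executorCores, long executorMemMb,
                                int remainingAppCores) {
        if (executorCores <= 0 || executorMemMb <= 0) {
            throw new IllegalArgumentException("executor size must be positive");
        }
        int byCpu = (int) Math.floor(offerCpus / executorCores);
        long byMem = offerMemMb / executorMemMb;
        int byAppLimit = remainingAppCores / executorCores;
        long fit = Math.min(byCpu, Math.min(byMem, byAppLimit));
        return (int) Math.max(0, fit);
    }
}
```

For example, an offer of 8 CPUs and 16384 MB with 2-core, 4096 MB executors fits 4 executors. If the offer has only 6000 MB, memory becomes the binding constraint and only 1 fits.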

We at Mesosphere have been working with Typesafe on a Spark/Mesos integration test suite: https://github.com/typesafehub/mesos-spark-integration-tests, which passes for this PR.

The contribution is my original work and I license the work to the project under the project's open source license.

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #10993 from mgummelt/executor_sizing.
