Commits · 9f5647d62ee569a4c8bdc242adcb8d4e05c662f9 · cs525-sp18-g07 / spark

Jun 15, 2017

[SPARK-20434][YARN][CORE] Move Hadoop delegation token code from yarn to core · a18d6371

Michael Gummelt authored 7 years ago

## What changes were proposed in this pull request?

Move Hadoop delegation token code from `spark-yarn` to `spark-core`, so that other schedulers (such as Mesos) may use it. In order to avoid exposing Hadoop interfaces in spark-core, the new Hadoop delegation token classes are kept private. In order to provide backward compatibility, and to allow YARN users to continue to load their own delegation token providers via Java service loading, the old YARN interfaces, as well as the client code that uses them, have been retained.

Summary:
- Move registered `yarn.security.ServiceCredentialProvider` classes from `spark-yarn` to `spark-core`.  Moved them into a new, private hierarchy under `HadoopDelegationTokenProvider`.  Client code in `HadoopDelegationTokenManager` now loads credentials from a whitelist of three providers (`HadoopFSDelegationTokenProvider`, `HiveDelegationTokenProvider`, `HBaseDelegationTokenProvider`), instead of service loading, which means that users are not able to implement their own delegation token providers, as they are in the `spark-yarn` module.

- The `yarn.security.ServiceCredentialProvider` interface has been kept for backwards compatibility, and to continue to allow YARN users to implement their own delegation token provider implementations.  Client code in YARN now fetches tokens via the new `YARNHadoopDelegationTokenManager` class, which fetches tokens from the core providers through `HadoopDelegationTokenManager`, as well as service loads them from `yarn.security.ServiceCredentialProvider`.

Old Hierarchy:

```
yarn.security.ServiceCredentialProvider (service loaded)
  HadoopFSCredentialProvider
  HiveCredentialProvider
  HBaseCredentialProvider
yarn.security.ConfigurableCredentialManager
```

New Hierarchy:

```
HadoopDelegationTokenManager
HadoopDelegationTokenProvider (not service loaded)
  HadoopFSDelegationTokenProvider
  HiveDelegationTokenProvider
  HBaseDelegationTokenProvider

yarn.security.ServiceCredentialProvider (service loaded)
yarn.security.YARNHadoopDelegationTokenManager
```
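The split between the two managers can be sketched roughly as follows. This is a minimal Python illustration, not Spark's Scala code: the manager class names mirror the hierarchy above, but `TokenProvider`, `FixedProvider`, the `obtain_tokens` method, the constructor arguments, and the stand-in token strings are all assumptions made for the example.

```python
from typing import Dict, List, Protocol


class TokenProvider(Protocol):
    """Hypothetical provider interface; the method name is illustrative."""

    def obtain_tokens(self) -> Dict[str, str]: ...


class FixedProvider:
    """Stand-in provider that returns one canned token for its service."""

    def __init__(self, service_name: str) -> None:
        self.service_name = service_name

    def obtain_tokens(self) -> Dict[str, str]:
        return {self.service_name: f"token-for-{self.service_name}"}


class HadoopDelegationTokenManager:
    """Core manager: a fixed list of built-in providers, no service loading."""

    def __init__(self) -> None:
        self.providers: List[TokenProvider] = [
            FixedProvider(name) for name in ("hadoopfs", "hive", "hbase")
        ]

    def obtain_tokens(self) -> Dict[str, str]:
        tokens: Dict[str, str] = {}
        for provider in self.providers:
            tokens.update(provider.obtain_tokens())
        return tokens


class YARNHadoopDelegationTokenManager:
    """YARN manager: core providers plus user-supplied (service-loaded) ones."""

    def __init__(
        self,
        core: HadoopDelegationTokenManager,
        service_loaded: List[TokenProvider],
    ) -> None:
        self.core = core
        self.service_loaded = list(service_loaded)

    def obtain_tokens(self) -> Dict[str, str]:
        # Start from the core tokens, then add tokens from user providers.
        tokens = dict(self.core.obtain_tokens())
        for provider in self.service_loaded:
            tokens.update(provider.obtain_tokens())
        return tokens
```

In this sketch only the YARN manager accepts extra providers, which reflects the point that custom providers remain a YARN-only capability.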
## How was this patch tested?

unit tests

Author: Michael Gummelt <mgummelt@mesosphere.io>
Author: Dr. Stefan Schimanski <sttts@mesosphere.io>

Closes #17723 from mgummelt/SPARK-20434-refactor-kerberos.

a18d6371

May 08, 2017

[SPARK-20605][CORE][YARN][MESOS] Deprecate not used AM and executor port configuration · 829cd7b8

jerryshao authored 7 years ago

## What changes were proposed in this pull request?

After SPARK-10997, the client-mode Netty RpcEnv no longer needs to start a server, so the port configurations are no longer used. This PR proposes removing these two configurations: "spark.executor.port" and "spark.am.port".

## How was this patch tested?

Existing UTs.

Author: jerryshao <sshao@hortonworks.com>

Closes #17866 from jerryshao/SPARK-20605.

829cd7b8

Feb 22, 2017

[SPARK-19554][UI,YARN] Allow SHS URL to be used for tracking in YARN RM. · 4661d30b

Marcelo Vanzin authored 8 years ago

Allow an application to use the History Server URL as the tracking
URL in the YARN RM, so there's still a link to the web UI somewhere
in YARN even if the driver's UI is disabled. This is useful, for
example, if an admin wants to disable the driver UI by default for
applications, since it's harder to secure it (since it involves
non-trivial SSL certificate and auth management that admins may not want
to expose to user apps).

This needs to be opt-in, because of the way the YARN proxy works, so
a new configuration was added to enable the option.

The YARN RM will proxy requests to live AMs instead of redirecting
the client, so pages in the SHS UI will not render correctly since
they'll reference invalid paths in the RM UI. The proxy base support
in the SHS cannot be used since that would prevent direct access to
the SHS.

So, to solve this problem, for the feature to work end-to-end, a new
YARN-specific filter was added that detects whether the requests come
from the proxy and redirects the client appropriately. The SHS admin has
to add this filter manually if they want the feature to work.

Tested with new unit test, and by running with the documented configuration
set in a test cluster. Also verified the driver UI is used when it's
enabled.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #16946 from vanzin/SPARK-19554.

4661d30b

Feb 10, 2017

[SPARK-19545][YARN] Fix compile issue for Spark on Yarn when building against Hadoop 2.6.0~2.6.3 · 8e8afb3a

jerryshao authored 8 years ago

## What changes were proposed in this pull request?

Because of an API newly added in Hadoop 2.6.4+, Spark builds against Hadoop 2.6.0~2.6.3 fail with a compile error. This change reverts to using reflection to handle the issue.

## How was this patch tested?

Manual verification.

Author: jerryshao <sshao@hortonworks.com>

Closes #16884 from jerryshao/SPARK-19545.

8e8afb3a

Feb 08, 2017

[SPARK-19464][CORE][YARN][TEST-HADOOP2.6] Remove support for Hadoop 2.5 and earlier · e8d3fca4

Sean Owen authored 8 years ago

## What changes were proposed in this pull request?

- Remove support for Hadoop 2.5 and earlier
- Remove reflection and code constructs only needed to support multiple versions at once
- Update docs to reflect newer versions
- Remove older versions' builds and profiles.

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #16810 from srowen/SPARK-19464.

e8d3fca4

Jan 17, 2017

[SPARK-19179][YARN] Change spark.yarn.access.namenodes config and update docs · b79cc7ce

jerryshao authored 8 years ago

## What changes were proposed in this pull request?

The name `spark.yarn.access.namenodes` does not reflect how the configuration is actually used: in the code, tokens are obtained for Hadoop filesystems, not NNs. This PR proposes renaming the configuration and updating the related code and docs.

## How was this patch tested?

Local verification.

Author: jerryshao <sshao@hortonworks.com>

Closes #16560 from jerryshao/SPARK-19179.

b79cc7ce

Jan 11, 2017

[SPARK-19021][YARN] Generalize HDFSCredentialProvider to support non HDFS security filesystems · 4239a108

jerryshao authored 8 years ago

Currently Spark can only get the token renewal interval from secure HDFS (hdfs://). If Spark runs with other secure file systems such as webHDFS (webhdfs://), wasb (wasb://), or ADLS, it ignores their tokens and does not get token renewal intervals from them. This makes Spark unable to work with these secure clusters. So instead of only checking HDFS tokens, we should generalize to support different DelegationTokenIdentifier types.

## How was this patch tested?

Manually verified in a secure cluster.

Author: jerryshao <sshao@hortonworks.com>

Closes #16432 from jerryshao/SPARK-19021.

4239a108

Jan 02, 2017

[MINOR][DOC] Minor doc change for YARN credential providers · 0ac2f1e7

Liang-Chi Hsieh authored 8 years ago

## What changes were proposed in this pull request?

The configuration `spark.yarn.security.tokens.{service}.enabled` is deprecated. Now we should use `spark.yarn.security.credentials.{service}.enabled`. Some places in the doc are not updated yet.

## How was this patch tested?

N/A. Just doc change.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #16444 from viirya/minor-credential-provider-doc.

0ac2f1e7

Dec 05, 2016

[DOCS][MINOR] Update location of Spark YARN shuffle jar · 5a92dc76

Nicholas Chammas authored 8 years ago

Looking at the distributions provided on spark.apache.org, I see that the Spark YARN shuffle jar is under `yarn/` and not `lib/`.

This change is so minor I'm not sure it needs a JIRA. But let me know if so and I'll create one.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #16130 from nchammas/yarn-doc-fix.

5a92dc76

Nov 17, 2016

[YARN][DOC] Remove non-Yarn specific configurations from running-on-yarn.md · a3cac7bd

Weiqing Yang authored 8 years ago

## What changes were proposed in this pull request?

Remove `spark.driver.memory`, `spark.executor.memory`, `spark.driver.cores`, and `spark.executor.cores` from `running-on-yarn.md` as they are not Yarn-specific, and they are also defined in `configuration.md`.

## How was this patch tested?
Build passed & Manually check.

Author: Weiqing Yang <yangweiqing001@gmail.com>

Closes #15869 from weiqingy/yarnDoc.

a3cac7bd

Nov 16, 2016

[YARN][DOC] Increasing NodeManager's heap size with External Shuffle Service · 55589987

Artur Sukhenko authored 8 years ago

## What changes were proposed in this pull request?

Suggest that users increase the `NodeManager`'s heap size if the `External Shuffle Service` is enabled, as the
`NM` can spend a lot of time doing GC, which makes shuffle operations a bottleneck as `Shuffle Read blocked time` goes up.
GC can also cause the `NodeManager` to use an enormous amount of CPU, and cluster performance will suffer.
I have seen NodeManager using 5-13G RAM and up to 2700% CPU with the `spark_shuffle` service on.

## How was this patch tested?

#### Added step 5:
![shuffle_service](https://cloud.githubusercontent.com/assets/15244468/20355499/2fec0fde-ac2a-11e6-8f8b-1c80daf71be1.png)

Author: Artur Sukhenko <artur.sukhenko@gmail.com>

Closes #15906 from Devian-ua/nmHeapSize.

55589987

Aug 10, 2016

[SPARK-14743][YARN] Add a configurable credential manager for Spark running on YARN · ab648c00

jerryshao authored 8 years ago

## What changes were proposed in this pull request?

Add a configurable token manager for Spark running on YARN.

### Current Problems ###

1. The supported token providers are hard-coded: currently only hdfs, hbase and hive are supported, and it is impossible for users to add a new token provider without code changes.
2. This problem also exists in the timely token renewer and updater.

### Changes In This Proposal ###

To address the problems mentioned above and make the current code cleaner and easier to understand, this proposal makes 3 main changes:

1. Abstract `ServiceTokenProvider` and `ServiceTokenRenewable` interfaces for token providers. Each service that wants to communicate with Spark using tokens needs to implement these interfaces.
2. Provide a `ConfigurableTokenManager` to manage all the registered token providers, as well as the token renewer and updater. This class also offers the API for other modules to obtain tokens, get the renewal interval, and so on.
3. Implement 3 built-in token providers, `HDFSTokenProvider`, `HiveTokenProvider` and `HBaseTokenProvider`, to keep the same semantics as supported today. Whether to load these built-in token providers is controlled by the configuration "spark.yarn.security.tokens.${service}.enabled"; by default all the built-in token providers are loaded.

### Behavior Changes ###

For the end user there is no behavior change: the same configuration `spark.yarn.security.tokens.${service}.enabled` still decides which token provider is enabled (hbase or hive).

A user-implemented token provider (assume the name of the token provider is "test") needs two configurations:

1. `spark.yarn.security.tokens.test.enabled` set to true
2. `spark.yarn.security.tokens.test.class` set to the fully qualified class name.

So this keeps the same semantics as the current code while adding one new configuration.
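As a rough illustration of the configuration scheme described above, the lookup could be sketched like this. It is a Python sketch rather than Spark's Scala code; the helper names `is_enabled` and `provider_class` are hypothetical, the class name in the usage example is made up, and the assumption that a non-built-in provider is disabled unless explicitly enabled is inferred from the text rather than confirmed by it.

```python
from typing import Mapping, Optional

# Built-in services named in the proposal; these load by default.
BUILT_IN_SERVICES = ("hdfs", "hive", "hbase")


def is_enabled(conf: Mapping[str, str], service: str) -> bool:
    """Return True if the token provider for `service` should be loaded."""
    key = f"spark.yarn.security.tokens.{service}.enabled"
    # Assumption: built-in providers default to enabled, others to disabled.
    default = "true" if service in BUILT_IN_SERVICES else "false"
    return conf.get(key, default).strip().lower() == "true"


def provider_class(conf: Mapping[str, str], service: str) -> Optional[str]:
    """Return the fully qualified class name configured for `service`, if any."""
    return conf.get(f"spark.yarn.security.tokens.{service}.class")
```

For the "test" provider from the example, a configuration might look like this:

```python
conf = {
    "spark.yarn.security.tokens.test.enabled": "true",
    "spark.yarn.security.tokens.test.class": "com.example.TestTokenProvider",
    "spark.yarn.security.tokens.hive.enabled": "false",
}
```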

### Current Status ###

- [x] token provider interface and management framework.
- [x] implement built-in token providers (hdfs, hbase, hive).
- [x] Coverage of unit test.
- [x] Integrated test with security cluster.

## How was this patch tested?

Unit test and integrated test.

Please suggest and review, any comment is greatly appreciated.

Author: jerryshao <sshao@hortonworks.com>

Closes #14065 from jerryshao/SPARK-16342.

ab648c00

Jul 14, 2016

[SPARK-16505][YARN] Optionally propagate error during shuffle service startup. · b7b5e178

Marcelo Vanzin authored 8 years ago

This prevents the NM from starting when something is wrong, which would
lead to later errors which are confusing and harder to debug.

Added a unit test to verify startup fails if something is wrong.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #14162 from vanzin/SPARK-16505.

b7b5e178

Jun 29, 2016

[SPARK-15990][YARN] Add rolling log aggregation support for Spark on yarn · 272a2f78

jerryshao authored 8 years ago

## What changes were proposed in this pull request?

YARN has supported rolling log aggregation since 2.6. Previously, logs were only aggregated to HDFS after the application finished, which is quite painful for long-running applications like Spark Streaming or the thriftserver. Out-of-disk problems can also occur when a log file grows too large. This PR proposes adding support for rolling log aggregation for Spark on YARN.

One limitation is that log4j must be configured to use a file appender. Spark itself uses a console appender by default, in which case the file will not be created again once it is removed after aggregation. But I think many production users have already changed their log4j configuration from the default one, so this is not a big problem.

## How was this patch tested?

Manually verified with Hadoop 2.7.1.

Author: jerryshao <sshao@hortonworks.com>

Closes #13712 from jerryshao/SPARK-15990.

272a2f78

Jun 23, 2016

[SPARK-13723][YARN] Change behavior of --num-executors with dynamic allocation. · 738f134b

Ryan Blue authored 8 years ago

## What changes were proposed in this pull request?

This changes the behavior of --num-executors and spark.executor.instances when using dynamic allocation. Instead of turning dynamic allocation off, it uses the value for the initial number of executors.

This change was discussed on [SPARK-13723](https://issues.apache.org/jira/browse/SPARK-13723). I highly recommend using it while we can change the behavior for 2.0.0. In practice, the 1.x behavior surprises users (it is not clear that it disables dynamic allocation) and wastes cluster resources because users rarely notice the log message.
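A minimal sketch of the new behavior, in Python rather than Spark's Scala: the function name `initial_executors` is hypothetical, and the rule shown (take the largest of the minimum, the configured initial value, and the requested instance count) is an assumption about how the requested value feeds into the initial executor count, not a copy of the patch.

```python
from typing import Mapping


def initial_executors(conf: Mapping[str, str]) -> int:
    """Pick the initial executor count when dynamic allocation is enabled."""
    min_executors = int(conf.get("spark.dynamicAllocation.minExecutors", "0"))
    # Assumption: the initial value falls back to the minimum when unset.
    configured_initial = int(
        conf.get("spark.dynamicAllocation.initialExecutors", str(min_executors))
    )
    # --num-executors / spark.executor.instances no longer disables dynamic
    # allocation; it only contributes to the starting point.
    requested = int(conf.get("spark.executor.instances", "0"))
    return max(min_executors, configured_initial, requested)
```

With this rule, passing `--num-executors 10` alongside dynamic allocation starts the application at 10 executors and still lets the allocation manager scale from there.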

## How was this patch tested?

This patch updates tests and adds a test for Utils.getDynamicAllocationInitialExecutors.

Author: Ryan Blue <blue@apache.org>

Closes #13338 from rdblue/SPARK-13723-num-executors-with-dynamic-allocation.

738f134b

May 27, 2016

[YARN][DOC][MINOR] Remove several obsolete env variables and update the doc · 1b98fa2e

jerryshao authored 8 years ago

## What changes were proposed in this pull request?

Remove several obsolete env variables not supported for Spark on YARN now, also updates the docs to include several changes with 2.0.

## How was this patch tested?

N/A

CC vanzin tgravescs

Author: jerryshao <sshao@hortonworks.com>

Closes #13296 from jerryshao/yarn-doc.

1b98fa2e

May 26, 2016

[SPARK-13148][YARN] document zero-keytab Oozie application launch; add diagnostics · 01b350a4

Steve Loughran authored 8 years ago

This patch provides detail on what to do for keytabless Oozie launches of spark apps, and adds some debug-level diagnostics of what credentials have been submitted

Author: Steve Loughran <stevel@hortonworks.com>
Author: Steve Loughran <stevel@apache.org>

Closes #11033 from steveloughran/stevel/feature/SPARK-13148-oozie.

01b350a4

Apr 28, 2016

[SPARK-6735][YARN] Add window based executor failure tracking mechanism for long running service · 8b44bd52

jerryshao authored 8 years ago

This work is based on twinkle-sachdeva's proposal. In parallel to the existing mechanism for AM failures, this adds a similar mechanism for executor failure tracking, which is useful for long-running Spark services to mitigate executor failure problems.
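The idea of window-based failure tracking can be sketched as below. This is an illustrative Python sketch, not the patch itself: the class name `FailureWindow`, its parameters, and the choice to pass timestamps explicitly are assumptions made to keep the example self-contained and deterministic.

```python
from collections import deque
from typing import Deque


class FailureWindow:
    """Count only the failures that happened within the last `window_seconds`."""

    def __init__(self, max_failures: int, window_seconds: float) -> None:
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failure_times: Deque[float] = deque()

    def record_failure(self, now: float) -> None:
        """Record a failure observed at time `now` (seconds)."""
        self._failure_times.append(now)

    def failures_in_window(self, now: float) -> int:
        # Drop failures older than the window before counting.
        while self._failure_times and now - self._failure_times[0] > self.window_seconds:
            self._failure_times.popleft()
        return len(self._failure_times)

    def exceeded(self, now: float) -> bool:
        """True when recent failures exceed the allowed maximum."""
        return self.failures_in_window(now) > self.max_failures
```

Because old failures age out, a long-running service that fails occasionally over several days is not killed by the accumulated total, only by a burst within one window.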

Please help to review, tgravescs sryza and vanzin

Author: jerryshao <sshao@hortonworks.com>

Closes #10241 from jerryshao/SPARK-6735.

8b44bd52

Apr 14, 2016

[SPARK-14572][DOC] Update config docs to allow -Xms in extraJavaOptions · f83ba454

Dhruve Ashar authored 8 years ago

## What changes were proposed in this pull request?
The configuration docs are updated to reflect the changes introduced with [SPARK-12384](https://issues.apache.org/jira/browse/SPARK-12384). This allows the user to specify initial heap memory settings through the extraJavaOptions for executor, driver and am.

## How was this patch tested?
The changes are tested in [SPARK-12384](https://issues.apache.org/jira/browse/SPARK-12384). This is just documenting the changes made.

Author: Dhruve Ashar <dhruveashar@gmail.com>

Closes #12333 from dhruve/doc/SPARK-14572.

f83ba454

Apr 05, 2016

[SPARK-13063][YARN] Make the SPARK YARN STAGING DIR as configurable · bc36df12

Devaraj K authored 8 years ago

## What changes were proposed in this pull request?
Made the SPARK YARN STAGING DIR configurable via the 'spark.yarn.staging-dir' configuration.

## How was this patch tested?

I have verified it manually by running applications on YARN. If 'spark.yarn.staging-dir' is configured, its value is used as the staging directory; otherwise the default value is used, i.e. the file system’s home directory for the user.

Author: Devaraj K <devaraj@apache.org>

Closes #12082 from devaraj-kavali/SPARK-13063.

bc36df12

Apr 01, 2016

[SPARK-12343][YARN] Simplify Yarn client and client argument · 8ba2b7f2

jerryshao authored 9 years ago

## What changes were proposed in this pull request?

Currently in Spark on YARN, configurations can be passed through SparkConf, env and command arguments, and some parts are duplicated, such as client arguments and SparkConf. This PR proposes simplifying the command arguments.

## How was this patch tested?

This patch is tested manually with unit test.

CC vanzin tgravescs, please help review this proposal. The original purpose of this JIRA was to remove `ClientArguments`, but during refactoring some arguments like `--class` and `--arg` turned out not to be easy to replace, so here I remove most of the command line arguments and keep only the minimal set.

Author: jerryshao <sshao@hortonworks.com>

Closes #11603 from jerryshao/SPARK-12343.

8ba2b7f2

Mar 18, 2016

[MINOR][DOCS] Update build descriptions and commands · c11ea2e4

Dongjoon Hyun authored 9 years ago

## What changes were proposed in this pull request?

This PR updates Scala and Hadoop versions in the build description and commands in `Building Spark` documents.

## How was this patch tested?

N/A

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11838 from dongjoon-hyun/fix_doc_building_spark.

c11ea2e4

Mar 11, 2016

[SPARK-13577][YARN] Allow Spark jar to be multiple jars, archive. · 07f1c544

Marcelo Vanzin authored 9 years ago

In preparation for the demise of assemblies, this change allows the
YARN backend to use multiple jars and globs as the "Spark jar". The
config option has been renamed to "spark.yarn.jars" to reflect that.

A second option "spark.yarn.archive" was also added; if set, this
takes precedence and uploads an archive expected to contain the jar
files with the Spark code and its dependencies.

Existing deployments should keep working, mostly. This change drops
support for the "SPARK_JAR" environment variable, and also does not
fall back to using "jarOfClass" if no configuration is set, falling
back to finding files under SPARK_HOME instead. This should be fine
since "jarOfClass" probably wouldn't work unless you were using
spark-submit anyway.
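The precedence described above might be sketched like this. It is a Python illustration only: the function name `resolve_spark_jars` and its return shape are hypothetical, the comma-separated format for "spark.yarn.jars" is an assumption, and the fallback simply returns the SPARK_HOME directory because the message does not say which subdirectory is searched.

```python
from typing import List, Mapping, Tuple


def resolve_spark_jars(conf: Mapping[str, str], spark_home: str) -> Tuple[str, List[str]]:
    """Return (source, locations) for the Spark code to ship to YARN."""
    archive = conf.get("spark.yarn.archive")
    if archive:
        # The archive takes precedence when both options are set.
        return ("archive", [archive])

    jars = conf.get("spark.yarn.jars")
    if jars:
        # Assumption: multiple jars or globs are given as a comma-separated list.
        return ("jars", [j.strip() for j in jars.split(",") if j.strip()])

    # No configuration set: fall back to files under SPARK_HOME.
    return ("spark_home", [spark_home])
```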

Tested with the unit tests, and trying the different config options
on a YARN cluster.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #11500 from vanzin/SPARK-13577.

07f1c544

Jan 21, 2016

[SPARK-12534][DOC] update documentation to list command line equivalent to properties · 85200c09

felixcheung authored 9 years ago

Several Spark properties equivalent to Spark submit command line options are missing.

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10491 from felixcheung/sparksubmitdoc.

85200c09

Jan 15, 2016

[SPARK-2930] clarify docs on using webhdfs with spark.yarn.access.namenodes · 96fb894d

Tom Graves authored 9 years ago

Author: Tom Graves <tgraves@yahoo-inc.com>

Closes #10699 from tgravescs/SPARK-2930.

96fb894d

Dec 01, 2015

[SPARK-11821] Propagate Kerberos keytab for all environments · 6a8cf80c

woj-i authored 9 years ago

andrewor14 the same PR as in branch 1.5
harishreedharan

Author: woj-i <wojciechindyk@gmail.com>

Closes #9859 from woj-i/master.

6a8cf80c

Nov 23, 2015

[SPARK-7173][YARN] Add label expression support for application master · 5fd86e4f

jerryshao authored 9 years ago

Add label expression support for the AM to restrict it to running on a specific set of nodes. I tested it locally and it works fine.

sryza and vanzin please help to review, thanks a lot.

Author: jerryshao <sshao@hortonworks.com>

Closes #9800 from jerryshao/SPARK-7173.

5fd86e4f

Oct 20, 2015

[SPARK-11105] [YARN] Distribute log4j.properties to executors · 2f6dd634

vundela authored 9 years ago

Currently the log4j.properties file is not uploaded to executors, which leads them to use the default values. This fix makes sure the file is always uploaded to the distributed cache so that executors use the latest settings.

If the user specifies log configurations through --files, executors will pick up configs from --files instead of $SPARK_CONF_DIR/log4j.properties

Author: vundela <vsr@cloudera.com>
Author: Srinivasa Reddy Vundela <vsr@cloudera.com>

Closes #9118 from vundela/master.

2f6dd634

Oct 12, 2015

[SPARK-10739] [YARN] Add application attempt window for Spark on Yarn · f97e9323

jerryshao authored 9 years ago

Add an application attempt window for Spark on Yarn to ignore old failures that fall outside the window; this is useful for long-running applications to recover from failures.

Author: jerryshao <sshao@hortonworks.com>

Closes #8857 from jerryshao/SPARK-10739 and squashes the following commits:

36eabdc [jerryshao] change the doc
7f9b77d [jerryshao] Style change
1c9afd0 [jerryshao] Address the comments
caca695 [jerryshao] Add application attempt window for Spark on Yarn

f97e9323

Oct 04, 2015

[SPARK-9570] [DOCS] Consistent recommendation for submitting spark apps to... · 82bbc2a5

Sean Owen authored 9 years ago

[SPARK-9570] [DOCS] Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x'.

Recommend `--master yarn --deploy-mode {cluster,client}` consistently in docs.
Follow-on to https://github.com/apache/spark/pull/8385
CC nssalian

Author: Sean Owen <sowen@cloudera.com>

Closes #8968 from srowen/SPARK-9570.

82bbc2a5

Sep 21, 2015

[SPARK-10662] [DOCS] Code snippets are not properly formatted in tables · ca9fe540

Jacek Laskowski authored 9 years ago

* Backticks are processed properly in Spark Properties table
* Removed unnecessary spaces
* See http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/running-on-yarn.html

Author: Jacek Laskowski <jacek.laskowski@deepsense.io>

Closes #8795 from jaceklaskowski/docs-yarn-formatting.

ca9fe540

Sep 17, 2015

[SPARK-10660] Doc describe error in the "Running Spark on YARN" page · c88bb5df

yangping.wu authored 9 years ago

In the Configuration section, the default values of **spark.yarn.driver.memoryOverhead** and **spark.yarn.am.memoryOverhead** should be "driverMemory * 0.10, with minimum of 384" and "AM memory * 0.10, with minimum of 384" respectively, because since Spark 1.4.0 the **MEMORY_OVERHEAD_FACTOR** has been set to 0.10, not 0.07.
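The default described above works out as in the following Python sketch. The function name `default_memory_overhead_mb` is hypothetical, and treating both the memory value and the 384 minimum as megabytes, with the product truncated to an integer, is an assumption for illustration.

```python
MEMORY_OVERHEAD_FACTOR = 0.10   # factor stated in the commit message
MEMORY_OVERHEAD_MIN_MB = 384    # minimum stated in the commit message


def default_memory_overhead_mb(memory_mb: int) -> int:
    """Default overhead: memory * 0.10, with a minimum of 384 (assumed MB)."""
    return max(int(memory_mb * MEMORY_OVERHEAD_FACTOR), MEMORY_OVERHEAD_MIN_MB)
```

For small containers the minimum dominates: a 1024 MB driver gets 384 MB of overhead rather than 102 MB, while an 8192 MB driver gets 819 MB.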

Author: yangping.wu <wyphao.2007@163.com>

Closes #8797 from 397090770/SparkOnYarnDocError.

c88bb5df

Sep 15, 2015

[DOCS] Small fixes to Spark on Yarn doc · 416003b2

Jacek Laskowski authored 9 years ago

* a follow-up to 16b6d186 as `--num-executors` flag is not supported.
* links + formatting

Author: Jacek Laskowski <jacek.laskowski@deepsense.io>

Closes #8762 from jaceklaskowski/docs-spark-on-yarn.

416003b2

Aug 19, 2015

[SPARK-9833] [YARN] Add options to disable delegation token retrieval. · 5fd53c64

Marcelo Vanzin authored 9 years ago

This allows skipping the code that tries to talk to Hive and HBase to
fetch delegation tokens, in case that somehow conflicts with the application
being run.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #8134 from vanzin/SPARK-9833.

5fd53c64

Aug 18, 2015

[SPARK-9782] [YARN] Support YARN application tags via SparkConf · 9b731fad

Dennis Huo authored 9 years ago

Add a new test case in yarn/ClientSuite which checks how the various SparkConf
and ClientArguments propagate into the ApplicationSubmissionContext.

Author: Dennis Huo <dhuo@google.com>

Closes #8072 from dennishuo/dhuo-yarn-application-tags.

9b731fad

Aug 12, 2015

[SPARK-9092] Fixed incompatibility when both num-executors and dynamic... · 738f3539

Niranjan Padmanabhan authored 9 years ago

… allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, executorAllocationManager is not initialized in the SparkContext.

Author: Niranjan Padmanabhan <niranjan.padmanabhan@cloudera.com>

Closes #7657 from neurons/SPARK-9092.

738f3539

Jul 27, 2015

[SPARK-8405] [DOC] Add how to view logs on Web UI when yarn log aggregation is enabled · 62283816

Carson Wang authored 9 years ago

Some users may not be aware that the logs are available on the Web UI even if Yarn log aggregation is enabled. Update the doc to make this clear and to explain what needs to be configured.

Author: Carson Wang <carson.wang@intel.com>

Closes #7463 from carsonwang/YarnLogDoc and squashes the following commits:

274c054 [Carson Wang] Minor text fix
74df3a1 [Carson Wang] address comments
5a95046 [Carson Wang] Update the text in the doc
e5775c1 [Carson Wang] Update doc about how to view the logs on Web UI when yarn log aggregation is enabled

62283816

Jun 27, 2015

[SPARK-3629] [YARN] [DOCS]: Improvement of the "Running Spark on YARN" document · d48e7893

Neelesh Srinivas Salian authored 9 years ago

As per the description in the JIRA, I moved the contents of the page and added some additional content.

Author: Neelesh Srinivas Salian <nsalian@cloudera.com>

Closes #6924 from nssalian/SPARK-3629 and squashes the following commits:

944b7a0 [Neelesh Srinivas Salian] Changed the lines about deploy-mode and added backticks to all parameters
40dbc0b [Neelesh Srinivas Salian] Changed dfs to HDFS, deploy-mode in backticks and updated the master yarn line
9cbc072 [Neelesh Srinivas Salian] Updated a few lines in the Launching Spark on YARN Section
8e8db7f [Neelesh Srinivas Salian] Removed the changes in this commit to help clearly distinguish movement from update
151c298 [Neelesh Srinivas Salian] SPARK-3629: Improvement of the Spark on YARN document

d48e7893

Jun 26, 2015

[SPARK-8302] Support heterogeneous cluster install paths on YARN. · 37bf76a2

Marcelo Vanzin authored 9 years ago

Some users have Hadoop installations on different paths across
their cluster. Currently, that makes it hard to set up some
configuration in Spark since that requires hardcoding paths to
jar files or native libraries, which wouldn't work on such a cluster.

This change introduces a couple of YARN-specific configurations
that instruct the backend to replace certain paths when launching
remote processes. That way, if the configuration says the Spark
jar is in "/spark/spark.jar", and also says that "/spark" should be
replaced with "{{SPARK_INSTALL_DIR}}", YARN will start containers
in the NMs with "{{SPARK_INSTALL_DIR}}/spark.jar" as the location
of the jar.

Coupled with YARN's environment whitelist (which allows certain
env variables to be exposed to containers), this allows users to
support such heterogeneous environments, as long as a single
replacement is enough. (Otherwise, this feature would need to be
extended to support multiple path replacements.)
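The single-path replacement can be illustrated with a short Python sketch. The function name `replace_install_path` and its parameters are hypothetical, and matching only on a whole leading path component is an assumption; the actual patch may match differently.

```python
def replace_install_path(path: str, local_prefix: str, replacement: str) -> str:
    """Swap a leading install prefix for a placeholder expanded on the NM."""
    prefix = local_prefix.rstrip("/")
    if not prefix:
        return path
    # Only rewrite paths that actually live under the configured prefix.
    if path == prefix or path.startswith(prefix + "/"):
        return replacement + path[len(prefix):]
    return path
```

Using the example from the message, `replace_install_path("/spark/spark.jar", "/spark", "{{SPARK_INSTALL_DIR}}")` yields `{{SPARK_INSTALL_DIR}}/spark.jar`, while a path outside `/spark` is returned unchanged.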

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #6752 from vanzin/SPARK-8302 and squashes the following commits:

4bff8d4 [Marcelo Vanzin] Add docs, rename configs.
0aa2a02 [Marcelo Vanzin] Only do replacement for paths that need it.
2e9cc9d [Marcelo Vanzin] Style.
a5e1f68 [Marcelo Vanzin] [SPARK-8302] Support heterogeneous cluster install paths on YARN.

37bf76a2

May 29, 2015

[SPARK-7524] [SPARK-7846] add configs for keytab and principal, pass these two... · a51b133d

WangTaoTheTonic authored 9 years ago

[SPARK-7524] [SPARK-7846] add configs for keytab and principal, pass these two configs with different way in different modes

* Spark now supports long-running services by updating tokens for the namenode, but it only accepts parameters passed in "--k=v" format, which is not very convenient. This patch adds spark.* configs in the properties file and system properties.

* The --principal and --keytab options are passed to the client, but when we start the thrift server or spark-shell these two are also passed into the Main class (org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 and org.apache.spark.repl.Main).
In these two main classes, the arguments passed in are processed by some 3rd-party libraries, which leads to errors such as "Invalid option: --principal" or "Unrecgnised option: --principal".
We should pass these command args in a different form, such as system properties.

Author: WangTaoTheTonic <wangtao111@huawei.com>

Closes #6051 from WangTaoTheTonic/SPARK-7524 and squashes the following commits:

e65699a [WangTaoTheTonic] change logic to loadEnvironments
ebd9ea0 [WangTaoTheTonic] merge master
ecfe43a [WangTaoTheTonic] pass keytab and principal seperately in different mode
33a7f40 [WangTaoTheTonic] expand the use of the current configs
08bb4e8 [WangTaoTheTonic] fix wrong cite
73afa64 [WangTaoTheTonic] add configs for keytab and principal, move originals to internal

a51b133d