Commit d107b3b9 authored by Li Yichao, committed by Wenchen Fan

[SPARK-20640][CORE] Make rpc timeout and retry for shuffle registration configurable.

## What changes were proposed in this pull request?

Currently, the shuffle service registration timeout and retry count are hardcoded. This works well for small workloads. Under heavy workloads, however, when the shuffle service is busy transferring large amounts of data, it can take significantly longer to respond to registration requests. As a result, executors often fail to register with the shuffle service, which eventually fails the job. We need to make these two parameters configurable.
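To make the two knobs concrete, here is a minimal, language-neutral sketch (written in Python, not Spark's Scala code) of a registration call that takes a per-attempt timeout and a maximum number of attempts as parameters instead of hardcoding them. The function and class names (`register_with_shuffle_service`, `RegistrationTimeout`, `make_busy_server`) are hypothetical, and the defaults of 5000 ms and 3 attempts are assumed to mirror the previously hardcoded values. In Spark releases that include this change, the corresponding settings are, to our understanding, `spark.shuffle.registration.timeout` (milliseconds) and `spark.shuffle.registration.maxAttempts`; check the configuration docs for your version.

```python
import time
from typing import Callable, List


class RegistrationTimeout(Exception):
    """Raised when a single registration RPC gets no reply within its timeout."""


def register_with_shuffle_service(
    send_rpc: Callable[[float], None],
    timeout_ms: int = 5000,   # assumed default: the old hardcoded per-RPC timeout
    max_attempts: int = 3,    # assumed default: the old hardcoded attempt count
    sleep_between_attempts_s: float = 0.0,
) -> int:
    """Register, retrying on timeout. Returns the number of attempts used.

    `send_rpc` is a hypothetical transport call that performs one registration
    RPC and raises RegistrationTimeout if no reply arrives within the given
    number of seconds. Enforcing the timeout is the transport's job; this
    function only decides how many times to try.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            send_rpc(timeout_ms / 1000.0)
            return attempt
        except RegistrationTimeout:
            if attempt == max_attempts:
                # Out of retries: propagate, so the caller (the executor) fails.
                raise
            time.sleep(sleep_between_attempts_s)
    raise AssertionError("unreachable")


def make_busy_server(busy_calls: int, seen_timeouts: List[float]) -> Callable[[float], None]:
    """Hypothetical fake shuffle service that times out on its first `busy_calls` requests."""
    def send_rpc(timeout_s: float) -> None:
        seen_timeouts.append(timeout_s)
        if len(seen_timeouts) <= busy_calls:
            raise RegistrationTimeout(f"no reply within {timeout_s:.3f}s")
    return send_rpc


if __name__ == "__main__":
    seen: List[float] = []
    used = register_with_shuffle_service(make_busy_server(2, seen), timeout_ms=200, max_attempts=3)
    print(used, seen)
```

With a service that stays busy for two requests, three attempts are enough to register, while a limit of two attempts makes registration fail; raising `max_attempts` or `timeout_ms` is exactly the lever the change exposes for heavily loaded clusters.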

## How was this patch tested?

* Updated `BlockManagerSuite` to test that the registration timeout and max-attempts configurations actually take effect.

cc sitalkedia

Author: Li Yichao <lyc@zhihu.com>

Closes #18092 from liyichao/SPARK-20640.
parent 9ce714dc
109 additions and 15 deletions