Commit 8028a288 authored by Reynold Xin

[SPARK-14628][CORE] Simplify task metrics by always tracking read/write metrics

## What changes were proposed in this pull request?

Part of the reason TaskMetrics and its callers are complicated is the optional metrics we collect: input, output, shuffle read, and shuffle write. I think we can always track them and just assign 0 as the initial values. It is usually very obvious whether a task is supposed to read any data or not. By always tracking them, we can remove a lot of map, foreach, flatMap, and getOrElse(0L) calls throughout Spark.
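
For illustration, here is a minimal Scala sketch of the pattern being removed; the class and member names below are simplified stand-ins for the TaskMetrics hierarchy, not the exact Spark API:

```scala
// Hypothetical, simplified mirror of one metrics class.
class ShuffleReadMetrics {
  private var _recordsRead: Long = 0L
  def recordsRead: Long = _recordsRead
  def incRecordsRead(n: Long): Unit = _recordsRead += n
}

// Before: the metrics object is optional, so every caller must unwrap it.
class TaskMetricsBefore {
  var shuffleReadMetrics: Option[ShuffleReadMetrics] = None
  def recordsRead: Long = shuffleReadMetrics.map(_.recordsRead).getOrElse(0L)
}

// After: the metrics object always exists and starts at zero,
// so callers read it directly with no Option handling.
class TaskMetricsAfter {
  val shuffleReadMetrics = new ShuffleReadMetrics
  def recordsRead: Long = shuffleReadMetrics.recordsRead
}
```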

This patch also changes a few behaviors:

1. Removed the distinction between data read/write methods (e.g. Hadoop, Memory, Network, etc.).
2. Accumulated all data reads and writes rather than only the first method (fixes SPARK-5225; see the sketch below).
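
As a hedged illustration of the second change, a compilable Scala sketch — the InputMetrics class and incBytesRead method here model the accumulation pattern and are not quoted from the patch:

```scala
// Hypothetical sketch: every read source adds into the same counter,
// instead of only the first read method being recorded (SPARK-5225).
class InputMetrics {
  private var _bytesRead: Long = 0L
  def bytesRead: Long = _bytesRead
  def incBytesRead(n: Long): Unit = _bytesRead += n
}

object AccumulateDemo extends App {
  val in = new InputMetrics
  in.incBytesRead(100L) // e.g. bytes read from Hadoop
  in.incBytesRead(50L)  // e.g. bytes read from a cached block
  assert(in.bytesRead == 150L) // both reads accumulate
}
```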

## How was this patch tested?

Existing tests.

This is based on https://github.com/apache/spark/pull/12388, with more test fixes.

Author: Reynold Xin <rxin@databricks.com>
Author: Wenchen Fan <wenchen@databricks.com>

Closes #12417 from cloud-fan/metrics-refactor.
parent 90b46e01