[SPARK-20923] turn tracking of TaskMetrics._updatedBlockStatuses off
## What changes were proposed in this pull request?

Turn tracking of TaskMetrics._updatedBlockStatuses off by default. As far as I can see it's not used by anything, and it uses a lot of memory when caching and processing a lot of blocks. In my case it was taking 5GB of a 10GB heap, and even after going up to a 50GB heap the job still ran out of memory. With this change in place the same job easily runs in less than 10GB of heap.

We leave the API in place, along with a config to turn tracking back on in case anyone is using it. TaskMetrics is exposed via SparkListenerTaskEnd, so users who rely on it can turn it back on.

## How was this patch tested?

Ran the modified unit tests and manually tested on a couple of jobs (with and without caching). Clicked through the UI and didn't see anything missing. Ran my very large Hive query job (200,000 small tasks, 1000 executors, 6+TB of cached data). It now runs fine, whereas without this change it would go into full GCs and eventually die.

Author: Thomas Graves <tgraves@thirteenroutine.corp.gq1.yahoo.com>
Author: Tom Graves <tgraves@yahoo-inc.com>

Closes #18162 from tgravescs/SPARK-20923.
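The memory argument above comes down to a flag that gates whether each block update is appended to a per-task collection. The following is a minimal, self-contained Scala sketch of that idea only. `SketchTaskMetrics`, `recordBlockUpdate`, and the simplified `BlockId`/`BlockStatus` case classes are hypothetical stand-ins, not Spark's actual classes or API. The real change lives in the files listed below, and the actual config key is not named in this message.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-ins; NOT Spark's real BlockId/BlockStatus.
final case class BlockId(name: String)
final case class BlockStatus(memSize: Long, diskSize: Long)

// Hypothetical per-task metrics holder. Recording is gated by a flag that
// defaults to false, mirroring the "off by default" behavior described above.
final class SketchTaskMetrics(trackUpdatedBlockStatuses: Boolean = false) {
  private val updated = ArrayBuffer.empty[(BlockId, BlockStatus)]

  // When tracking is off, the update is dropped and nothing is retained.
  def recordBlockUpdate(id: BlockId, status: BlockStatus): Unit =
    if (trackUpdatedBlockStatuses) updated += ((id, status))

  def updatedBlockStatuses: Seq[(BlockId, BlockStatus)] = updated.toList
}

object TrackingSketch {
  // Simulates a task that caches `numBlocks` blocks and returns how many
  // block-status entries its metrics object ends up holding.
  def retainedEntries(numBlocks: Int, track: Boolean): Int = {
    val metrics = new SketchTaskMetrics(track)
    for (i <- 0 until numBlocks)
      metrics.recordBlockUpdate(BlockId(s"rdd_0_$i"), BlockStatus(memSize = 1024L, diskSize = 0L))
    metrics.updatedBlockStatuses.size
  }

  def main(args: Array[String]): Unit = {
    println(retainedEntries(1000, track = false)) // prints 0
    println(retainedEntries(1000, track = true))  // prints 1000
  }
}
```

With the flag off, retained state no longer grows with the number of blocks a task touches. The cost is that a listener reading block statuses from the task-end metrics sees an empty collection unless tracking is explicitly re-enabled.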
- core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala: 6 additions, 0 deletions
- core/src/main/scala/org/apache/spark/internal/config/package.scala: 8 additions, 0 deletions
- core/src/main/scala/org/apache/spark/storage/BlockManager.scala: 4 additions, 2 deletions
- core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala: 31 additions, 1 deletion