Commit 942847fd authored by omgteam, committed by Xiangrui Meng

Bug fix: add missing unpersist call in RandomForest.scala

While training a Gradient Boosting Decision Tree on large-scale sparse data, Spark spilled a large amount of data to disk. The cause turned out to be the following bug:
    In version 1.1.0, the train method in DecisionTree.scala persists treeInput in memory but never unpersists it, which causes heavy disk usage.
    In the GitHub version (possibly 1.2.0), the train method in RandomForest.scala persists baggedInput but likewise never unpersists it.

After adding the unpersist call, training works correctly.
https://issues.apache.org/jira/browse/SPARK-3918

Author: omgteam <Kimlong.Liu@gmail.com>

Closes #2775 from omgteam/master and squashes the following commits:

815d543 [omgteam] adjust tab to spaces
1a36f83 [omgteam] Bug: fix without unpersist baggedInput in RandomForest.scala
parent 92e017fb
@@ -176,6 +176,8 @@ private class RandomForest (
 timer.stop("findBestSplits")
 }
+baggedInput.unpersist()
 timer.stop("total")
 logInfo("Internal timing for DecisionTree:")
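
The change pairs the existing persist on baggedInput with an unpersist once the training loop has finished. A minimal sketch of that pairing outside of MLlib might look like the following. The names PersistScope and withPersisted, the local[2] master, and the MEMORY_AND_DISK storage level are illustrative assumptions, not code from this commit.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistScope {
  // Hypothetical helper (not part of Spark): caches `rdd` while `body` runs
  // and always releases the cached blocks afterwards, even if `body` throws.
  def withPersisted[T, R](rdd: RDD[T], level: StorageLevel)(body: RDD[T] => R): R = {
    rdd.persist(level)
    try body(rdd)
    finally rdd.unpersist()
  }

  def main(args: Array[String]): Unit = {
    // Local master chosen only so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("persist-scope").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val input = sc.parallelize(1 to 1000).map(_.toDouble)
      // Two passes over the same data, standing in for an iterative trainer.
      val (count, sum) = withPersisted(input, StorageLevel.MEMORY_AND_DISK) { cached =>
        (cached.count(), cached.sum())
      }
      println(s"count=$count sum=$sum")
      // Once the helper returns, the RDD is no longer marked as cached.
      println(s"storage level after: ${input.getStorageLevel}")
    } finally {
      sc.stop()
    }
  }
}
```

Releasing the cache in a finally block means it is dropped even when a pass fails, which a single trailing unpersist call does not guarantee.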