Commit ca695585 authored by Peng Meng, committed by Sean Owen

[SPARK-21638][ML] Fix RF/GBT Warning message error

## What changes were proposed in this pull request?

When training an RF model, many warning messages like this one are logged:

> WARN  RandomForest: Tree learning is using approximately 268492800 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 2622 nodes in this iteration.

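This warning is unnecessary, and the figures it reports are inaccurate. Before this patch, `numNodesInGroup` and `memUsage` were incremented even for the node that was rejected because it did not fit in the memory budget. As a result, the reported memory exceeds `maxMemoryUsage` and the reported node count is one higher than the number of nodes actually added to the group.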

In fact, the warning is logged whenever the pending nodes cannot all be split in a single iteration. That is the usual situation, so in practice the warning appears on almost every iteration.
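The sketch below is a minimal, hypothetical model of the pre-patch grouping loop that reproduces the spurious warning. It makes these simplifications and assumptions:

- Each pending node is reduced to its estimated size in bytes, and the head of a `List` plays the role of the top of `nodeStack`.
- `OldGrouping`, `Result`, `selectGroup` and `nodesAdded` are illustrative names, not Spark APIs.
- The admission check (`memUsage + nodeMemUsage <= maxMemoryUsage || memUsage == 0`) is elided from the diff in this commit, so it is an assumption here.

```scala
// Hypothetical model of the grouping loop BEFORE this patch (not Spark code).
// Each node is represented only by its estimated size in bytes.
object OldGrouping {
  final case class Result(numNodesInGroup: Int, memUsage: Long, nodesAdded: Int, warns: Boolean)

  def selectGroup(nodeSizes: List[Long], maxMemoryUsage: Long): Result = {
    var stack = nodeSizes    // head == top of the node stack
    var memUsage = 0L
    var numNodesInGroup = 0  // the count the warning reports
    var nodesAdded = 0       // nodes actually popped into the group
    while (stack.nonEmpty && (memUsage < maxMemoryUsage || memUsage == 0)) {
      val nodeMemUsage = stack.head
      // Assumed admission check (elided from the diff in this commit).
      if (memUsage + nodeMemUsage <= maxMemoryUsage || memUsage == 0) {
        stack = stack.tail
        nodesAdded += 1
      }
      // Pre-patch behavior: these also run for a node that was rejected,
      // which pushes memUsage past the limit and ends the loop.
      numNodesInGroup += 1
      memUsage += nodeMemUsage
    }
    // Mirrors the `if (memUsage > maxMemoryUsage)` guard around the warning.
    Result(numNodesInGroup, memUsage, nodesAdded, warns = memUsage > maxMemoryUsage)
  }
}
```

With three 100-byte nodes and a 250-byte budget, this model returns `Result(3, 300, 2, true)`. It reports three nodes and 300 bytes and triggers the warning, even though only two nodes (200 bytes) were added to the group.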

## How was this patch tested?
Existing unit tests.

Author: Peng Meng <peng.meng@intel.com>

Closes #18868 from mpjlu/fixRFwarning.
parent 95ad960c
```diff
@@ -1089,7 +1089,8 @@ private[spark] object RandomForest extends Logging {
     var numNodesInGroup = 0
     // If maxMemoryInMB is set very small, we want to still try to split 1 node,
     // so we allow one iteration if memUsage == 0.
-    while (nodeStack.nonEmpty && (memUsage < maxMemoryUsage || memUsage == 0)) {
+    var groupDone = false
+    while (nodeStack.nonEmpty && !groupDone) {
       val (treeIndex, node) = nodeStack.top
       // Choose subset of features for node (if subsampling).
       val featureSubset: Option[Array[Int]] = if (metadata.subsamplingFeatures) {
@@ -1107,9 +1108,11 @@ private[spark] object RandomForest extends Logging {
         mutableTreeToNodeToIndexInfo
           .getOrElseUpdate(treeIndex, new mutable.HashMap[Int, NodeIndexInfo]())(node.id)
           = new NodeIndexInfo(numNodesInGroup, featureSubset)
+        numNodesInGroup += 1
+        memUsage += nodeMemUsage
+      } else {
+        groupDone = true
       }
-      numNodesInGroup += 1
-      memUsage += nodeMemUsage
     }
     if (memUsage > maxMemoryUsage) {
       // If maxMemoryUsage is 0, we should still allow splitting 1 node.
```