Commit fdd466be authored by Vyacheslav Baranov, committed by Sean Owen

[SPARK-10182] [MLLIB] GeneralizedLinearModel doesn't unpersist cached data

`GeneralizedLinearModel` creates a cached RDD when building a model but never unpersists it. This is inconvenient: when several models are built in a row, these RDDs fill up memory, and useful data may be evicted from the cache.
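To check whether a series of fits leaves cached RDDs behind, one option is to compare `SparkContext.getPersistentRDDs` before and after training. The sketch below assumes Spark 2.x or later with `spark-mllib` on the classpath. `CacheReport` and `persistentRdds` are illustrative names, not Spark APIs, and the number it prints depends on the Spark version and on which algorithm is used.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.storage.StorageLevel

object CacheReport {
  /** Ids and storage levels of every RDD the SparkContext currently tracks as persisted. */
  def persistentRdds(sc: SparkContext): Map[Int, StorageLevel] =
    sc.getPersistentRDDs.map { case (id, rdd) => id -> rdd.getStorageLevel }.toMap

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("cache-report"))
    try {
      // The caller caches the training input on purpose; that RDD is not a leak.
      val input = sc.parallelize(Seq(
        LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
        LabeledPoint(1.0, Vectors.dense(1.0, 0.0)),
        LabeledPoint(0.0, Vectors.dense(0.2, 0.8)),
        LabeledPoint(1.0, Vectors.dense(0.8, 0.2)))).cache()
      val before = persistentRdds(sc).keySet

      // Build several models in a row.
      (1 to 3).foreach(_ => new LogisticRegressionWithLBFGS().run(input))

      // Any RDD id that appears here was cached during training and never released.
      val leftBehind = persistentRdds(sc).keySet -- before
      println(s"RDDs still cached after 3 fits: ${leftBehind.size}")
      input.unpersist(blocking = true)
    } finally sc.stop()
  }
}
```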

The proposed solution is to always cache the dataset and remove the warning. There is a caveat, though: the input dataset is then evaluated twice, first at line 270 when fitting `StandardScaler`, and again when running the optimizer. So it might be worth restoring the removed warning.
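The caveat follows from Spark recomputing an uncached RDD's lineage on every action. The sketch below counts how often a per-element `map` runs across two full passes: one standing in for fitting a scaler, one for the optimizer. It assumes Spark 2.x or later (for `longAccumulator`). `DoubleEvaluation` and `countEvaluations` are illustrative names. Accumulator updates made inside transformations can be over-counted if tasks are retried, so treat the counts as a local-mode demonstration rather than a general measurement technique.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DoubleEvaluation {
  /**
   * Runs two actions over the same derived RDD and returns how many times
   * the per-element map function executed.
   */
  def countEvaluations(sc: SparkContext, cacheFirst: Boolean): Long = {
    val evaluations = sc.longAccumulator("evaluations")
    val features = sc.parallelize(1 to 100, 2).map { x =>
      evaluations.add(1L) // runs every time this element is (re)computed
      x.toDouble
    }
    if (cacheFirst) features.cache()
    try {
      // Pass 1: a summary statistic, analogous to fitting StandardScaler.
      val mean = features.sum() / 100.0
      // Pass 2: another full scan, analogous to the optimizer iterating over the data.
      features.map(x => (x - mean) * (x - mean)).sum()
      evaluations.sum
    } finally {
      if (cacheFirst) features.unpersist(blocking = true)
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("double-eval"))
    try {
      println(s"uncached: ${countEvaluations(sc, cacheFirst = false)} evaluations")
      println(s"cached:   ${countEvaluations(sc, cacheFirst = true)} evaluations")
    } finally sc.stop()
  }
}
```

Without caching, each of the 100 elements is computed once per pass. With caching, the second pass reads the stored partitions instead.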

Another option is to disable caching entirely and restore the removed warning. I'm not sure which approach is better.

Author: Vyacheslav Baranov <slavik.baranov@gmail.com>

Closes #8395 from SlavikBaranov/SPARK-10182.
parent e1f4de4a
@@ -359,6 +359,11 @@ abstract class GeneralizedLinearAlgorithm[M <: GeneralizedLinearModel]
        + " parent RDDs are also uncached.")
    }
    // Unpersist cached data
    if (data.getStorageLevel != StorageLevel.NONE) {
      data.unpersist(false)
    }
    createModel(weights, intercept)
  }
}
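The patch releases `data` once the optimizer has finished, using a non-blocking `unpersist(false)`. The same idea can be packaged as a reusable helper. The sketch below is hypothetical: `TemporaryCache` and `withTemporaryCache` are not Spark APIs. It assumes Spark 2.x or later. It differs from the patch in two deliberate ways:

- It releases the cache in a `finally` block, so an exception thrown by the body does not leave the RDD cached.
- Because it accepts an arbitrary RDD rather than one it derived itself, it only unpersists an RDD it cached. An input the caller had already cached is left alone.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object TemporaryCache {
  /**
   * Hypothetical helper: caches `data` for the duration of `body` if it is not already
   * cached, and releases only the cache this call created, even if `body` throws.
   */
  def withTemporaryCache[T, R](data: RDD[T])(body: RDD[T] => R): R = {
    val cachedHere = data.getStorageLevel == StorageLevel.NONE
    if (cachedHere) data.persist(StorageLevel.MEMORY_AND_DISK)
    try body(data)
    finally if (cachedHere) data.unpersist(blocking = false)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("temporary-cache"))
    try {
      val raw = sc.parallelize(1 to 1000).map(_.toDouble)
      // Two passes over the same data, as in scaler fitting followed by optimization.
      val (mean, variance) = withTemporaryCache(raw) { rdd =>
        val m = rdd.mean()
        (m, rdd.map(x => (x - m) * (x - m)).mean())
      }
      val stillCached = raw.getStorageLevel != StorageLevel.NONE
      println(s"mean=$mean variance=$variance stillCached=$stillCached")
    } finally sc.stop()
  }
}
```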