Commit fd84229e authored by Xiangrui Meng

[SPARK-5802][MLLIB] cache transformed data in glm

If we need to transform the input data, we should cache the output to avoid re-computing feature vectors every iteration. dbtsai

Author: Xiangrui Meng <meng@databricks.com>

Closes #4593 from mengxr/SPARK-5802 and squashes the following commits:

ae3be84 [Xiangrui Meng] cache transformed data in glm
parent d380f324
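
To illustrate the motivation in the commit message, here is a minimal plain-Scala sketch, not Spark code. A lazy `view` stands in for an uncached RDD (the mapped function re-runs on every pass), and `toVector` stands in for `cache()`. The names `CachingSketch`, `transform`, and `run` are hypothetical and exist only for this illustration.

```scala
object CachingSketch {
  // Counts how many times the (hypothetical) feature transform is evaluated.
  var transformCalls = 0

  // Stand-in for scaler.transform + appendBias: rescale, then append 1.0.
  def transform(features: Vector[Double]): Vector[Double] = {
    transformCalls += 1
    features.map(_ * 0.5) :+ 1.0
  }

  /** Makes `iterations` passes over the data and returns the number of transform calls. */
  def run(cache: Boolean, iterations: Int): Int = {
    transformCalls = 0
    val input = Vector(Vector(1.0, 2.0), Vector(3.0, 4.0), Vector(5.0, 6.0))
    // A view is lazy: the map is re-applied each time the data is traversed.
    val lazyData = input.view.map(transform)
    // Materializing once plays the role of cache() in this sketch.
    val data: Iterable[Vector[Double]] = if (cache) lazyData.toVector else lazyData
    var i = 0
    while (i < iterations) {
      data.foreach(_ => ()) // one optimizer "iteration" reads every row
      i += 1
    }
    transformCalls
  }
}
```

With three rows and ten passes, `CachingSketch.run(cache = false, iterations = 10)` evaluates the transform 30 times, while `CachingSketch.run(cache = true, iterations = 10)` evaluates it 3 times. In Spark the saving depends on the cached partitions actually fitting in storage, which this sketch does not model.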
@@ -205,7 +205,7 @@ abstract class GeneralizedLinearAlgorithm[M <: GeneralizedLinearModel]
       throw new SparkException("Input validation failed.")
     }
-    /**
+    /*
      * Scaling columns to unit variance as a heuristic to reduce the condition number:
      *
      * During the optimization process, the convergence (rate) depends on the condition number of
@@ -225,26 +225,27 @@ abstract class GeneralizedLinearAlgorithm[M <: GeneralizedLinearModel]
      * Currently, it's only enabled in LogisticRegressionWithLBFGS
      */
     val scaler = if (useFeatureScaling) {
-      (new StandardScaler(withStd = true, withMean = false)).fit(input.map(x => x.features))
+      new StandardScaler(withStd = true, withMean = false).fit(input.map(_.features))
     } else {
       null
     }
     // Prepend an extra variable consisting of all 1.0's for the intercept.
-    val data = if (addIntercept) {
-      if (useFeatureScaling) {
-        input.map(labeledPoint =>
-          (labeledPoint.label, appendBias(scaler.transform(labeledPoint.features))))
-      } else {
-        input.map(labeledPoint => (labeledPoint.label, appendBias(labeledPoint.features)))
-      }
-    } else {
-      if (useFeatureScaling) {
-        input.map(labeledPoint => (labeledPoint.label, scaler.transform(labeledPoint.features)))
+    // TODO: Apply feature scaling to the weight vector instead of input data.
+    val data =
+      if (addIntercept) {
+        if (useFeatureScaling) {
+          input.map(lp => (lp.label, appendBias(scaler.transform(lp.features)))).cache()
+        } else {
+          input.map(lp => (lp.label, appendBias(lp.features))).cache()
+        }
       } else {
-        input.map(labeledPoint => (labeledPoint.label, labeledPoint.features))
+        if (useFeatureScaling) {
+          input.map(lp => (lp.label, scaler.transform(lp.features))).cache()
+        } else {
+          input.map(lp => (lp.label, lp.features))
+        }
       }
-    }
     /**
      * TODO: For better convergence, in logistic regression, the intercepts should be computed
......
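
The per-row transformation that the diff caches (scale each column to unit variance without centering, then optionally append a bias term) can be sketched without Spark as follows. This is an illustration under stated assumptions, not MLlib's implementation: `ScalingSketch`, `columnStd`, `scaleToUnitStd`, and `prepare` are hypothetical names, the `appendBias` here is a local stand-in for the helper used in the diff, the standard deviation uses the n - 1 denominator, and zero-variance columns are mapped to 0.0.

```scala
object ScalingSketch {
  // Sample standard deviation of one column (n - 1 denominator; an assumption of this sketch).
  def columnStd(column: Seq[Double]): Double = {
    val n = column.length
    val mean = column.sum / n
    math.sqrt(column.map(x => (x - mean) * (x - mean)).sum / (n - 1))
  }

  // Divide every column by its standard deviation (withStd = true, withMean = false).
  // Zero-variance columns are mapped to 0.0 to avoid dividing by zero.
  def scaleToUnitStd(rows: Seq[Array[Double]]): Seq[Array[Double]] = {
    val dims = rows.head.length
    val stds = Array.tabulate(dims)(j => columnStd(rows.map(_(j))))
    rows.map(r => Array.tabulate(dims)(j => if (stds(j) == 0.0) 0.0 else r(j) / stds(j)))
  }

  // Append a constant 1.0 so that the last weight acts as the intercept.
  def appendBias(features: Array[Double]): Array[Double] = features :+ 1.0

  // Mirrors the four branches in the diff: scale or not, append bias or not.
  def prepare(
      rows: Seq[Array[Double]],
      addIntercept: Boolean,
      useFeatureScaling: Boolean): Seq[Array[Double]] = {
    val scaled = if (useFeatureScaling) scaleToUnitStd(rows) else rows
    if (addIntercept) scaled.map(appendBias) else scaled
  }
}
```

Only the branch with neither scaling nor bias returns the rows unchanged, which matches the one branch in the diff that is left without `.cache()`: there is no per-row transformation to avoid recomputing.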