Commit c108a5d3 authored by Yanbo Liang's avatar Yanbo Liang

[SPARK-19762][ML][FOLLOWUP] Add necessary comments to L2Regularization.

## What changes were proposed in this pull request?
MLlib ```LinearRegression/LogisticRegression/LinearSVC``` always standardize the data during training to improve the rate of convergence, regardless of whether _standardization_ is true or false. If _standardization_ is false, we perform reverse standardization by penalizing each component differently, so that the objective function is effectively the same as if the training dataset had not been standardized. We should keep these comments in the code so that developers understand how this is handled correctly.
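The equivalence described above can be sketched numerically. This is a minimal, self-contained Scala sketch, not Spark code; the object name and the values of `coef` and `std` are hypothetical. When a feature is divided by its standard deviation during training, the coefficient learned in the scaled space is `coef * std`, so penalizing that coefficient by `1 / std^2` recovers exactly the penalty that would apply to the original-scale coefficient:

```scala
// Sketch of the reverse-standardization identity (hypothetical values).
// If a feature is divided by its standard deviation during training, the
// coefficient learned in the scaled space is coefStd = coef * std, where
// coef is the coefficient on the original scale. Penalizing
// coefStd^2 / std^2 therefore recovers coef^2, the penalty we would have
// applied without standardization.
object ReverseStandardization {
  def main(args: Array[String]): Unit = {
    val coef = 3.0           // hypothetical coefficient on the original scale
    val std = 2.5            // hypothetical feature standard deviation
    val coefStd = coef * std // equivalent coefficient after standardization

    val penaltyOriginal = coef * coef
    val penaltyReversed = coefStd * coefStd / (std * std)

    println(penaltyOriginal) // 9.0
    println(penaltyReversed) // 9.0
    assert(math.abs(penaltyOriginal - penaltyReversed) < 1e-12)
  }
}
```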

## How was this patch tested?
Existing tests; this change only adds comments to the code.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #18992 from yanboliang/SPARK-19762.
parent 84b5b16e
......@@ -39,9 +39,13 @@ private[ml] trait DifferentiableRegularization[T] extends DiffFunction[T] {
*
* @param regParam The magnitude of the regularization.
 * @param shouldApply A function (Int => Boolean) indicating whether a given index should have
- *                    regularization applied to it.
+ *                    regularization applied to it. Usually we don't apply regularization to
+ *                    the intercept.
 * @param applyFeaturesStd Option for a function which maps coefficient index (column major) to the
- *                         feature standard deviation. If `None`, no standardization is applied.
+ *                         feature standard deviation. Since we always standardize the data during
+ *                         training, if `standardization` is false, we have to reverse
+ *                         standardization by penalizing each component differently by this param.
+ *                         If `standardization` is true, this should be `None`.
*/
private[ml] class L2Regularization(
override val regParam: Double,
......@@ -57,6 +61,11 @@ private[ml] class L2Regularization(
val coef = coefficients(j)
applyFeaturesStd match {
case Some(getStd) =>
+            // If `standardization` is false, we still standardize the data
+            // to improve the rate of convergence; as a result, we have to
+            // perform this reverse standardization by penalizing each component
+            // differently to get effectively the same objective function when
+            // the training dataset is not standardized.
val std = getStd(j)
if (std != 0.0) {
val temp = coef / (std * std)
......@@ -66,6 +75,7 @@ private[ml] class L2Regularization(
0.0
}
case None =>
+          // If `standardization` is true, compute L2 regularization normally.
sum += coef * coef
gradient(j) = coef * regParam
}
......
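For readers outside the Spark tree, the loss and gradient logic shown in the hunks above can be summarized as a standalone Scala sketch. This is a hypothetical reconstruction: the function name `l2LossAndGradient`, the array-based signature, and the elided lines (the `sum += coef * temp` accumulation and the `0.5 * regParam * sum` loss scaling) are assumptions filled in around the fragments visible in the diff; the real class operates on a Vector and extends Breeze's `DiffFunction`, as the first hunk header shows.

```scala
// Standalone sketch (hypothetical names and filled-in elided lines) of the
// L2 loss/gradient computation shown in the diff above.
object L2Sketch {
  def l2LossAndGradient(
      coefficients: Array[Double],
      regParam: Double,
      shouldApply: Int => Boolean,
      applyFeaturesStd: Option[Int => Double]): (Double, Array[Double]) = {
    var sum = 0.0
    val gradient = new Array[Double](coefficients.length)
    var j = 0
    while (j < coefficients.length) {
      if (shouldApply(j)) {
        val coef = coefficients(j)
        applyFeaturesStd match {
          case Some(getStd) =>
            // Reverse standardization: penalize each component by 1 / std^2.
            val std = getStd(j)
            if (std != 0.0) {
              val temp = coef / (std * std)
              sum += coef * temp
              gradient(j) = regParam * temp
            } else {
              gradient(j) = 0.0
            }
          case None =>
            // `standardization` is true: plain L2 regularization.
            sum += coef * coef
            gradient(j) = coef * regParam
        }
      }
      j += 1
    }
    (0.5 * regParam * sum, gradient)
  }
}
```

With `std = 2.0` and `coef = 4.0`, the penalized term contributes `4.0 * (4.0 / 4.0) = 4.0` to `sum`, exactly what a coefficient of `2.0` on unscaled data would contribute.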