---
layout: global
title: Linear Methods - MLlib
displayTitle: <a href="mllib-guide.html">MLlib</a> - Linear Methods
---
* Table of contents
{:toc}
\[ \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\x}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\wv}{\mathbf{w}} \newcommand{\av}{\mathbf{\alpha}} \newcommand{\bv}{\mathbf{b}} \newcommand{\N}{\mathbb{N}} \newcommand{\id}{\mathbf{I}} \newcommand{\ind}{\mathbf{1}} \newcommand{\0}{\mathbf{0}} \newcommand{\unit}{\mathbf{e}} \newcommand{\one}{\mathbf{1}} \newcommand{\zero}{\mathbf{0}} \]
## Mathematical formulation
Many standard machine learning methods can be formulated as a convex optimization problem, i.e. the task of finding a minimizer of a convex function $f$ that depends on a variable vector $\wv$ (called `weights` in the code), which has $d$ entries. Formally, we can write this as the optimization problem $\min_{\wv \in\R^d} \; f(\wv)$, where the objective function is of the form
\begin{equation} f(\wv) := \lambda\, R(\wv) + \frac1n \sum_{i=1}^n L(\wv;\x_i,y_i) \label{eq:regPrimal} \ . \end{equation}
Here the vectors $\x_i\in\R^d$ are the training data examples, for $1\le i\le n$, and $y_i\in\R$ are their corresponding labels, which we want to predict. We call the method *linear* if $L(\wv; \x, y)$ can be expressed as a function of $\wv^T \x$ and $y$. Several of MLlib's classification and regression algorithms fall into this category, and are discussed here.
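For example, the hinge loss used for linear SVMs (listed in the table below) depends on $\wv$ and $\x$ only through the quantity $\wv^T \x$:
\[
L(\wv; \x, y) = \max\{0,\, 1 - y\, \wv^T \x\}, \quad y \in \{-1, +1\}.
\]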
The objective function $f$ has two parts: the regularizer that controls the complexity of the model, and the loss that measures the error of the model on the training data. The loss function $L(\wv;.)$ is typically a convex function in $\wv$. The fixed regularization parameter $\lambda \ge 0$ (`regParam` in the code) defines the trade-off between the two goals of minimizing the loss (i.e., training error) and minimizing model complexity (i.e., to avoid overfitting).
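To make the objective concrete, the following Scala sketch (an illustration only, not MLlib's internal implementation) evaluates $f(\wv)$ on a toy dataset, assuming squared loss and an L2 regularizer $R(\wv) = \frac{1}{2}\|\wv\|_2^2$; the `regParam` argument of the hypothetical `objective` helper plays the role of $\lambda$.

```scala
// Minimal sketch (not MLlib code): evaluate
//   f(w) = lambda * R(w) + (1/n) * sum_i L(w; x_i, y_i)
// assuming squared loss and an L2 regularizer R(w) = 0.5 * ||w||^2.
object ObjectiveSketch {
  def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  // squared loss for one example: 0.5 * (w^T x - y)^2
  def squaredLoss(w: Array[Double], x: Array[Double], y: Double): Double = {
    val err = dot(w, x) - y
    0.5 * err * err
  }

  // L2 regularizer: 0.5 * ||w||_2^2
  def l2Regularizer(w: Array[Double]): Double =
    0.5 * w.map(wi => wi * wi).sum

  // regParam corresponds to lambda in the objective above
  def objective(w: Array[Double],
                data: Seq[(Array[Double], Double)],
                regParam: Double): Double = {
    val avgLoss = data.map { case (x, y) => squaredLoss(w, x, y) }.sum / data.size
    regParam * l2Regularizer(w) + avgLoss
  }

  def main(args: Array[String]): Unit = {
    val data = Seq((Array(1.0, 0.0), 1.0), (Array(0.0, 1.0), -1.0))
    val w = Array(0.5, -0.5)
    println(objective(w, data, regParam = 0.1))
  }
}
```

Increasing `regParam` in this sketch shifts the minimizer toward smaller weights, which is exactly the loss-versus-complexity trade-off described above.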
### Loss functions
The following table summarizes the loss functions and their gradients or sub-gradients for the methods MLlib supports:
| | loss function $L(\wv; \x, y)$ | gradient or sub-gradient |
|---|---|---|
| hinge loss | $\max \{0, 1-y \wv^T \x \}, \quad y \in \{-1, +1\}$ | $\begin{cases}-y \cdot \x & \text{if $y \wv^T \x <1$}, \\ 0 & \text{otherwise}.\end{cases}$ |
| logistic loss | $\log(1+\exp( -y \wv^T \x)), \quad y \in \{-1, +1\}$ | $-y \left(1-\frac1{1+\exp(-y \wv^T \x)} \right) \cdot \x$ |
| squared loss | $\frac{1}{2} (\wv^T \x - y)^2, \quad y \in \R$ | $(\wv^T \x - y) \cdot \x$ |
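As a standalone illustration of the table (these are not the actual MLlib `Gradient` implementations), the sketch below computes each loss and its (sub-)gradient for a single example, using labels $y \in \{-1, +1\}$ for hinge and logistic loss and a real-valued label for squared loss.

```scala
// Sketch of the losses and (sub-)gradients from the table above; each function
// returns (loss, gradient) for a single example (x, y). Not MLlib's own code.
object LossSketch {
  def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  // hinge loss: max(0, 1 - y * w^T x); sub-gradient -y * x when y * w^T x < 1, else 0
  def hinge(w: Array[Double], x: Array[Double], y: Double): (Double, Array[Double]) = {
    val margin = y * dot(w, x)
    if (margin < 1.0) (1.0 - margin, x.map(-y * _))
    else (0.0, Array.fill(x.length)(0.0))
  }

  // logistic loss: log(1 + exp(-y * w^T x)); gradient -y * (1 - 1 / (1 + exp(-y * w^T x))) * x
  def logistic(w: Array[Double], x: Array[Double], y: Double): (Double, Array[Double]) = {
    val margin = y * dot(w, x)
    val scale = -y * (1.0 - 1.0 / (1.0 + math.exp(-margin)))
    (math.log1p(math.exp(-margin)), x.map(scale * _))
  }

  // squared loss: 0.5 * (w^T x - y)^2; gradient (w^T x - y) * x
  def squared(w: Array[Double], x: Array[Double], y: Double): (Double, Array[Double]) = {
    val err = dot(w, x) - y
    (0.5 * err * err, x.map(err * _))
  }
}
```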