---
layout: global
title: Linear Methods - MLlib
displayTitle: <a href="mllib-guide.html">MLlib</a> - Linear Methods
---

* Table of contents
{:toc}

\[ \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\x}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\wv}{\mathbf{w}} \newcommand{\av}{\mathbf{\alpha}} \newcommand{\bv}{\mathbf{b}} \newcommand{\N}{\mathbb{N}} \newcommand{\id}{\mathbf{I}} \newcommand{\ind}{\mathbf{1}} \newcommand{\0}{\mathbf{0}} \newcommand{\unit}{\mathbf{e}} \newcommand{\one}{\mathbf{1}} \newcommand{\zero}{\mathbf{0}} \]

## Mathematical formulation

Many standard *machine learning* methods can be formulated as a convex optimization problem, i.e. the task of finding a minimizer of a convex function $f$ that depends on a variable vector $\wv$ (called `weights` in the code), which has $d$ entries. Formally, we can write this as the optimization problem $\min_{\wv \in\R^d} \; f(\wv)$, where the objective function is of the form
\begin{equation}
    f(\wv) := \lambda\, R(\wv) +
    \frac1n \sum_{i=1}^n L(\wv;\x_i,y_i)
    \label{eq:regPrimal}
    \ .
\end{equation}
Here the vectors $\x_i\in\R^d$ are the training data examples, for $1\le i\le n$, and $y_i\in\R$ are their corresponding labels, which we want to predict. We call the method *linear* if $L(\wv; \x, y)$ can be expressed as a function of $\wv^T \x$ and $y$. Several of MLlib's classification and regression algorithms fall into this category, and are discussed here.
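For example, $L^2$-regularized logistic regression instantiates this template with the logistic loss and the $L^2$ regularizer (both defined in the tables below):

\[
f(\wv) = \lambda\, \frac{1}{2}\|\wv\|_2^2 + \frac1n \sum_{i=1}^n \log\left(1+\exp(-y_i \wv^T \x_i)\right) \ .
\]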

The objective function $f$ has two parts: the regularizer that controls the complexity of the model, and the loss that measures the error of the model on the training data. The loss function $L(\wv;.)$ is typically a convex function in $\wv$. The fixed regularization parameter $\lambda \ge 0$ (`regParam` in the code) defines the trade-off between the two goals of minimizing the loss (i.e., training error) and minimizing model complexity (i.e., avoiding overfitting).
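For instance, the SGD-based trainers expose $\lambda$ directly as a `regParam` argument. The following minimal Scala sketch trains a linear SVM with a chosen regularization parameter; it assumes a `SparkContext` named `sc` and a LIBSVM-format data file at a placeholder path:

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.util.MLUtils

// Load training data in LIBSVM format (placeholder path).
val training = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

// Train a linear SVM with 100 iterations of SGD, step size 1.0, and
// regularization parameter lambda = 0.01 (`regParam` in the objective above).
val numIterations = 100
val stepSize = 1.0
val regParam = 0.01
val model = SVMWithSGD.train(training, numIterations, stepSize, regParam)
```

Increasing `regParam` shrinks the weights toward zero (simpler models, possibly underfitting); decreasing it fits the training data more closely (possibly overfitting).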

### Loss functions

The following table summarizes the loss functions and their gradients or sub-gradients for the methods MLlib supports:

| | loss function $L(\wv; \x, y)$ | gradient or sub-gradient |
|---|---|---|
| hinge loss | $\max \{0, 1-y \wv^T \x \}, \quad y \in \{-1, +1\}$ | $\begin{cases}-y \cdot \x & \text{if } y \wv^T \x < 1, \\ 0 & \text{otherwise}.\end{cases}$ |
| logistic loss | $\log(1+\exp( -y \wv^T \x)), \quad y \in \{-1, +1\}$ | $-y \left(1-\frac1{1+\exp(-y \wv^T \x)} \right) \cdot \x$ |
| squared loss | $\frac{1}{2} (\wv^T \x - y)^2, \quad y \in \R$ | $(\wv^T \x - y) \cdot \x$ |
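To make the table concrete, here is a plain-Scala sketch (illustrative only, not MLlib's API; MLlib implements these internally as `Gradient` subclasses in `org.apache.spark.mllib.optimization`) that evaluates each loss and its (sub)gradient for a single example $(\x, y)$, with dense arrays standing in for vectors:

```scala
// Illustrative helpers only, matching the table above row by row.
def dot(w: Array[Double], x: Array[Double]): Double =
  w.zip(x).map { case (wi, xi) => wi * xi }.sum

// Hinge loss and a sub-gradient: -y * x if y * w^T x < 1, else 0.
def hingeLoss(w: Array[Double], x: Array[Double], y: Double): Double =
  math.max(0.0, 1.0 - y * dot(w, x))
def hingeGradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] =
  if (y * dot(w, x) < 1.0) x.map(-y * _) else Array.fill(x.length)(0.0)

// Logistic loss and its gradient: -y * (1 - 1 / (1 + exp(-y * w^T x))) * x.
def logisticLoss(w: Array[Double], x: Array[Double], y: Double): Double =
  math.log1p(math.exp(-y * dot(w, x)))
def logisticGradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
  val scale = -y * (1.0 - 1.0 / (1.0 + math.exp(-y * dot(w, x))))
  x.map(scale * _)
}

// Squared loss and its gradient: (w^T x - y) * x.
def squaredLoss(w: Array[Double], x: Array[Double], y: Double): Double = {
  val diff = dot(w, x) - y
  0.5 * diff * diff
}
def squaredGradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] =
  x.map((dot(w, x) - y) * _)
```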

### Regularizers