---
layout: global
title: Linear Methods - MLlib
displayTitle: <a href="mllib-guide.html">MLlib</a> - Linear Methods
---
* Table of contents
{:toc}
\[ \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\x}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\wv}{\mathbf{w}} \newcommand{\av}{\mathbf{\alpha}} \newcommand{\bv}{\mathbf{b}} \newcommand{\N}{\mathbb{N}} \newcommand{\id}{\mathbf{I}} \newcommand{\ind}{\mathbf{1}} \newcommand{\0}{\mathbf{0}} \newcommand{\unit}{\mathbf{e}} \newcommand{\one}{\mathbf{1}} \newcommand{\zero}{\mathbf{0}} \]
## Mathematical formulation
Many standard machine learning methods can be formulated as a convex optimization problem, i.e. the task of finding a minimizer of a convex function $f$ that depends on a variable vector $\wv$ (called `weights` in the code), which has $d$ entries. Formally, we can write this as the optimization problem $\min_{\wv \in\R^d} \; f(\wv)$, where the objective function is of the form
\begin{equation} f(\wv) := \lambda\, R(\wv) + \frac1n \sum_{i=1}^n L(\wv;\x_i,y_i) \label{eq:regPrimal} \ . \end{equation}
Here the vectors $\x_i\in\R^d$ are the training data examples, for $1\le i\le n$, and $y_i\in\R$ are their corresponding labels, which we want to predict. We call the method *linear* if $L(\wv; \x, y)$ can be expressed as a function of $\wv^T \x$ and $y$. Several of MLlib's classification and regression algorithms fall into this category, and are discussed here.
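For example, the hinge loss used for linear SVMs (listed in the table below) depends on $\wv$ and $\x$ only through the quantity $\wv^T \x$:
\[
L(\wv; \x, y) = \max\{0,\, 1 - y\, \wv^T \x\}, \quad y \in \{-1, +1\}.
\]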
The objective function $f$ has two parts: the regularizer that controls the complexity of the model, and the loss that measures the error of the model on the training data. The loss function $L(\wv;.)$ is typically a convex function in $\wv$. The fixed regularization parameter $\lambda \ge 0$ (`regParam` in the code) defines the trade-off between the two goals of minimizing the loss (i.e., training error) and minimizing model complexity (i.e., to avoid overfitting).
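To make the objective concrete, the following Scala sketch (an illustration only, not MLlib's internal implementation) evaluates $f(\wv)$ on a toy dataset, assuming squared loss and an L2 regularizer $R(\wv) = \frac{1}{2}\|\wv\|_2^2$; the `regParam` argument of the hypothetical `objective` helper plays the role of $\lambda$.

```scala
// Minimal sketch (not MLlib code): evaluate
//   f(w) = lambda * R(w) + (1/n) * sum_i L(w; x_i, y_i)
// assuming squared loss and an L2 regularizer R(w) = 0.5 * ||w||^2.
object ObjectiveSketch {
  def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  // squared loss for one example: 0.5 * (w^T x - y)^2
  def squaredLoss(w: Array[Double], x: Array[Double], y: Double): Double = {
    val err = dot(w, x) - y
    0.5 * err * err
  }

  // L2 regularizer: 0.5 * ||w||_2^2
  def l2Regularizer(w: Array[Double]): Double =
    0.5 * w.map(wi => wi * wi).sum

  // regParam corresponds to lambda in the objective above
  def objective(w: Array[Double],
                data: Seq[(Array[Double], Double)],
                regParam: Double): Double = {
    val avgLoss = data.map { case (x, y) => squaredLoss(w, x, y) }.sum / data.size
    regParam * l2Regularizer(w) + avgLoss
  }

  def main(args: Array[String]): Unit = {
    val data = Seq((Array(1.0, 0.0), 1.0), (Array(0.0, 1.0), -1.0))
    val w = Array(0.5, -0.5)
    println(objective(w, data, regParam = 0.1))
  }
}
```

Increasing `regParam` in this sketch shifts the minimizer toward smaller weights, which is exactly the loss-versus-complexity trade-off described above.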
### Loss functions
The following table summarizes the loss functions and their gradients or sub-gradients for the methods MLlib supports:
| | loss function $L(\wv; \x, y)$ | gradient or sub-gradient |
|---|---|---|
| hinge loss | $\max \{0, 1-y \wv^T \x \}, \quad y \in \{-1, +1\}$ | $\begin{cases}-y \cdot \x & \text{if $y \wv^T \x <1$}, \\ 0 & \text{otherwise}.\end{cases}$ |
| logistic loss | $\log(1+\exp( -y \wv^T \x)), \quad y \in \{-1, +1\}$ | $-y \left(1-\frac1{1+\exp(-y \wv^T \x)} \right) \cdot \x$ |
| squared loss | $\frac{1}{2} (\wv^T \x - y)^2, \quad y \in \R$ | $(\wv^T \x - y) \cdot \x$ |
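As a standalone illustration of the table (these are not the actual MLlib `Gradient` implementations), the sketch below computes each loss and its (sub-)gradient for a single example, using labels $y \in \{-1, +1\}$ for hinge and logistic loss and a real-valued label for squared loss.

```scala
// Sketch of the losses and (sub-)gradients from the table above; each function
// returns (loss, gradient) for a single example (x, y). Not MLlib's own code.
object LossSketch {
  def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  // hinge loss: max(0, 1 - y * w^T x); sub-gradient -y * x when y * w^T x < 1, else 0
  def hinge(w: Array[Double], x: Array[Double], y: Double): (Double, Array[Double]) = {
    val margin = y * dot(w, x)
    if (margin < 1.0) (1.0 - margin, x.map(-y * _))
    else (0.0, Array.fill(x.length)(0.0))
  }

  // logistic loss: log(1 + exp(-y * w^T x)); gradient -y * (1 - 1 / (1 + exp(-y * w^T x))) * x
  def logistic(w: Array[Double], x: Array[Double], y: Double): (Double, Array[Double]) = {
    val margin = y * dot(w, x)
    val scale = -y * (1.0 - 1.0 / (1.0 + math.exp(-margin)))
    (math.log1p(math.exp(-margin)), x.map(scale * _))
  }

  // squared loss: 0.5 * (w^T x - y)^2; gradient (w^T x - y) * x
  def squared(w: Array[Double], x: Array[Double], y: Double): (Double, Array[Double]) = {
    val err = dot(w, x) - y
    (0.5 * err * err, x.map(err * _))
  }
}
```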