Vector and Matrix#

Vector Projection#

The projection of a vector \(\boldsymbol{y} \in \mathbb{R}^m\) onto the span of \(\left\{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n\right\}\) (here we assume \(\boldsymbol{x}_i \in \mathbb{R}^m\) and write \(\boldsymbol{X} = \operatorname{span}(\{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n\})\)) is the vector \(\boldsymbol{v} \in \boldsymbol{X}\) that is as close as possible to \(\boldsymbol{y}\), as measured by the Euclidean norm \(\|\boldsymbol{v}-\boldsymbol{y}\|_2\). We denote the projection as \(\operatorname{Proj}\left(\boldsymbol{y} ; \left\{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n\right\}\right)\) and define it formally as

(1)#\[\operatorname{Proj}(\boldsymbol{y} ;\{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n\})=\mathop{\arg\min}\limits_{\boldsymbol{v} \in \boldsymbol{X}}\|\boldsymbol{y}-\boldsymbol{v}\|_2\]

Given a full-rank matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) with \(m \geq n\) (so that \(\mathbf{A}^{\top} \mathbf{A}\) is invertible), we can define the projection of a vector \(\boldsymbol{y} \in \mathbb{R}^m\) onto the range of \(\mathbf{A}\) in closed form:

(2)#\[\operatorname{Proj}(\boldsymbol{y} ; \mathbf{A})=\mathop{\arg\min}\limits_{\boldsymbol{v} \in \mathcal{R}(\mathbf{A})}\|\boldsymbol{v}-\boldsymbol{y}\|_2=\mathbf{A}\left(\mathbf{A}^{\top} \mathbf{A}\right)^{-1} \mathbf{A}^{\top} \boldsymbol{y}\]
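
For concreteness, here is a minimal NumPy sketch of equation (2), using arbitrary illustrative data; the closed form via the normal equations agrees with an off-the-shelf least-squares solve:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # full rank with m >= n (holds almost surely here)
y = rng.standard_normal(5)

# Closed form: Proj(y; A) = A (A^T A)^{-1} A^T y.
# Solving the normal equations is preferable to forming the inverse explicitly.
proj = A @ np.linalg.solve(A.T @ A, A.T @ y)

# Cross-check: the projection equals A w for the least-squares solution w.
w, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(proj, A @ w)

# The residual y - proj is orthogonal to the range of A.
assert np.allclose(A.T @ (y - proj), 0)
```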

Norms#

Introduction to Vector Norms

A norm of a vector, written \(\Vert\boldsymbol{x}\Vert\), is a measure of the “length” of the vector. More formally, a norm is any function \(f: \mathbb{R}^n \rightarrow \mathbb{R}\) that satisfies four properties:

  1. For all \(\boldsymbol{x} \in \mathbb{R}^n, f(\boldsymbol{x}) \geq 0\) (non-negativity).

  2. \(f(\boldsymbol{x})=0\) if and only if \(\boldsymbol{x}=\mathbf{0}\) (definiteness).

  3. For all \(\boldsymbol{x} \in \mathbb{R}^n, t \in \mathbb{R}, f(t \boldsymbol{x})=|t| f(\boldsymbol{x})\) (absolute value homogeneity).

  4. For all \(\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^n, f(\boldsymbol{x}+\boldsymbol{y}) \leq f(\boldsymbol{x})+f(\boldsymbol{y})\) (triangle inequality).

Consider the following common examples (illustrated in the sketch after this list):

  • p-norm: \(\|\boldsymbol{x}\|_p=\left(\sum_{i=1}^n\left|x_i\right|^p\right)^{1 / p}\), for \(p \geq 1\).

  • Max-norm: \(\|\boldsymbol{x}\|_{\infty}=\max _i\left|x_i\right|\).

  • 2-norm: \(\|\boldsymbol{x}\|_2=\sqrt{\sum_{i=1}^n x_i^2}\), also called Euclidean norm. Note that \(\|\boldsymbol{x}\|_2^2=\boldsymbol{x}^{\top} \boldsymbol{x}\).

  • 1-norm: \(\|\boldsymbol{x}\|_1=\sum_{i=1}^n\left|x_i\right|\).

  • 0-norm: \(\|\boldsymbol{x}\|_0=\sum_{i=1}^n \mathbb{I}\left(\left|x_i\right|>0\right)\). This is a pseudo-norm since it does not satisfy homogeneity. It counts the number of non-zero elements in \(\boldsymbol{x}\). If we define \(0^0=0\), we can write this as \(\|\boldsymbol{x}\|_0=\sum_{i=1}^n x_i^0\).
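
These norms map directly onto NumPy's `np.linalg.norm`; a minimal sketch with an arbitrary vector (the `ord` arguments are NumPy's names for the norms above):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0, 1.0])

print(np.linalg.norm(x, ord=3))       # p-norm with p = 3
print(np.linalg.norm(x, ord=np.inf))  # max-norm: max_i |x_i| = 4
print(np.linalg.norm(x, ord=2))       # 2-norm: sqrt(9 + 16 + 0 + 1)
print(np.linalg.norm(x, ord=1))       # 1-norm: 3 + 4 + 0 + 1 = 8
print(np.count_nonzero(x))            # "0-norm": number of non-zeros = 3

# The identity ||x||_2^2 = x^T x from the 2-norm example:
assert np.isclose(np.linalg.norm(x) ** 2, x @ x)
```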

Introduction to Matrix Norms

Suppose we think of a matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) as defining a linear function \(f(\boldsymbol{x})=\mathbf{A} \boldsymbol{x}\). We define the induced norm of \(\mathbf{A}\) as the maximum amount by which \(f\) can lengthen any unit-norm input:

(3)#\[\|\mathbf{A}\|_p=\max _{\boldsymbol{x} \neq \mathbf{0}} \frac{\|\mathbf{A} \boldsymbol{x}\|_p}{\|\boldsymbol{x}\|_p}=\max _{\|\boldsymbol{x}\|_p=1}\|\mathbf{A} \boldsymbol{x}\|_p\]

Typically \(p=2\), in which case

(4)#\[\|\mathbf{A}\|_2=\sqrt{\lambda_{\max }\left(\mathbf{A}^{\top} \mathbf{A}\right)}=\max _i \sigma_i\]

where \(\sigma_i\) is the \(i^{th}\) singular value of \(\mathbf{A}\). The nuclear norm, also called the trace norm, is defined as

(5)#\[\|\mathbf{A}\|_*=\operatorname{tr}\left(\sqrt{\mathbf{A}^{\top} \mathbf{A}}\right)=\sum_i \sigma_i\]

where \(\sqrt{\mathbf{A}^{\top} \mathbf{A}}\) is the matrix square root. Since the singular values are always non-negative, we have

(6)#\[\|\mathbf{A}\|_*=\sum_i\left|\sigma_i\right|=\|\boldsymbol{\sigma}\|_1\]

Using this as a regularizer encourages many singular values to become zero, resulting in a low-rank matrix. More generally, we can define the Schatten \(p\)-norm as

(7)#\[\|\mathbf{A}\|_p=\left(\sum_i \sigma_i^p(\mathbf{A})\right)^{1 / p}\]
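
Since the spectral, nuclear, and Schatten norms are all functions of the singular values, a single SVD computes them all. A minimal sketch with arbitrary data, cross-checked against NumPy's built-in matrix norms:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
sigma = np.linalg.svd(A, compute_uv=False)  # singular values, in descending order

def schatten_norm(A, p):
    # Schatten p-norm, equation (7): the vector p-norm of the singular values.
    s = np.linalg.svd(A, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)

spectral = sigma.max()  # ||A||_2, equation (4)
nuclear = sigma.sum()   # ||A||_*, equations (5)-(6)

assert np.isclose(spectral, np.linalg.norm(A, ord=2))
assert np.isclose(nuclear, np.linalg.norm(A, ord='nuc'))
assert np.isclose(schatten_norm(A, 2), np.linalg.norm(A, ord='fro'))  # p = 2 is Frobenius
```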

If we think of a matrix as a vector, we can define the matrix norm in terms of a vector norm, \(\|\mathbf{A}\|=\|\operatorname{vec}(\mathbf{A})\|\). If the vector norm is the 2-norm, the corresponding matrix norm is the Frobenius norm:

(8)#\[\|\mathbf{A}\|_F=\sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{i j}^2}=\sqrt{\operatorname{tr}\left(\mathbf{A}^{\top} \mathbf{A}\right)}=\|\operatorname{vec}(\mathbf{A})\|_2\]
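
The three expressions in equation (8) can be verified directly; a small sketch with arbitrary data (note the norm does not depend on the order in which \(\operatorname{vec}\) stacks the entries):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

via_entries = np.sqrt(np.sum(A ** 2))    # sqrt of the sum of squared entries
via_trace = np.sqrt(np.trace(A.T @ A))   # sqrt(tr(A^T A))
via_vec = np.linalg.norm(A.reshape(-1))  # 2-norm of the flattened matrix

assert np.isclose(via_entries, via_trace)
assert np.isclose(via_trace, via_vec)
assert np.isclose(via_vec, np.linalg.norm(A, ord='fro'))
```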

If \(\mathbf{A}\) is expensive to evaluate but the matrix-vector product \(\mathbf{A} \boldsymbol{v}\) is cheap (for a random vector \(\boldsymbol{v}\)), we can form a stochastic approximation to the Frobenius norm using the Hutchinson trace estimator as follows:

(9)#\[\|\mathbf{A}\|_F^2=\operatorname{tr}\left(\mathbf{A}^{\top} \mathbf{A}\right)=\mathbb{E}\left[\boldsymbol{v}^{\top} \mathbf{A}^{\top} \mathbf{A} \boldsymbol{v}\right]=\mathbb{E}\left[\|\mathbf{A} \boldsymbol{v}\|_2^2\right]\]

where \(\boldsymbol{v} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\).
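
A minimal sketch of this estimator, assuming we can only touch \(\mathbf{A}\) through matrix-vector products (modeled here by the illustrative closure `matvec`); averaging \(\|\mathbf{A} \boldsymbol{v}\|_2^2\) over Gaussian probes converges to \(\|\mathbf{A}\|_F^2\):

```python
import numpy as np

def frobenius_sq_hutchinson(matvec, n, num_probes, rng):
    # Monte Carlo estimate of ||A||_F^2 = E[||A v||_2^2], v ~ N(0, I); equation (9).
    total = 0.0
    for _ in range(num_probes):
        v = rng.standard_normal(n)
        total += np.sum(matvec(v) ** 2)
    return total / num_probes

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))
matvec = lambda v: A @ v  # stand-in for an expensive black-box product

estimate = frobenius_sq_hutchinson(matvec, n=30, num_probes=2000, rng=rng)
exact = np.linalg.norm(A, ord='fro') ** 2
print(estimate, exact)  # the estimate tightens as num_probes grows
```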