3.4 SVD optimization results

Why are eigenvalues and singular values useful in statistics? It is because they arise as the solutions of some important optimization problems. We’ll see more about this in later chapters, but we’ll prove a few preliminary results here.

For example, suppose \(\mathbf x\in\mathbb{R}^n\) is a random vector with \({\mathbb{C}\operatorname{ov}}(\mathbf x)=\boldsymbol{\Sigma}\) (an \(n \times n\) matrix). Can we find a projection of \(\mathbf x\) that has either maximum or minimum variance? That is, can we find \(\mathbf a\) such that \[{\mathbb{V}\operatorname{ar}}(\mathbf a^\top\mathbf x)=\mathbf a^\top \boldsymbol{\Sigma}\mathbf a\] is maximized or minimized? To make the question interesting we need to constrain the length of \(\mathbf a\), so let’s assume that \(||\mathbf a||_2 = \sqrt{\mathbf a^\top \mathbf a}=1\); otherwise we could just take \(\mathbf a={\boldsymbol 0}\) to obtain a projection with variance zero. So we want to solve the optimization problems involving the quadratic form \(\mathbf a^\top \boldsymbol{\Sigma}\mathbf a\):

\[\begin{equation} \max_{\mathbf a: \;\mathbf a^\top \mathbf a=1}\mathbf a^\top \boldsymbol{\Sigma}\mathbf a, \quad \mbox{and}\quad \min_{\mathbf a: \;\mathbf a^\top \mathbf a=1}\mathbf a^\top \boldsymbol{\Sigma}\mathbf a. \tag{3.2} \end{equation}\]
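As a quick sanity check (a minimal sketch in Python/numpy, with an example covariance matrix chosen purely for illustration), we can simulate draws of \(\mathbf x\) and confirm that the sample variance of the projections \(\mathbf a^\top\mathbf x\) matches \(\mathbf a^\top \boldsymbol{\Sigma}\mathbf a\):

```python
# Minimal simulation check (example covariance matrix assumed for illustration):
# the sample variance of the projections a^T x should be close to a^T Sigma a.
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])          # a symmetric positive definite Sigma

a = rng.standard_normal(3)
a /= np.linalg.norm(a)                       # enforce ||a||_2 = 1

X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)  # draws of x
print(np.var(X @ a), a @ Sigma @ a)          # empirical vs. theoretical variance
```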

Given that \(\boldsymbol{\Sigma}\) is symmetric, the spectral decomposition lets us write it as \[\boldsymbol{\Sigma}= \mathbf V\boldsymbol \Lambda\mathbf V^\top \] where \(\boldsymbol \Lambda\) is the diagonal matrix of eigenvalues of \(\boldsymbol{\Sigma}\), and \(\mathbf V\) is an orthogonal matrix of eigenvectors. If we let \(\mathbf b=\mathbf V^\top \mathbf a\) then \[\mathbf a^\top \boldsymbol{\Sigma}\mathbf a= \mathbf b^\top \boldsymbol \Lambda\mathbf b= \sum_{i=1}^n \lambda_i b_i^2\] and given that the eigenvalues are ordered \(\lambda_1\geq \lambda_2 \geq \ldots \geq \lambda_n\) and that \[\sum_{i=1}^n b_i^2=\mathbf b^\top\mathbf b=\mathbf a^\top \mathbf V\mathbf V^\top\mathbf a=\mathbf a^\top\mathbf a=1,\] we can see that the maximum is \(\lambda_1\), obtained by setting \(\mathbf b=(1\;0\;\ldots\;0)^\top\). Then \[\begin{align*} \mathbf V^\top \mathbf a&= \mathbf b\\ \mathbf V\mathbf V^\top \mathbf a&=\mathbf V\mathbf b\\ \mathbf a&= \mathbf V\mathbf b=\mathbf v_1 \end{align*}\] so we can see that the maximum is attained when \(\mathbf a=\mathbf v_1\), the eigenvector of \(\boldsymbol{\Sigma}\) corresponding to the largest eigenvalue \(\lambda_1\).

Similarly, the minimum is \(\lambda_n\), obtained by setting \(\mathbf b=(0\;0\;\ldots\;0\;1)^\top\), which corresponds to \(\mathbf a=\mathbf v_n\).
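The argument above is easy to check numerically (a sketch using numpy, with the same kind of illustrative covariance matrix): the quadratic form evaluated at random unit vectors never exceeds \(\lambda_1\) or falls below \(\lambda_n\), and the bounds are attained at \(\mathbf v_1\) and \(\mathbf v_n\):

```python
# Sketch (example matrix assumed): compare the quadratic form at the extreme
# eigenvectors with its values at many random unit vectors.
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

lam, V = np.linalg.eigh(Sigma)               # eigh returns eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]               # reorder so lambda_1 >= ... >= lambda_n

A = rng.standard_normal((10_000, 3))
A /= np.linalg.norm(A, axis=1, keepdims=True)          # random unit vectors a
vals = np.einsum('ij,jk,ik->i', A, Sigma, A)           # a^T Sigma a for each a

print(vals.max(), lam[0], V[:, 0] @ Sigma @ V[:, 0])   # close to / equal to lambda_1
print(vals.min(), lam[-1], V[:, -1] @ Sigma @ V[:, -1])# close to / equal to lambda_n
```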

Proposition 3.7 For any symmetric \(n \times n\) matrix \(\boldsymbol{\Sigma}\), \[\max_{\mathbf a: \mathbf a^\top \mathbf a=1} \mathbf a^\top\boldsymbol{\Sigma}\mathbf a=\lambda_1,\]
where the maximum occurs at \(\mathbf a=\pm \mathbf v_1\), and \[\min_{\mathbf a: \mathbf a^\top \mathbf a=1} \mathbf a^\top\boldsymbol{\Sigma}\mathbf a=\lambda_n,\] where the minimum occurs at \(\mathbf a= \pm \mathbf v_n\). Here \((\lambda_i, \mathbf v_i)\) are the ordered eigenpairs of \(\boldsymbol{\Sigma}\).

Note that \[\frac{\mathbf a^\top \boldsymbol{\Sigma}\mathbf a}{\mathbf a^\top\mathbf a}=\frac{\mathbf a^\top \boldsymbol{\Sigma}\mathbf a}{||\mathbf a||^2} = \left(\frac{\mathbf a}{||\mathbf a||}\right)^\top \boldsymbol{\Sigma}\left(\frac{\mathbf a}{||\mathbf a||}\right)\] and so another way to write the optimization problems (3.2) is as unconstrained optimization problems involving this ratio (the Rayleigh quotient):

\[\max_{\mathbf a}\frac{\mathbf a^\top \boldsymbol{\Sigma}\mathbf a}{\mathbf a^\top\mathbf a}\quad \mbox{ and } \quad \min_{\mathbf a}\frac{\mathbf a^\top \boldsymbol{\Sigma}\mathbf a}{\mathbf a^\top\mathbf a}.\]
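The scale invariance of this ratio is easy to verify numerically; the following sketch (numpy, with an illustrative matrix assumed) shows that rescaling \(\mathbf a\) leaves the ratio unchanged and that it equals \(\lambda_1\) at any nonzero multiple of \(\mathbf v_1\):

```python
# Sketch: the ratio a^T Sigma a / a^T a is unchanged by rescaling a, so the
# unconstrained maximizer is any nonzero multiple of v_1.
import numpy as np

Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])          # illustrative symmetric matrix
lam, V = np.linalg.eigh(Sigma)               # ascending eigenvalues

def rayleigh(a, S=Sigma):
    return (a @ S @ a) / (a @ a)

v1 = V[:, -1]                                # eigenvector for the largest eigenvalue
print(rayleigh(v1), rayleigh(7.3 * v1), lam[-1])   # all equal lambda_1
```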

We obtain a similar result for general (not necessarily square or symmetric) matrices using the singular value decomposition.

Proposition 3.8 For any matrix \(\mathbf A\), \[\max_{\mathbf x: ||\mathbf x||_2=1}||\mathbf A\mathbf x||_2=\max_{\mathbf x}\frac{||\mathbf A\mathbf x||_2}{||\mathbf x||_2}=\sigma_1,\] the first (largest) singular value of \(\mathbf A\), with the maximum achieved at \(\mathbf x=\mathbf v_1\) (the first right singular vector).

Proof. This follows from Proposition 3.7, since \[||\mathbf A\mathbf x||_2^2=\mathbf x^\top \mathbf A^\top\mathbf A\mathbf x\] and \(\mathbf A^\top\mathbf A\) is symmetric with largest eigenvalue \(\sigma_1^2\) and corresponding eigenvector \(\mathbf v_1\), so the quadratic form is maximized at \(\mathbf x=\mathbf v_1\), where it takes the value \(\sigma_1^2\).


Here, \(||\mathbf x||_2\) denotes the Euclidean norm for vectors: \[||\mathbf x||_2 = (\mathbf x^\top \mathbf x)^{\frac{1}{2}}.\]
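A short numerical sketch of Proposition 3.8 (numpy, with a small rectangular matrix assumed purely for illustration): \(||\mathbf A\mathbf v_1||_2\) recovers \(\sigma_1\), and random unit vectors do no better:

```python
# Sketch for Proposition 3.8 with an arbitrary 4 x 3 matrix assumed for
# illustration: ||A v_1||_2 equals the largest singular value sigma_1.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A)                  # singular values s are in decreasing order
v1 = Vt[0]                                   # first right singular vector

X = rng.standard_normal((10_000, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)          # random unit vectors x
print(np.linalg.norm(X @ A.T, axis=1).max(),           # close to sigma_1, never above it
      np.linalg.norm(A @ v1), s[0])                    # both equal sigma_1
```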

Finally, we will need the following result when we study canonical correlation analysis:

Proposition 3.9 For any matrix \(\mathbf A\), we have \[ \max_{\mathbf a, \mathbf b:\, \vert \vert \mathbf a\vert \vert=\vert \vert \mathbf b\vert \vert =1} \mathbf a^\top \mathbf A\mathbf b=\sigma_1, \] with the maximum attained at \(\mathbf a=\mathbf u_1\) and \(\mathbf b=\mathbf v_1\), the first left and right singular vectors of \(\mathbf A\).

Proof. See Section 5.1.
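Even without the proof, Proposition 3.9 is easy to check numerically (a sketch with numpy and an illustrative matrix): \(\mathbf u_1^\top \mathbf A\mathbf v_1\) equals \(\sigma_1\), and random unit pairs \((\mathbf a,\mathbf b)\) do not exceed it:

```python
# Sketch for Proposition 3.9 (example matrix assumed): u_1^T A v_1 recovers
# sigma_1, and a random unit pair (a, b) never does better.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A)

u1, v1 = U[:, 0], Vt[0]
print(u1 @ A @ v1, s[0])                     # equal: the bilinear form attains sigma_1

a = rng.standard_normal(4); a /= np.linalg.norm(a)
b = rng.standard_normal(3); b /= np.linalg.norm(b)
print(a @ A @ b <= s[0] + 1e-12)             # True: no unit pair exceeds sigma_1
```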

We’ll see much more of this kind of thing in Chapters 4 and 5.