2.4 SVD optimization results
Why are eigenvalues and singular values useful in statistics? It is because they arise as the solutions to some important optimization problems. We'll see more of this in later chapters, but we prove a few preliminary results here.
For example, suppose \(\bx\in\mathbb{R}^n\) is a random vector with \(\cov(\bx)=\bSigma\) (an \(n \times n\) matrix). Can we find a projection of \(\bx\) that has either maximum or minimum variance? That is, can we find \(\ba\) such that \[\var (\ba^\top\bx)=\ba^\top \bSigma \ba\] is maximized or minimized? To make the question interesting we need to constrain the length of \(\ba\), so let's assume that \(||\ba||_2 = \sqrt{\ba^\top \ba}=1\); otherwise we could take \(\ba=\bzero\) to obtain a projection with variance zero, or rescale \(\ba\) to make the variance arbitrarily large. So we want to solve the following optimization problems involving the quadratic form \(\ba^\top \bSigma \ba\):
\[\begin{equation} \max_{\ba: \;\ba^\top \ba =1}\ba^\top \bSigma \ba, \quad \mbox{and}\quad \min_{\ba: \;\ba^\top \ba =1}\ba^\top \bSigma \ba. \tag{2.2} \end{equation}\]
Given that \(\bSigma\) is symmetric, we can write it as \[\bSigma = \bV\bLambda\bV^\top \] where \(\bLambda\) is the diagonal matrix of eigenvalues of \(\bSigma\), and \(\bV\) is an orthogonal matrix of eigenvectors. If we let \(\bb=\bV^\top \ba\) then \[\ba^\top \bSigma \ba = \bb^\top \bLambda \bb = \sum_{i=1}^n \lambda_i b_i^2\] and given that the eigenvalues are ordered \(\lambda_1\geq \lambda_2 \geq \ldots \geq \lambda_n\) and that \[\sum_{i=1}^n b_i^2=\bb^\top\bb=\ba^\top \bV\bV^\top\ba=\ba^\top\ba=1,\] the sum \(\sum_{i=1}^n \lambda_i b_i^2\) is a weighted average of the eigenvalues and so cannot exceed \(\lambda_1\). The maximum \(\lambda_1\) is obtained by setting \(\bb=(1\;0\;\ldots\;0)^\top\). Then \[\begin{align*} \bV^\top \ba &= \bb\\ \bV\bV^\top \ba &=\bV\bb\\ \ba &= \bV\bb = \bv_1 \end{align*}\] so the maximum is attained when \(\ba=\bv_1\), the eigenvector of \(\bSigma\) corresponding to the largest eigenvalue \(\lambda_1\).
Similarly, the minimum is \(\lambda_n\), obtained by setting \(\bb=(0\;\ldots\;0\;1)^\top\), which corresponds to \(\ba=\bv_n\).
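As an informal numerical check (not part of the argument), the following NumPy sketch builds an arbitrary symmetric matrix, computes its eigendecomposition, and verifies that the quadratic form attains \(\lambda_1\) at \(\bv_1\) and \(\lambda_n\) at \(\bv_n\); all variable names here are ours, chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build an arbitrary symmetric (covariance-like) matrix Sigma.
B = rng.standard_normal((5, 5))
Sigma = B @ B.T

# np.linalg.eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(Sigma)
lam1, lamn = eigvals[-1], eigvals[0]      # largest and smallest eigenvalues
v1, vn = eigvecs[:, -1], eigvecs[:, 0]    # corresponding unit eigenvectors

# The quadratic form a' Sigma a at the extremal eigenvectors.
print(np.isclose(v1 @ Sigma @ v1, lam1))  # True: maximum value lambda_1
print(np.isclose(vn @ Sigma @ vn, lamn))  # True: minimum value lambda_n

# A random unit vector stays between the two bounds.
a = rng.standard_normal(5)
a /= np.linalg.norm(a)
print(lamn <= a @ Sigma @ a <= lam1)      # True
```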
Proposition 2.7 For any symmetric \(n \times n\) matrix \(\bSigma\),
\[\max_{\ba: \ba^\top \ba =1} \ba^\top\bSigma\ba=\lambda_1,\]
where the maximum occurs at \(\ba=\pm \bv_1\), and
\[\min_{\ba: \ba^\top \ba =1} \ba^\top\bSigma\ba=\lambda_n\]
where the minimum occurs at \(\ba = \pm \bv_n\), where \(\lambda_i, \bv_i\) are the ordered eigenpairs of \(\bSigma\).
Note that for \(\ba \neq \bzero\) \[\frac{\ba^\top \bSigma \ba}{\ba^\top\ba}=\frac{\ba^\top \bSigma \ba}{||\ba||^2} = \left(\frac{\ba}{||\ba||}\right)^\top \bSigma\left(\frac{\ba}{||\ba||}\right),\] and so another way to write the optimization problems (2.2) is as unconstrained problems:
\[\max_{\ba}\frac{\ba^\top \bSigma \ba}{\ba^\top\ba}\quad \mbox{ and } \quad \min_{\ba}\frac{\ba^\top \bSigma \ba}{\ba^\top\ba}.\]
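To illustrate this scale invariance numerically, here is a short, informal sketch (again with made-up names): the Rayleigh quotient is unchanged when \(\ba\) is rescaled and never exceeds \(\lambda_1\).

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
Sigma = B @ B.T                       # symmetric test matrix
lam1 = np.linalg.eigvalsh(Sigma)[-1]  # largest eigenvalue

def rayleigh(Sigma, a):
    """Rayleigh quotient a' Sigma a / a' a."""
    return (a @ Sigma @ a) / (a @ a)

a = rng.standard_normal(5)
print(np.isclose(rayleigh(Sigma, a), rayleigh(Sigma, 10 * a)))  # scale-invariant
print(rayleigh(Sigma, a) <= lam1)                               # never exceeds lambda_1
```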
We obtain an analogous result for general (not necessarily square or symmetric) matrices using the singular value decomposition.
Proposition 2.8 For any matrix \(\bA\), \[\max_{\bx: ||\bx||_2=1}||\bA\bx||_2=\max_{\bx \neq \bzero}\frac{||\bA\bx||_2}{||\bx||_2}=\sigma_1,\] the first (largest) singular value of \(\bA\), with the maximum achieved at \(\bx=\bv_1\) (the first right singular vector).
Proof. This follows from Proposition 2.7, since \[||\bA\bx||_2^2=\bx^\top \bA^\top\bA\bx,\] and \(\bA^\top\bA\) is symmetric with largest eigenvalue \(\sigma_1^2\) and corresponding eigenvector \(\bv_1\).
Here, \(||\bx||_2\) denotes the Euclidean norm for vectors:
\[||\bx||_2 = (\bx^\top \bx)^{\frac{1}{2}}.\]
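As a quick numerical sanity check of Proposition 2.8 (an illustrative sketch using NumPy's SVD routine, not part of the text's development), we can compare \(||\bA\bv_1||_2\) with the largest singular value for a rectangular matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 4))  # a non-square matrix

# np.linalg.svd returns singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
sigma1, v1 = s[0], Vt[0]         # sigma_1 and the first right singular vector

print(np.isclose(np.linalg.norm(A @ v1), sigma1))  # ||A v1||_2 equals sigma_1

x = rng.standard_normal(4)
print(np.linalg.norm(A @ x) / np.linalg.norm(x) <= sigma1 + 1e-12)  # ratio never exceeds sigma_1
```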
Finally, we will need the following result when we study canonical correlation analysis:
Proposition 2.9 For any matrix \(\bA\), we have \[ \max_{\ba, \bb:\, \vert \vert \ba \vert \vert=\vert \vert \bb \vert \vert =1} \ba^\top \bA \bb =\sigma_1, \] with the maximum obtained at \(\ba=\bu_1\) and \(\bb=\bv_1\), the first left and right singular vectors of \(\bA\).
Proof. See Section 3.1.
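Although the proof is deferred, an informal numerical illustration (same style of sketch as above, with hypothetical variable names) shows the bilinear form attaining \(\sigma_1\) at the first singular vectors:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
u1, v1, sigma1 = U[:, 0], Vt[0], s[0]   # first left/right singular vectors and sigma_1

print(np.isclose(u1 @ A @ v1, sigma1))  # the bilinear form attains sigma_1

# Random unit vectors cannot do better.
a = rng.standard_normal(7); a /= np.linalg.norm(a)
b = rng.standard_normal(4); b /= np.linalg.norm(b)
print(a @ A @ b <= sigma1 + 1e-12)      # True
```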
We’ll see much more of this kind of thing in Chapters ?? and 3.