2.3 Singular Value Decomposition (SVD)

The spectral decomposition theorem (Proposition 2.3) gives a decomposition of any symmetric matrix. We now give a generalisation of this result which applies to all matrices.

If a matrix \(\bA\) is not square, then it cannot have eigenvectors. Instead, it has singular vectors corresponding to singular values. Suppose \(\bA\) is an \(n\times p\) matrix. Then we say \(\sigma\) is a singular value with corresponding left and right singular vectors \(\bu\) and \(\bv\) (respectively) if \[\bA \bv = \sigma \bu \quad \mbox{ and }\quad \bA^\top \bu = \sigma \bv.\] If \(\bA\) is a symmetric matrix then we can take \(\bu=\bv\) to be an eigenvector and \(\sigma\) the corresponding eigenvalue (provided that eigenvalue is non-negative, since singular values cannot be negative).
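These defining equations are easy to check numerically. Below is a minimal R sketch: the matrix is an arbitrary example, and svd() is base R's SVD routine (discussed further in Section 2.3.1).

A <- matrix(c(1, 2, 0, -1, 3, 1), nrow = 2)   # an arbitrary 2 x 3 matrix
s <- svd(A)
u <- s$u[, 1]; v <- s$v[, 1]; sigma <- s$d[1] # leading singular triple
A %*% v - sigma * u                           # ~ 0, confirming A v = sigma u
t(A) %*% u - sigma * v                        # ~ 0, confirming A^T u = sigma v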

The singular value decomposition (SVD) factorizes \(\bA\) into a product of a matrix of left singular vectors \(\bU\), a diagonal matrix of singular values \(\bSigma\), and a matrix of right singular vectors \(\bV\):

\[\bA = \bU \bSigma \bV^\top.\]

Proposition 2.5 (Singular value decomposition). Let \(\bA\) be an \(n \times p\) matrix of rank \(r\), where \(1 \leq r \leq \min(n,p)\). Then there exists an \(n \times r\) matrix \(\bU=[\bu_1,\ldots , \bu_r]\), a \(p \times r\) matrix \(\bV=[\bv_1,\ldots , \bv_r]\), and an \(r \times r\) diagonal matrix \(\bSigma=\text{diag}\{\sigma_1,\ldots , \sigma_r\}\) such that \[ \bA=\bU\bSigma \bV^\top =\sum_{i=1}^r \sigma_i \bu_i \bv_i^\top, \] where \(\bU^\top \bU = \bI_r = \bV^\top \bV\) and \(\sigma_1 \geq \ldots \geq \sigma_r >0\).

Note that the \(\bu_i\) and the \(\bv_i\) are necessarily unit vectors, and that we have ordered the singular values from largest to smallest. The scalars \(\sigma_1, \ldots , \sigma_r\) are called the singular values of \(\bA\), the columns of \(\bU\) are the left singular vectors, and the columns of \(\bV\) are the right singular vectors.

The form of the SVD given above is called the compact singular value decomposition. Sometimes we write it in a non-compact form \[\bA = \bU\bSigma \bV^\top\] where \(\bU\) is an \(n \times n\) orthogonal matrix \((\bU^\top\bU=\bU\bU^\top = \bI_n)\), \(\bV\) is a \(p \times p\) orthogonal matrix \((\bV^\top\bV=\bV\bV^\top = \bI_p)\), and \(\bSigma\) is an \(n \times p\) diagonal matrix whose first \(r\) diagonal entries are \(\sigma_1, \ldots, \sigma_r\) and whose remaining entries are all zero: \[\begin{equation} \bSigma = \left(\begin{array}{cc} \operatorname{diag}(\sigma_1,\ldots,\sigma_r) & \mathbf{0}_{r \times (p-r)}\\ \mathbf{0}_{(n-r)\times r} & \mathbf{0}_{(n-r)\times(p-r)} \end{array}\right). \tag{2.1} \end{equation}\] The columns of \(\bU\) and \(\bV\) form orthonormal bases for \(\mathbb{R}^n\) and \(\mathbb{R}^p\) respectively. We recover the compact form of the SVD by keeping only the first \(r\) columns of \(\bU\) and \(\bV\), and truncating \(\bSigma\) to the \(r\times r\) matrix with non-zero diagonal elements.
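In R, the nu and nv arguments of svd() control how many left and right singular vectors are returned, so we can obtain the non-compact form explicitly. A short sketch, using the \(2 \times 3\) matrix from Example 2.3 below:

A <- matrix(c(3, 2, 2, 2, 3, -2), nrow = 2, byrow = TRUE)
sf <- svd(A, nu = nrow(A), nv = ncol(A))  # non-compact form: U is 2 x 2, V is 3 x 3
Sigma <- matrix(0, nrow(A), ncol(A))      # build the n x p Sigma of Equation (2.1)
diag(Sigma) <- sf$d
sf$u %*% Sigma %*% t(sf$v)                # recovers A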

When \(\bA\) is symmetric with non-negative eigenvalues, we can take \(\bU=\bV\) and recover the spectral decomposition theorem; in this case (but not in general) the singular values of \(\bA\) are eigenvalues of \(\bA\).

Proof. \(\bA^\top \bA\) is a \(p\times p\) symmetric matrix, and so by the spectral decomposition theorem we can write it as \[\bA^\top \bA = \bV \bLambda \bV^\top\] where \(\bV\) is a \(p \times r\) semi-orthogonal matrix whose columns are orthonormal eigenvectors of \(\bA^\top \bA\) corresponding to its non-zero eigenvalues, and \(\bLambda=\operatorname{diag}(\lambda_1, \ldots, \lambda_r)\) is a diagonal matrix of these eigenvalues, with \(\lambda_1\geq \ldots \geq\lambda_r>0\) (by Corollary 2.1).

For \(i=1,\dots, r\), let \(\sigma_i =\sqrt{\lambda_i}\) and let \(\bu_i = \frac{1}{\sigma_i} \bA \bv_i\). Then the vectors \(\bu_i\) are orthonormal: \[\begin{align*} \bu_i^\top \bu_j &=\frac{1}{\sigma_i\sigma_j} \bv_i^\top \bA^\top\bA \bv_j\\ &=\frac{\sigma_j^2}{\sigma_i\sigma_j} \bv_i^\top\bv_j \quad \mbox{ as }\bv_j \mbox{ is an eigenvector of } \bA^\top\bA\\ &=\begin{cases} 0 &\mbox{ if } i\not=j\\ 1 &\mbox{ if } i=j \end{cases}\quad \mbox{ as the } \bv_i \mbox{ are orthonormal vectors.} \end{align*}\] In addition \[\bA^\top\bu_i = \frac{1}{\sigma_i}\bA^\top\bA\bv_i = \frac{\sigma^2_i}{\sigma_i}\bv_i = \sigma_i\bv_i,\] and since \(\bA\bv_i = \sigma_i \bu_i\) holds by the definition of \(\bu_i\), the vectors \(\bu_i\) and \(\bv_i\) are left and right singular vectors of \(\bA\) with singular value \(\sigma_i\).

Let \(\bU=[\bu_1 \; \bu_2 \; \ldots \; \bu_r\; \ldots \; \bu_n]\), where \(\bu_{r+1}, \ldots, \bu_n\) are chosen to complete \(\bu_1, \ldots, \bu_r\) to an orthonormal basis for \(\mathbb{R}^n\). Similarly, extend \(\bV\) to the \(p \times p\) orthogonal matrix \([\bv_1 \; \ldots \; \bv_r \; \ldots \; \bv_p]\) by completing \(\bv_1, \ldots, \bv_r\) to an orthonormal basis for \(\mathbb{R}^p\), and let \(\bSigma\) be the \(n\times p\) diagonal matrix in Equation (2.1).

For \(i=1,\ldots,r\) we have \(\bA\bv_i = \sigma_i \bu_i\), by the definition of \(\bu_i\). For \(i=r+1,\ldots,p\), the vector \(\bv_i\) is orthogonal to \(\bv_1,\ldots,\bv_r\), which span the column space of the symmetric matrix \(\bA^\top\bA\), so \(\bA^\top\bA\bv_i = \mathbf{0}\) and hence \(\|\bA\bv_i\|^2 = \bv_i^\top\bA^\top\bA\bv_i = 0\), i.e. \(\bA\bv_i = \mathbf{0}\). Collecting these equations column by column gives \[\bA\bV = \bU\bSigma,\] and multiplying on the right by \(\bV^\top\) (using \(\bV\bV^\top = \bI_p\)) gives \[\bA = \bU\bSigma\bV^\top.\]


Note that by construction we’ve shown that \(\bA^\top\bA\) has eigenvalues \(\sigma^2_i\) with corresponding eigenvectors \(\bv_i\). We can also show that \(\bA\bA^\top\) has eigenvalues \(\sigma^2_i\), but with corresponding eigenvectors \(\bu_i\): \[\bA\bA^\top \bu_i = \sigma_i\bA\bv_i = \sigma^2_i \bu_i.\]

Proposition 2.6. Let \(\bA\) be any matrix of rank \(r\). Then the non-zero eigenvalues of both \(\bA \bA^\top\) and \(\bA^\top \bA\) are \(\sigma_1^2, \ldots , \sigma_r^2\). The corresponding unit eigenvectors of \(\bA \bA^\top\) are given by the columns of \(\bU\), and the corresponding unit eigenvectors of \(\bA^\top \bA\) are given by the columns of \(\bV\).
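Proposition 2.6 is easy to verify numerically. A quick R sketch, using an arbitrary random matrix:

set.seed(1)                        # arbitrary seed, for reproducibility
A <- matrix(rnorm(12), nrow = 3)   # a random 3 x 4 matrix (rank 3)
svd(A)$d^2                         # the squared singular values
eigen(A %*% t(A))$values           # the same values: eigenvalues of A A^T
eigen(t(A) %*% A)$values           # the same values again, plus one (numerically) zero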

Notes:

  1. The SVD expresses a matrix as a sum of rank-1 matrices \[\bA = \sum_{i=1}^r \sigma_i \bu_i \bv_i^\top.\] We can think of these as a list of the building blocks of \(\bA\) ordered by their importance (\(\sigma_1\geq \sigma_2\geq\ldots\)).

  2. The singular value decomposition theorem shows that every matrix is diagonal, provided one uses the proper bases for the domain and range spaces. We can diagonalize \(\bA\) by \[ \bU^\top\bA\bV=\bSigma.\]

  3. The SVD reveals a great deal about a matrix. Firstly, the rank of \(\bA\) is the number of non-zero singular values. The left singular vectors \(\bu_1, \ldots, \bu_r\) are an orthonormal basis for the column space of \(\bA\), \(\mathcal{C}(\bA)\), and the right singular vectors \(\bv_1, \ldots, \bv_r\) are an orthonormal basis for \(\mathcal{C}(\bA^\top)\), the row space of \(\bA\). The vectors \(\bv_{r+1}, \ldots, \bv_p\) from the non-compact SVD are a basis for the kernel of \(\bA\) (sometimes called the null space \(\mathcal{N}(\bA)\)), and \(\bu_{r+1}, \ldots, \bu_n\) are a basis for \(\mathcal{N}(\bA^\top)\).

  4. The SVD has many uses in mathematics. One is as a generalized inverse of a matrix. If \(\bA\) is \(n \times p\) with \(n\not = p\), or if it is square but not of full rank, then \(\bA\) cannot have an inverse. However, we say \(\bA^+\) is a generalized inverse if \(\bA \bA^+\bA=\bA\). One such generalized inverse can be obtained from the compact SVD as \(\bA^+ = \bV \bSigma^{-1}\bU^\top\) (using the compact form ensures \(\bSigma\) is invertible); this is known as the Moore-Penrose pseudo-inverse. A short R sketch is given after these notes.
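As a sketch of Note 4, the short function below computes the Moore-Penrose pseudo-inverse from svd(). The function name pinv and the tolerance tol (used to treat tiny singular values as numerical zeros) are our own choices here, not part of base R.

pinv <- function(A, tol = 1e-10) {
  s <- svd(A)
  keep <- s$d > tol                  # drop numerically-zero singular values
  s$v[, keep, drop = FALSE] %*%      # V Sigma^{-1} U^T
    diag(1 / s$d[keep], nrow = sum(keep)) %*%
    t(s$u[, keep, drop = FALSE])
}

A <- matrix(c(3, 2, 2, 2, 3, -2), nrow = 2, byrow = TRUE)
A %*% pinv(A) %*% A                  # recovers A, so pinv(A) is a generalized inverse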

2.3.1 Examples

In practice, we don’t compute the SVD of a matrix by hand: in R you can use the command svd(A) to compute the SVD of a matrix A. However, it is informative to do the calculation yourself a few times to help fix the ideas.

Example 2.2 Consider the matrix \(\bA = \bx \by^\top\), where \(\bx \in \mathbb{R}^n\) and \(\by \in \mathbb{R}^p\) are non-zero vectors. We can see this is a rank-1 matrix, so it has only one non-zero singular value, which is \(\sigma_1 = ||\bx||\,||\by||\). Its SVD is given by \[\bU = \frac{1}{||\bx|| }\bx,\quad \bV = \frac{1}{||\by|| }\by, \quad \mbox{ and } \quad \bSigma = ||\bx||\,||\by||.\]
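A quick numerical check of this example in R, with arbitrarily chosen vectors:

x <- c(1, 2); y <- c(3, 4, 5)    # arbitrary non-zero vectors
A <- outer(x, y)                 # the rank-1 matrix x y^T
svd(A)$d                         # one non-zero singular value, the rest ~ 0
sqrt(sum(x^2)) * sqrt(sum(y^2))  # ||x|| ||y||, which matches sigma_1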

Example 2.3 Let \[\bA = \left(\begin{array}{ccc}3&2&2\\ 2&3&-2\end{array}\right).\] Let’s try to find the SVD of \(\bA\).

We know the singular values are the square roots of the non-zero eigenvalues of \(\bA\bA^\top\) and \(\bA^\top\bA\). We’ll work with the former as it is only \(2\times 2\): \[\bA\bA^\top = \left(\begin{array}{cc}17&8\\ 8&17\end{array}\right) \quad \mbox{ and so } \quad \det(\bA\bA^\top-\lambda \bI)=(17-\lambda)^2-64. \]

Solving \(\det(\bA\bA^\top-\lambda \bI)=0\) gives the eigenvalues to be \(\lambda=25\) or \(9\). Thus the singular values of \(\bA\) are \(\sigma_1=\sqrt{25}=5\) and \(\sigma_2=\sqrt{9}=3\), and \[\bSigma=\left(\begin{array}{cc}5&0\\ 0&3\end{array}\right).\]
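These eigenvalues are easily confirmed in R:

A <- matrix(c(3, 2, 2, 2, 3, -2), nrow = 2, byrow = TRUE)
eigen(A %*% t(A))$values   # 25 9, as found by hand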

The columns of \(\bU\) are the unit eigenvectors of \(\bA\bA^\top\) which we can find by solving \[\begin{align*}(\bA \bA^\top-25\bI_2)\bu&=\left(\begin{array}{cc}-8&8\\ 8&-8\end{array}\right)\left(\begin{array}{c}u_1\\u_2\\\end{array}\right)=\left(\begin{array}{c}0\\0\\\end{array}\right) \quad \mbox{ and }\\ (\bA \bA^\top-9\bI_2)\bu&=\left(\begin{array}{cc}8&8\\ 8&8\end{array}\right)\left(\begin{array}{c}u_1\\u_2\\\end{array}\right)=\left(\begin{array}{c}0\\0\\\end{array}\right).\end{align*}\] And so, remembering that the eigenvectors used to form \(\bU\) need to be unit vectors, we can see that \[\bU=\frac{1}{\sqrt{2}}\left(\begin{array}{cc}1&1\\ 1&-1\end{array}\right).\] Finally, to compute \(\bV\) recall that \(\sigma_i \bv_i = \bA^\top \bu_i\) and so \[\bV = \bA^\top\bU\bSigma^{-1} = \frac{1}{\sqrt{2}}\left(\begin{array}{cc}1&\frac{1}{3}\\ 1&\frac{-1}{3}\\ 0&\frac{4}{3}\end{array}\right). \] This completes the calculation, and we can see that we can express \(\bA\) as \[\bA = \left(\begin{array}{cc}\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}&\frac{-1}{\sqrt{2}}\end{array}\right) \left(\begin{array}{cc}5&0\\ 0&3\end{array}\right)\left(\begin{array}{ccc}\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}&0\\ \frac{1}{3\sqrt{2}}& \frac{-1}{3 \sqrt{2}} &\frac{4}{3\sqrt{2}}\end{array}\right)\] or as the sum of rank-1 matrices:

\[\bA = 5\left(\begin{array}{c}\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\end{array}\right) \left(\begin{array}{ccc}\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}} & 0 \end{array}\right)+ 3\left(\begin{array}{c}\frac{1}{\sqrt{2}}\\ \frac{-1}{\sqrt{2}}\end{array}\right) \left(\begin{array}{ccc}\frac{1}{3\sqrt{2}}& \frac{-1}{3 \sqrt{2}} &\frac{4}{3\sqrt{2}}\end{array}\right)\]

This is the compact form of the SVD. To find the non-compact form we need \(\bV\) to be a \(3 \times 3\) matrix, which requires us to find a third column that is orthogonal to the first two (thus completing an orthonormal basis for \(\mathbb{R}^3\)). The unit vector \(\bv_3 = \frac{1}{3}(2, -2, -1)^\top\) does the job, giving the non-compact SVD of \(\bA\):

\[\bA = \left(\begin{array}{cc}\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}&\frac{-1}{\sqrt{2}}\end{array}\right) \left(\begin{array}{ccc}5&0&0\\ 0&3&0\end{array}\right)\left(\begin{array}{ccc}\frac{1}{\sqrt{2}}& \frac{1}{3\sqrt{2}}&\frac{2}{3} \\ \frac{1}{\sqrt{2}}& \frac{-1}{3 \sqrt{2}} &\frac{-2}{3}\\ 0&\frac{4}{3\sqrt{2}}&\frac{-1}{3}\end{array}\right)^\top\]
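As a quick sanity check, we can verify in R that the columns of our completed \(\bV\) (typed in by hand below) are orthonormal:

V <- matrix(c(1/sqrt(2),     1/sqrt(2),      0,
              1/(3*sqrt(2)), -1/(3*sqrt(2)), 4/(3*sqrt(2)),
              2/3,           -2/3,           -1/3), ncol = 3)
round(t(V) %*% V, 10)      # the 3 x 3 identity, so the columns are orthonormal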

Let’s check our answer in R.

A <- matrix(c(3, 2, 2, 2, 3, -2), nrow = 2, byrow = TRUE)
svd(A)
## $d
## [1] 5 3
## 
## $u
##            [,1]       [,2]
## [1,] -0.7071068 -0.7071068
## [2,] -0.7071068  0.7071068
## 
## $v
##               [,1]       [,2]
## [1,] -7.071068e-01 -0.2357023
## [2,] -7.071068e-01  0.2357023
## [3,] -5.551115e-17 -0.9428090

The singular vectors are only defined up to multiplication by \(-1\), so we can multiply any pair of left and right singular vectors by \(-1\) and it is still a valid SVD. This is why R’s answer agrees with ours except for the signs.

Note: computing the SVD via the eigendecomposition of \(\bA\bA^\top\), as we did here, is a poor way to do it numerically as it is prone to rounding error. Most software implementations (including R) instead use an efficient and numerically stable iterative method.