3.3 Singular Value Decomposition (SVD)

The spectral decomposition theorem (Proposition 3.3) gives a decomposition of any symmetric matrix. We now give a generalisation of this result which applies to all matrices.

If matrix \(\mathbf A\) is not a square matrix, then it cannot have eigenvectors. Instead, it has singular vectors corresponding to singular values. Suppose \(\bf\) is a \(n\times p\) matrix. Then we say \(\sigma\) is a singular value with corresponding left and right singular vectors \(\mathbf u\) and \(\mathbf v\) (respectively) if \[\mathbf A\mathbf v= \sigma \mathbf u\quad \mbox{ and }\quad \mathbf A^\top \mathbf u= \sigma \mathbf v\] If \(\mathbf A\) is a symmetric matrix then \(\mathbf u=\mathbf v\) is a eigenvector and \(\sigma\) is an eigenvalue.

The singular value decomposition (SVD) diagonalizes \(\mathbf A\) into a product of a matrix of left singular vectors \(\mathbf U\), a diagonal matrix of singular values \(\boldsymbol{\Sigma}\), and a matrix of right singular vectors \(\mathbf V\).

\[\mathbf A= \mathbf U\boldsymbol{\Sigma}\mathbf V^\top.\]

Proposition 3.5 (Singular value decomposition). Let \(\mathbf A\) be a \(n \times p\) matrix of rank \(r\), where \(1 \leq r \leq \min(n,p)\). Then there exists a \(n \times r\) matrix \(\mathbf U=[\mathbf u_1,\ldots , \mathbf u_r]\), a \(p \times r\) matrix \(\mathbf V=[\mathbf v_1,\ldots ,{ \mathbf v}_r],\) and a \(r \times r\) diagonal matrix \(\boldsymbol{\Sigma}=\text{diag}\{\sigma_1,\ldots , \sigma_r\}\) such that \[ \mathbf A=\mathbf U\boldsymbol{\Sigma}\mathbf V^\top =\sum_{i=1}^r \sigma_i \mathbf u_i \mathbf v_i^\top, \] where \(\mathbf U^\top \mathbf U= \mathbf I_r = \mathbf V^\top \mathbf V\) and the \(\sigma_1 \geq \ldots \geq \sigma_r >0\).

Note that the \(\mathbf u_i\) and the \(\mathbf v_i\) are necessarily unit vectors, and that we have ordered the singular values from largest to smallest. The scalars \(\sigma_1, \ldots , \sigma_r\) are called the singular values of \(\mathbf A\), the columns of \(\mathbf U\) are the left singular vectors, and the columns of \(\mathbf V\) are the right singular vectors.

The form of the SVD given above is called the compact singular value decomposition. Sometimes we write it in a non-compact form \[\mathbf A= \mathbf U\boldsymbol{\Sigma}\mathbf V^\top\] where \(\mathbf U\) is a \(n \times n\) orthogonal matrix \((\mathbf U^\top\mathbf U=\mathbf U\mathbf U^\top = \mathbf I_n)\), \(\mathbf V\) is a \(p \times p\) orthogonal matrix \((\mathbf V^\top\mathbf V=\mathbf V\mathbf V^\top = \mathbf I_p)\), and \(\boldsymbol{\Sigma}\) is a \(n \times p\) diagonal matrix \[\begin{equation} \boldsymbol{\Sigma}= \left(\begin{array}{cccccccc} \sigma_1&0&\ldots&&&&0\\ 0&\sigma_2&0&\ldots&&&\\ \vdots\\ 0&0&&\ldots&\sigma_r&&\\ 0&0&&\ldots&&0&\ldots\\ \vdots\\ 0&0&&\ldots&&&0\\ \end{array} \right). \tag{3.1} \end{equation}\] The columns of \(\mathbf U\) and \(\mathbf V\) form an orthonormal basis for \(\mathbb{R}^n\) and \(\mathbb{R}^p\) respectively. We can see that we recover the compact form of the SVD by only using the first \(r\) columns of \(\mathbf U\) and \(\mathbf V\), and truncating \(\boldsymbol{\Sigma}\) to a \(r\times r\) matrix with non-zero diagonal elements.

When \(\mathbf A\) is symmetric, we take \(\mathbf U=\mathbf V\), and the spectral decomposition theorem is recovered, and in this case (but not in general) the singular values of \(\mathbf A\) are eigenvalues of \(\mathbf A\).

Proof. \(\mathbf A^\top \mathbf A\) is a \(p\times p\) symmetric matrix, and so by the spectral decomposition theorem we can write it as \[\mathbf A^\top \mathbf A= \mathbf V\boldsymbol \Lambda\mathbf V^\top\] where \(\mathbf V\) is a \(p \times r\) semi-orthogonal matrix containing the orthonormal eigenvectors of \(\mathbf A^\top \mathbf A\), and \(\boldsymbol \Lambda=\operatorname{diag}(\lambda_1, \ldots, \lambda_r)\) is a diagonal matrix of eigenvalues with \(\lambda_1\geq \ldots \geq\lambda_r>0\) (by Corollary 3.1).

For \(i=1,\dots, r\), let \(\sigma_i =\sqrt{\lambda_i}\) and let \(\mathbf u_i = \frac{1}{\sigma_i} \mathbf A\mathbf v_i\). Then the vectors \(\mathbf u_i\) are orthonormal: \[\begin{align*} \mathbf u_i^\top \mathbf u_j &=\frac{1}{\sigma_i\sigma_j} \mathbf v_i^\top \mathbf A^\top\mathbf A\mathbf v_j\\ &=\frac{\sigma_j^2}{\sigma_i\sigma_j} \mathbf v_i^\top\mathbf v_j \quad \mbox{ as }\mathbf v_j \mbox{ is an eigenvector of } \mathbf A^\top\mathbf A\\ &=\begin{cases} 0 &\mbox{ if } i\not=j\\ 1 &\mbox{ if } i=j \end{cases}\quad \mbox{ as the } \mathbf v_i \mbox{ are orthonormal vectors.} \end{align*}\] In addition \[\mathbf A^\top\mathbf u_i = \frac{1}{\sigma_i}\mathbf A^\top\mathbf A\mathbf v_i = \frac{\sigma^2_i}{\sigma_i}\mathbf v_i = \sigma_i\mathbf v_i\] and so \(\mathbf u_i\) and \(\mathbf v_i\) are left and right singular vectors.

Let \(\mathbf U=[\mathbf u_1 \; \mathbf u_2 \; \ldots \; \mathbf u_r\; \ldots \; \mathbf u_n]\), where \(\mathbf u_{r+1}, \ldots, \mathbf u_n\) are chosen to complete the orthonormal basis for \(\mathbb{R}^n\) given \(\mathbf u_1, \ldots, \mathbf u_r\), and let \(\boldsymbol{\Sigma}\) be the \(n\times p\) diagonal matrix in Equation (3.1).

Then we have shown that \[\mathbf U= \mathbf A\mathbf V\boldsymbol{\Sigma}^{-1}\] Thus \[\begin{align*} \mathbf U&= \mathbf A\mathbf V\boldsymbol{\Sigma}^{-1}\\ \mathbf U\boldsymbol{\Sigma}&= \mathbf A\mathbf V\\ \mathbf U\boldsymbol{\Sigma}\mathbf V^\top &= \mathbf A. \end{align*}\]


Note that by construction we’ve shown that \(\mathbf A^\top\mathbf A\) has eigenvalues \(\sigma^2_i\) with corresponding eigenvectors \(\mathbf v_i\). We also can also show that \(\mathbf A\mathbf A^\top\) has eigenvalues \(\sigma^2_i\), but with corresponding eigenvectors \(\mathbf u_i\). \[\mathbf A\mathbf A^\top \mathbf u_i = \sigma_i\mathbf A\mathbf v_i = \sigma^2_i \mathbf u_i\]

Proposition 3.6 Let \(\mathbf A\) be any matrix of rank \(r\). Then the non-zero eigenvalues of both \(\mathbf A\mathbf A^\top\) and \(\mathbf A^\top \mathbf A\) are \(\sigma_1^2, \ldots , \sigma_r^2\). The corresponding unit eigenvectors of \(\mathbf A\mathbf A^\top\) are given by the columns of \(\mathbf U\), and the corresponding unit eigenvectors of \(\mathbf A^\top \mathbf A\) are given by the columns of \(\mathbf V\).

Notes:

  1. The SVD expresses a matrix as a sum of rank-1 matrices \[\mathbf A= \sum_{i=1}^r \sigma_i \mathbf u_i \mathbf v_i^\top.\] We can think of these as a list of the building blocks of \(\mathbf A\) ordered by their importance (\(\sigma_1\geq \sigma_2\geq\ldots\)).

  2. The singular value decomposition theorem shows that every matrix is diagonal, provided one uses the proper bases for the domain and range spaces. We can diagonalize \(\mathbf A\) by \[ \mathbf U^\top\mathbf A\mathbf V=\boldsymbol{\Sigma}.\]

  3. The SVD reveals a great deal about a matrix. Firstly, the rank of \(\mathbf A\) is the number of non-zero singular values. The left singular vectors \(\mathbf u_1, \ldots, \mathbf u_r\) are an orthonormal basis for the columns space of \(\mathbf A\), \(\mathcal{C}(\mathbf A)\), and the right singular vectors \(\mathbf v_1, \ldots, \mathbf v_r\) are an orthonormal basis for \(\mathcal{C}(\mathbf A^\top)\), the row space of \(\mathbf A\). The vectors \(\mathbf v_{r+1}, \ldots, \mathbf v_p\) from the non-compact SVD are a basis for the kernel of \(\mathbf A\) (sometimes called the null space \(\mathcal{N}(\mathbf A)\)), and \(\mathbf u_{r+1}, \ldots, \mathbf u_n\) are a basis for \(\mathcal{N}(\mathbf A^\top)\).

  4. The SVD has many uses in mathematics. One is as a generalized inverse of a matrix. If \(\mathbf A\) is \(n \times p\) with \(n\not = p\), or if it is square but not of full rank, then \(\mathbf A\) cannot have an inverse. However, we say \(\mathbf A^+\) is a generalized inverse if \(\mathbf A\mathbf A^+\mathbf A=\mathbf A\). One such generalized inverse can be obtained from the SVD by \(\mathbf A^+ = \mathbf V\boldsymbol{\Sigma}^{-1}\mathbf U^\top\) - this is known as the Moore-Penrose pseudo-inverse.

3.3.1 Examples

In practice, we don’t compute SVDs of a matrix by hand: in R you can use the command SVD(A) to compute the SVD of matrix A. However, it is informative to do the calculation yourself a few times to help fix the ideas.

Example 3.2 Consider the matrix \(\mathbf A= \mathbf x\mathbf y^\top\). We can see this is a rank-1 matrix, so it only has one non-zero singular value which is \(\sigma_1 = ||\mathbf x||.||\mathbf y||\). Its SVD is given by \[\mathbf U= \frac{1}{||\mathbf x|| }\mathbf x,\quad \mathbf V= \frac{1}{||\mathbf y|| }\mathbf y, \quad \mbox{ and } \Sigma = ||\mathbf x||.||\mathbf y||.\]

Example 3.3 Let \[\mathbf A= \left(\begin{array}{ccc}3&2&2\\ 2&3&-2\end{array}\right).\] Let’s try to find the SVD of \(\mathbf A\).

We know the singular values are the square roots of the eigenvalues of \(\mathbf A\mathbf A^\top\) and \(\mathbf A^\top\mathbf A\). We’ll work with the former as it is only \(2\times 2\). \[\mathbf A\mathbf A^\top = \left(\begin{array}{cc}17&8\\ 8&17\end{array}\right) \quad \mbox{ and so } \det(\mathbf A\mathbf A^\top-\lambda \mathbf I)=(17-\lambda)^2-64 \]

Solving \(\det(\mathbf A\mathbf A^\top-\lambda \mathbf I)=0\) gives the eigenvalues to be \(\lambda=25\) or \(9\). Thus the singular values of \(\mathbf A\) are \(\sigma_1=\sqrt{25}=5\) and \(\sigma_2=\sqrt{9}=3\), and \[\boldsymbol{\Sigma}=\left(\begin{array}{cc}5&0\\ 0&3\end{array}\right).\]

The columns of \(\mathbf U\) are the unit eigenvectors of \(\mathbf A\mathbf A^\top\) which we can find by solving \[\begin{align*}(\mathbf A\mathbf A^\top-25\mathbf I_2)\mathbf u&=\left(\begin{array}{cc}-8&8\\ 8&-8\end{array}\right)\left(\begin{array}{c}u_1\\u_2\\\end{array}\right)=\left(\begin{array}{c}0\\0\\\end{array}\right) \quad \mbox{ and }\\ (\mathbf A\mathbf A^\top-9\mathbf I_2)\mathbf u&=\left(\begin{array}{cc}8&8\\ 8&8\end{array}\right)\left(\begin{array}{c}u_1\\u_2\\\end{array}\right)=\left(\begin{array}{c}0\\0\\\end{array}\right).\end{align*}\] And so, remembering that the eigenvectors used to form \(\mathbf U\) need to be unit vectors, we can see that \[\mathbf U=\frac{1}{\sqrt{2}}\left(\begin{array}{cc}1&1\\ 1&-1\end{array}\right).\] Finally, to compute \(\mathbf V\) recall that \(\sigma_i \mathbf v_i = \mathbf A^\top \mathbf u_i\) and so \[\mathbf V= \mathbf A^\top\mathbf U\boldsymbol{\Sigma}^{-1} = \frac{1}{\sqrt{2}}\left(\begin{array}{cc}1&\frac{1}{3}\\ 1&\frac{-1}{3}\\ 0&\frac{4}{3}\end{array}\right). \] This completes the calculation, and we can see that we can express \(\mathbf A\) as \[\mathbf A= \left(\begin{array}{cc}\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}&\frac{-1}{\sqrt{2}}\end{array}\right) \left(\begin{array}{cc}5&0\\ 0&3\end{array}\right)\left(\begin{array}{ccc}\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}&0\\ \frac{1}{3\sqrt{2}}& \frac{-1}{3 \sqrt{2}} &\frac{4}{3\sqrt{2}}\end{array}\right)\] or as the sum of rank-1 matrices:

\[\mathbf A= 5\left(\begin{array}{c}\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\end{array}\right) \left(\begin{array}{ccc}\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}} & 0 \end{array}\right)+ 3\left(\begin{array}{c}\frac{1}{\sqrt{2}}\\ \frac{-1}{\sqrt{2}}\end{array}\right) \left(\begin{array}{ccc}\frac{1}{3\sqrt{2}}& \frac{-1}{3 \sqrt{2}} &\frac{4}{3\sqrt{2}}\end{array}\right)\]

This is the compact form of the SVD. To find the non-compact form we need \(\mathbf V\) to be a \(3 \times 3\) matrix, which requires us to find a 3rd column that is orthogonal to the first two columns (thus completing an orthonormal basis for \(\mathbb{R}^3\)). We can do that with the vector \(\mathbf v_3 = \frac{1}{3}(2\; -2\; -1)\) giving the non-compact SVD for \(\mathbf A\).

\[\mathbf A= \left(\begin{array}{cc}\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}&\frac{-1}{\sqrt{2}}\end{array}\right) \left(\begin{array}{ccc}5&0&0\\ 0&3&0\end{array}\right)\left(\begin{array}{ccc}\frac{1}{\sqrt{2}}& \frac{1}{3\sqrt{2}}&\frac{2}{3} \\ \frac{1}{\sqrt{2}}& \frac{-1}{3 \sqrt{2}} &\frac{-2}{3}\\ 0&\frac{4}{3\sqrt{2}}&\frac{-1}{3}\end{array}\right)^\top\]

Let’s check our answer in R.

A<- matrix(c(3,2,2,2,3,-2), nr=2, byrow=T)
svd(A)
## $d
## [1] 5 3
## 
## $u
##            [,1]       [,2]
## [1,] -0.7071068 -0.7071068
## [2,] -0.7071068  0.7071068
## 
## $v
##               [,1]       [,2]
## [1,] -7.071068e-01 -0.2357023
## [2,] -7.071068e-01  0.2357023
## [3,] -5.551115e-17 -0.9428090

The eigenvectors are only defined upto multiplication by \(-1\) and so we can multiply any pair of left and right singular vectors by \(-1\) and it is still a valid SVD.

Note: In practice this is a terrible way to compute the SVD as it is prone to numerical error. In practice an efficient iterative method is used in most software implementations (including R).