3.2 Spectral/eigen decomposition
3.2.1 Eigenvalues and eigenvectors
Consider the \(n\times n\) matrix \(\mathbf A\). We say that vector \(\mathbf x\in \mathbb{R}^n\) is an eigenvector corresponding to eigenvalue \(\lambda\) of \(\mathbf A\) if \[\mathbf A\mathbf x= \lambda \mathbf x.\]
To find the eigenvalues of a matrix, we note that if \(\lambda\) is an eigenvalue, then \((\mathbf A-\lambda \mathbf I_n)\mathbf x=\boldsymbol 0\), i.e., the kernel of \(\mathbf A-\lambda \mathbf I_n\) has dimension at least 1, so \(\mathbf A-\lambda \mathbf I_n\) is not invertible, and so we must have \(\text{det}(\mathbf A-\lambda \mathbf I_n)=0\).
Let \(R(\lambda )=\text{det}(\mathbf A-\lambda \mathbf I_n)\), which is an \(n^{\text{th}}\)-order polynomial in \(\lambda\). To find the eigenvalues of \(\mathbf A\), we find the \(n\) roots \(\lambda _1, \dots , \lambda _n\) of \(R(\lambda )\). We will always consider ordered eigenvalues, so that \(\lambda _1\geq \dots \geq \lambda _n\).
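As a concrete illustration, the following sketch computes the ordered eigenvalues and eigenvectors of a small symmetric matrix and checks the defining relation \(\mathbf A\mathbf x=\lambda \mathbf x\). It uses Python with numpy purely for illustration (the choice of language and the example matrix are assumptions here, not part of the module); in practice one calls a numerical routine such as np.linalg.eigh rather than finding the roots of \(R(\lambda)\) by hand.

```python
import numpy as np

# An arbitrary small symmetric matrix, used only for illustration
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is designed for symmetric matrices; it returns eigenvalues in
# increasing order, so reverse them to get lambda_1 >= ... >= lambda_n
eigvals, eigvecs = np.linalg.eigh(A)
lam = eigvals[::-1]            # ordered eigenvalues
Q = eigvecs[:, ::-1]           # columns are the corresponding eigenvectors

# Check the defining relation A x = lambda x for each eigenpair
for i in range(len(lam)):
    assert np.allclose(A @ Q[:, i], lam[i] * Q[:, i])

print(lam)
```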
Thus, since the determinant of \(\mathbf A\) is the product of its eigenvalues, \(\det(\mathbf A)=\prod_{i=1}^n \lambda_i\), \[\mathbf A\mbox{ is invertible } \iff \det(\mathbf A)\not=0 \iff \lambda_i\not=0 \;\forall\; i \iff \mathbf A\mbox{ is of full rank}.\]
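Continuing the same illustrative sketch (same assumptions as above), we can check numerically that the determinant equals the product of the eigenvalues, and that non-zero eigenvalues correspond to invertibility and full rank.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam = np.linalg.eigvalsh(A)[::-1]        # ordered eigenvalues

# det(A) equals the product of the eigenvalues
assert np.isclose(np.linalg.det(A), np.prod(lam))

# All eigenvalues non-zero  <=>  A is invertible  <=>  A has full rank
assert np.all(np.abs(lam) > 1e-12)
assert np.linalg.matrix_rank(A) == A.shape[0]
```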
3.2.2 Spectral decomposition
The key to much of dimension reduction is finding matrix decompositions. The first decomposition we will consider is the spectral decomposition (also called an eigen-decomposition).
The spectral decomposition theorem states that any real symmetric \(n\times n\) matrix \(\mathbf A\) can be written as \[\mathbf A=\mathbf Q\boldsymbol \Lambda\mathbf Q^\top,\] where \(\boldsymbol \Lambda=\text{diag}\{\lambda_1, \ldots , \lambda_n\}\) is the diagonal matrix of eigenvalues of \(\mathbf A\) and \(\mathbf Q\) is an orthogonal matrix (\(\mathbf Q^\top\mathbf Q=\mathbf Q\mathbf Q^\top=\mathbf I_n\)) whose \(i^{\text{th}}\) column is an eigenvector corresponding to \(\lambda_i\). Because \(\boldsymbol \Lambda\) is a diagonal matrix, we sometimes refer to the spectral decomposition as diagonalizing the matrix \(\mathbf A\): \(\mathbf Q^\top\mathbf A\mathbf Q=\boldsymbol \Lambda\) is a diagonal matrix.
This will be useful at various points throughout the module. Note that it relies upon the fact that the eigenvectors of \(\mathbf A\) can be chosen to be mutually orthogonal and of unit length; as there are \(n\) of them, they form an orthonormal basis for \(\mathbb{R}^n\).
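As a quick numerical check of the decomposition (again an illustrative numpy sketch, not a prescribed method), we can reconstruct \(\mathbf A\) as \(\mathbf Q\boldsymbol \Lambda\mathbf Q^\top\) and diagonalize it as \(\mathbf Q^\top\mathbf A\mathbf Q=\boldsymbol \Lambda\).

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
eigvals, Q = np.linalg.eigh(A)     # columns of Q are orthonormal eigenvectors
Lam = np.diag(eigvals)

# Spectral decomposition: A = Q Lambda Q^T
assert np.allclose(A, Q @ Lam @ Q.T)

# Diagonalization: Q^T A Q = Lambda, with Q orthogonal (Q^T Q = I)
assert np.allclose(Q.T @ A @ Q, Lam)
assert np.allclose(Q.T @ Q, np.eye(A.shape[0]))
```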
A symmetric matrix \(\mathbf A\) is positive definite if and only if all of its eigenvalues are positive.

Proof. If \(\mathbf A\) is positive definite, and if \(\mathbf x\) is a unit eigenvector of \(\mathbf A\) corresponding to \(\lambda_n\), then \[0<\mathbf x^\top \mathbf A\mathbf x= \lambda_n \mathbf x^\top \mathbf x= \lambda_n.\] Since the eigenvalues are ordered, it follows that \(\lambda_i\geq \lambda_n>0\) for all \(i\).
Conversely, suppose \(\mathbf A\) has positive eigenvalues. Because \(\mathbf A\) is real and symmetric, we can write it as \(\mathbf A=\mathbf Q\boldsymbol \Lambda\mathbf Q^\top\). Now if \(\mathbf x\) is a non-zero vector, then \(\mathbf y= \mathbf Q^\top \mathbf x\not= \boldsymbol 0\) (as \(\mathbf Q^\top\) has inverse \(\mathbf Q\), and hence \(\dim \operatorname{Ker}(\mathbf Q^\top)=0\)). Thus \[\mathbf x^\top \mathbf A\mathbf x= \mathbf y^\top \boldsymbol \Lambda\mathbf y= \sum_{i=1}^n \lambda_i y_i^2 >0,\] and thus \(\mathbf A\) is positive definite.

Note: A covariance matrix \(\boldsymbol{\Sigma}\) is always positive semi-definite (and thus always has non-negative eigenvalues). To see this, recall that if \(\mathbf x\) is a random vector with \({\mathbb{V}\operatorname{ar}}(\mathbf x)=\boldsymbol{\Sigma}\), then for any constant vector \(\mathbf a\), the random variable \(\mathbf a^\top \mathbf x\) has variance \({\mathbb{V}\operatorname{ar}}(\mathbf a^\top \mathbf x)=\mathbf a^\top \boldsymbol{\Sigma}\mathbf a\). Because variances are non-negative, we must have \[\mathbf a^\top \boldsymbol{\Sigma}\mathbf a\geq 0 \;\forall \;\mathbf a.\]
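The covariance remark can also be illustrated numerically. The sketch below (hypothetical simulated data, numpy assumed as before) forms a sample covariance matrix, checks that its eigenvalues are non-negative, and checks that \(\mathbf a^\top \boldsymbol{\Sigma}\mathbf a\geq 0\) for an arbitrary \(\mathbf a\).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))        # 100 observations of a 4-dimensional vector
Sigma = np.cov(X, rowvar=False)      # 4 x 4 sample covariance matrix

# Eigenvalues of a covariance matrix are non-negative (up to rounding error)
assert np.all(np.linalg.eigvalsh(Sigma) >= -1e-10)

# a^T Sigma a is the variance of a^T x, so it is non-negative for any a
a = rng.normal(size=4)
assert a @ Sigma @ a >= 0
```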
Moreover, if \(\boldsymbol{\Sigma}\) is positive definite (so that its eigenvalues are positive), then its determinant will be positive (so that \(\boldsymbol{\Sigma}\) is non-singular), and we can find an inverse matrix \(\boldsymbol{\Sigma}^{-1}\), which is called the precision matrix.
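A minimal sketch of the precision matrix, reusing the simulated covariance matrix from the previous block (same assumptions): positive eigenvalues give a positive determinant, so the inverse exists.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
Sigma = np.cov(X, rowvar=False)      # sample covariance matrix

# Positive eigenvalues => positive determinant => Sigma is non-singular
assert np.all(np.linalg.eigvalsh(Sigma) > 0)
assert np.linalg.det(Sigma) > 0

# The precision matrix is the inverse of the covariance matrix
Precision = np.linalg.inv(Sigma)
assert np.allclose(Sigma @ Precision, np.eye(4))
```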
3.2.3 Matrix square roots
From the spectral decomposition theorem, we can see that if \(\mathbf A\) is a symmetric positive semi-definite matrix, then for any non-negative integer \(p\)
\[\mathbf A^p = \mathbf Q\boldsymbol \Lambda^p \mathbf Q^\top.\] If in addition \(\mathbf A\) is positive definite (rather than just semi-definite), then \[\mathbf A^{-1} = \mathbf Q\boldsymbol \Lambda^{-1} \mathbf Q^\top\] where \(\boldsymbol \Lambda^{-1} = \text{diag}\{ \frac{1}{\lambda _1}, \dots , \frac{1}{\lambda _n} \}\).
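As an illustrative check (a numpy sketch under the same assumptions as the earlier blocks), matrix powers and the inverse can be computed through the spectral decomposition:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])      # symmetric and positive definite
eigvals, Q = np.linalg.eigh(A)

# A^p = Q Lambda^p Q^T for a non-negative integer p
p = 3
Ap = Q @ np.diag(eigvals ** p) @ Q.T
assert np.allclose(Ap, np.linalg.matrix_power(A, p))

# Since A is positive definite, A^{-1} = Q Lambda^{-1} Q^T
A_inv = Q @ np.diag(1.0 / eigvals) @ Q.T
assert np.allclose(A_inv, np.linalg.inv(A))
```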
The spectral decomposition also gives us a way to define a matrix square root. If we assume \(\mathbf A\) is positive semi-definite, then its eigenvalues, and hence the diagonal elements of \(\boldsymbol \Lambda\), are all non-negative.
We then define \({\mathbf A}^{1/2}\), a matrix square root of \(\mathbf A\), to be \({\mathbf A}^{1/2}=\mathbf Q\boldsymbol \Lambda^{1/2} \mathbf Q^\top\) where \(\boldsymbol \Lambda^{1/2}=\text{diag}\{\lambda_1^{1/2}, \ldots , \lambda_n^{1/2}\}\). This definition makes sense because \[\begin{align*} \mathbf A^{1/2}\mathbf A^{1/2}&=\mathbf Q\boldsymbol \Lambda^{1/2}\mathbf Q^\top \mathbf Q\boldsymbol \Lambda^{1/2} \mathbf Q^\top\\ &=\mathbf Q\boldsymbol \Lambda^{1/2}\boldsymbol \Lambda^{1/2}\mathbf Q^\top\\ &=\mathbf Q\boldsymbol \Lambda\mathbf Q^\top\\ &=\mathbf A, \end{align*}\] where we have used the facts that \(\mathbf Q^\top \mathbf Q=\mathbf I_n\) and \(\boldsymbol \Lambda^{1/2}\boldsymbol \Lambda^{1/2}=\boldsymbol \Lambda\). The matrix \(\mathbf A^{1/2}\) is not the only matrix square root of \(\mathbf A\), but it is the only symmetric, positive semi-definite square root of \(\mathbf A\).
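The symmetric square root can be computed the same way; a minimal sketch with the same illustrative matrix and assumptions:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])      # symmetric positive definite, hence PSD
eigvals, Q = np.linalg.eigh(A)

# A^{1/2} = Q Lambda^{1/2} Q^T, the symmetric positive semi-definite square root
A_half = Q @ np.diag(np.sqrt(eigvals)) @ Q.T

assert np.allclose(A_half, A_half.T)       # symmetric
assert np.allclose(A_half @ A_half, A)     # squares back to A
```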
If \(\mathbf A\) is positive definite (as opposed to just positive semi-definite), then all the \(\lambda_i\) are positive and so we can also define \(\mathbf A^{-1/2}=\mathbf Q\boldsymbol \Lambda^{-1/2}\mathbf Q^\top\) where \(\boldsymbol \Lambda^{-1/2}=\text{diag}\{\lambda_1^{-1/2},\ldots , \lambda_n^{-1/2}\}\). Note that \[ \mathbf A^{-1/2}\mathbf A^{-1/2}=\mathbf Q\boldsymbol \Lambda^{-1/2}\mathbf Q^\top \mathbf Q\boldsymbol \Lambda^{-1/2}\mathbf Q^\top =\mathbf Q\boldsymbol \Lambda^{-1}\mathbf Q^\top=\mathbf A^{-1}, \] so that, as defined above, \(\mathbf A^{-1/2}\) is the matrix square root of \(\mathbf A^{-1}\). Furthermore, similar calculations show that \[ \mathbf A^{1/2}\mathbf A^{-1/2}=\mathbf A^{-1/2}\mathbf A^{1/2}=\mathbf I_n, \] so that \(\mathbf A^{-1/2}\) is the matrix inverse of \(\mathbf A^{1/2}\).
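And similarly for \(\mathbf A^{-1/2}\) (again just an illustrative sketch under the same assumptions):

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])      # symmetric positive definite
eigvals, Q = np.linalg.eigh(A)

A_half = Q @ np.diag(np.sqrt(eigvals)) @ Q.T             # A^{1/2}
A_inv_half = Q @ np.diag(1.0 / np.sqrt(eigvals)) @ Q.T   # A^{-1/2}

# A^{-1/2} is the matrix square root of A^{-1} ...
assert np.allclose(A_inv_half @ A_inv_half, np.linalg.inv(A))
# ... and the matrix inverse of A^{1/2}
assert np.allclose(A_inv_half @ A_half, np.eye(A.shape[0]))
```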