3.2 Spectral/eigen decomposition
3.2.1 Eigenvalues and eigenvectors
Consider the \(n\times n\) matrix \(\mathbf A\). We say that vector \(\mathbf x\in \mathbb{R}^n\) is an eigenvector corresponding to eigenvalue \(\lambda\) of \(\mathbf A\) if \[\mathbf A\mathbf x= \lambda \mathbf x.\]
To find the eigenvalues of a matrix, we note that if \(\lambda\) is an eigenvalue, then \((\mathbf A-\lambda \mathbf I_n)\mathbf x=\boldsymbol 0\), i.e., the kernel of \(\mathbf A-\lambda \mathbf I_n\) has dimension at least 1, so \(\mathbf A-\lambda \mathbf I_n\) is not invertible, and so we must have \(\text{det}(\mathbf A-\lambda \mathbf I_n)=0\).
Let \(R(\lambda )=\text{det}(\mathbf A-\lambda \mathbf I_n)\), which is an \(n^{\text{th}}\)-order polynomial in \(\lambda\). To find the eigenvalues of \(\mathbf A\), we find the \(n\) roots \(\lambda _1, \dots , \lambda _n\) of \(R(\lambda )\). We will always consider ordered eigenvalues, so that \(\lambda _1\geq \dots \geq \lambda _n\).
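As a concrete illustration, the following sketch computes the ordered eigenvalues and eigenvectors of a small symmetric matrix and checks the defining relation \(\mathbf A\mathbf x=\lambda \mathbf x\). It uses Python with numpy purely for illustration (the choice of language and the example matrix are assumptions here, not part of the module); in practice one calls a numerical routine such as np.linalg.eigh rather than finding the roots of \(R(\lambda)\) by hand.

```python
import numpy as np

# An arbitrary small symmetric matrix, used only for illustration
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is designed for symmetric matrices; it returns eigenvalues in
# increasing order, so reverse them to get lambda_1 >= ... >= lambda_n
eigvals, eigvecs = np.linalg.eigh(A)
lam = eigvals[::-1]            # ordered eigenvalues
Q = eigvecs[:, ::-1]           # columns are the corresponding eigenvectors

# Check the defining relation A x = lambda x for each eigenpair
for i in range(len(lam)):
    assert np.allclose(A @ Q[:, i], lam[i] * Q[:, i])

print(lam)
```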
Thus, since the determinant of \(\mathbf A\) is the product of its eigenvalues, \(\det(\mathbf A)=\prod_{i=1}^n \lambda_i\), \[\mathbf A\mbox{ is invertible } \iff \det(\mathbf A)\not=0 \iff \lambda_i\not=0 \;\forall\; i \iff \mathbf A\mbox{ is of full rank}.\]
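Continuing the same illustrative sketch (same assumptions as above), we can check numerically that the determinant equals the product of the eigenvalues, and that non-zero eigenvalues correspond to invertibility and full rank.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam = np.linalg.eigvalsh(A)[::-1]        # ordered eigenvalues

# det(A) equals the product of the eigenvalues
assert np.isclose(np.linalg.det(A), np.prod(lam))

# All eigenvalues non-zero  <=>  A is invertible  <=>  A has full rank
assert np.all(np.abs(lam) > 1e-12)
assert np.linalg.matrix_rank(A) == A.shape[0]
```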
3.2.2 Spectral decomposition
The key to much of dimension reduction is finding matrix decompositions. The first decomposition we will consider is the spectral decomposition (also called an eigen-decomposition).
The spectral decomposition theorem states that any real symmetric \(n\times n\) matrix \(\mathbf A\) can be written as \[\mathbf A=\mathbf Q\boldsymbol \Lambda\mathbf Q^\top,\] where \(\boldsymbol \Lambda=\text{diag}\{\lambda_1, \ldots , \lambda_n\}\) is the diagonal matrix of eigenvalues of \(\mathbf A\) and \(\mathbf Q\) is an orthogonal matrix (\(\mathbf Q^\top\mathbf Q=\mathbf Q\mathbf Q^\top=\mathbf I_n\)) whose \(i^{\text{th}}\) column is an eigenvector corresponding to \(\lambda_i\). Because \(\boldsymbol \Lambda\) is a diagonal matrix, we sometimes refer to the spectral decomposition as diagonalizing the matrix \(\mathbf A\): \(\mathbf Q^\top\mathbf A\mathbf Q=\boldsymbol \Lambda\) is a diagonal matrix.
This will be useful at various points throughout the module. Note that it relies upon the fact that the eigenvectors of \(\mathbf A\) can be chosen to be mutually orthogonal and of unit length; as there are \(n\) of them, they form an orthonormal basis for \(\mathbb{R}^n\).
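As a quick numerical check of the decomposition (again an illustrative numpy sketch, not a prescribed method), we can reconstruct \(\mathbf A\) as \(\mathbf Q\boldsymbol \Lambda\mathbf Q^\top\) and diagonalize it as \(\mathbf Q^\top\mathbf A\mathbf Q=\boldsymbol \Lambda\).

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
eigvals, Q = np.linalg.eigh(A)     # columns of Q are orthonormal eigenvectors
Lam = np.diag(eigvals)

# Spectral decomposition: A = Q Lambda Q^T
assert np.allclose(A, Q @ Lam @ Q.T)

# Diagonalization: Q^T A Q = Lambda, with Q orthogonal (Q^T Q = I)
assert np.allclose(Q.T @ A @ Q, Lam)
assert np.allclose(Q.T @ Q, np.eye(A.shape[0]))
```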
A symmetric matrix \(\mathbf A\) is positive definite if and only if all of its eigenvalues are positive.

Proof. If \(\mathbf A\) is positive definite, and if \(\mathbf x\) is a unit eigenvector of \(\mathbf A\) corresponding to \(\lambda_n\), then \[0<\mathbf x^\top \mathbf A\mathbf x= \lambda_n \mathbf x^\top \mathbf x= \lambda_n.\] Since the eigenvalues are ordered, it follows that \(\lambda_i\geq \lambda_n>0\) for all \(i\).
Conversely, suppose \(\mathbf A\) has positive eigenvalues. Because \(\mathbf A\) is real and symmetric, we can write it as \(\mathbf A=\mathbf Q\boldsymbol \Lambda\mathbf Q^\top\). Now if \(\mathbf x\) is a non-zero vector, then \(\mathbf y= \mathbf Q^\top \mathbf x\not= \boldsymbol 0\) (as \(\mathbf Q^\top\) has inverse \(\mathbf Q\), and hence \(\dim \operatorname{Ker}(\mathbf Q^\top)=0\)). Thus \[\mathbf x^\top \mathbf A\mathbf x= \mathbf y^\top \boldsymbol \Lambda\mathbf y= \sum_{i=1}^n \lambda_i y_i^2 >0,\] and thus \(\mathbf A\) is positive definite.

Note: A covariance matrix \(\boldsymbol{\Sigma}\) is always positive semi-definite (and thus always has non-negative eigenvalues). To see this, recall that if \(\mathbf x\) is a random vector with \({\mathbb{V}\operatorname{ar}}(\mathbf x)=\boldsymbol{\Sigma}\), then for any constant vector \(\mathbf a\), the random variable \(\mathbf a^\top \mathbf x\) has variance \({\mathbb{V}\operatorname{ar}}(\mathbf a^\top \mathbf x)=\mathbf a^\top \boldsymbol{\Sigma}\mathbf a\). Because variances are non-negative, we must have \[\mathbf a^\top \boldsymbol{\Sigma}\mathbf a\geq 0 \;\forall \;\mathbf a.\]
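The covariance remark can also be illustrated numerically. The sketch below (hypothetical simulated data, numpy assumed as before) forms a sample covariance matrix, checks that its eigenvalues are non-negative, and checks that \(\mathbf a^\top \boldsymbol{\Sigma}\mathbf a\geq 0\) for an arbitrary \(\mathbf a\).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))        # 100 observations of a 4-dimensional vector
Sigma = np.cov(X, rowvar=False)      # 4 x 4 sample covariance matrix

# Eigenvalues of a covariance matrix are non-negative (up to rounding error)
assert np.all(np.linalg.eigvalsh(Sigma) >= -1e-10)

# a^T Sigma a is the variance of a^T x, so it is non-negative for any a
a = rng.normal(size=4)
assert a @ Sigma @ a >= 0
```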
Moreover, if \(\boldsymbol{\Sigma}\) is positive definite (so that its eigenvalues are positive), then its determinant will be positive (so that \(\boldsymbol{\Sigma}\) is non-singular), and we can find an inverse matrix \(\boldsymbol{\Sigma}^{-1}\), which is called the precision matrix.
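A minimal sketch of the precision matrix, reusing the simulated covariance matrix from the previous block (same assumptions): positive eigenvalues give a positive determinant, so the inverse exists.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
Sigma = np.cov(X, rowvar=False)      # sample covariance matrix

# Positive eigenvalues => positive determinant => Sigma is non-singular
assert np.all(np.linalg.eigvalsh(Sigma) > 0)
assert np.linalg.det(Sigma) > 0

# The precision matrix is the inverse of the covariance matrix
Precision = np.linalg.inv(Sigma)
assert np.allclose(Sigma @ Precision, np.eye(4))
```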
3.2.3 Matrix square roots
From the spectral decomposition theorem, we can see that if \(\mathbf A\) is a symmetric positive semi-definite matrix, then for any non-negative integer \(p\)
\[\mathbf A^p = \mathbf Q\boldsymbol \Lambda^p \mathbf Q^\top.\] If in addition \(\mathbf A\) is positive definite (rather than just semi-definite), then \[\mathbf A^{-1} = \mathbf Q\boldsymbol \Lambda^{-1} \mathbf Q^\top\] where \(\boldsymbol \Lambda^{-1} = \text{diag}\{ \frac{1}{\lambda _1}, \dots , \frac{1}{\lambda _n} \}\).
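As an illustrative check (a numpy sketch under the same assumptions as the earlier blocks), matrix powers and the inverse can be computed through the spectral decomposition:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])      # symmetric and positive definite
eigvals, Q = np.linalg.eigh(A)

# A^p = Q Lambda^p Q^T for a non-negative integer p
p = 3
Ap = Q @ np.diag(eigvals ** p) @ Q.T
assert np.allclose(Ap, np.linalg.matrix_power(A, p))

# Since A is positive definite, A^{-1} = Q Lambda^{-1} Q^T
A_inv = Q @ np.diag(1.0 / eigvals) @ Q.T
assert np.allclose(A_inv, np.linalg.inv(A))
```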
The spectral decomposition also gives us a way to define a matrix square root. If we assume \(\mathbf A\) is positive semi-definite, then its eigenvalues, and hence the diagonal elements of \(\boldsymbol \Lambda\), are all non-negative.
We then define \({\mathbf A}^{1/2}\), a matrix square root of \(\mathbf A\), to be \({\mathbf A}^{1/2}=\mathbf Q\boldsymbol \Lambda^{1/2} \mathbf Q^\top\) where \(\boldsymbol \Lambda^{1/2}=\text{diag}\{\lambda_1^{1/2}, \ldots , \lambda_n^{1/2}\}\). This definition makes sense because \[\begin{align*} \mathbf A^{1/2}\mathbf A^{1/2}&=\mathbf Q\boldsymbol \Lambda^{1/2}\mathbf Q^\top \mathbf Q\boldsymbol \Lambda^{1/2} \mathbf Q^\top\\ &=\mathbf Q\boldsymbol \Lambda^{1/2}\boldsymbol \Lambda^{1/2}\mathbf Q^\top\\ &=\mathbf Q\boldsymbol \Lambda\mathbf Q^\top\\ &=\mathbf A, \end{align*}\] where we have used the facts that \(\mathbf Q^\top \mathbf Q=\mathbf I_n\) and \(\boldsymbol \Lambda^{1/2}\boldsymbol \Lambda^{1/2}=\boldsymbol \Lambda\). The matrix \(\mathbf A^{1/2}\) is not the only matrix square root of \(\mathbf A\), but it is the only symmetric, positive semi-definite square root of \(\mathbf A\).
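The symmetric square root can be computed the same way; a minimal sketch with the same illustrative matrix and assumptions:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])      # symmetric positive definite, hence PSD
eigvals, Q = np.linalg.eigh(A)

# A^{1/2} = Q Lambda^{1/2} Q^T, the symmetric positive semi-definite square root
A_half = Q @ np.diag(np.sqrt(eigvals)) @ Q.T

assert np.allclose(A_half, A_half.T)       # symmetric
assert np.allclose(A_half @ A_half, A)     # squares back to A
```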
If \(\mathbf A\) is positive definite (as opposed to just positive semi-definite), then all the \(\lambda_i\) are positive and so we can also define \(\mathbf A^{-1/2}=\mathbf Q\boldsymbol \Lambda^{-1/2}\mathbf Q^\top\) where \(\boldsymbol \Lambda^{-1/2}=\text{diag}\{\lambda_1^{-1/2},\ldots , \lambda_n^{-1/2}\}\). Note that \[ \mathbf A^{-1/2}\mathbf A^{-1/2}=\mathbf Q\boldsymbol \Lambda^{-1/2}\mathbf Q^\top \mathbf Q\boldsymbol \Lambda^{-1/2}\mathbf Q^\top =\mathbf Q\boldsymbol \Lambda^{-1}\mathbf Q^\top=\mathbf A^{-1}, \] so that, as defined above, \(\mathbf A^{-1/2}\) is the matrix square root of \(\mathbf A^{-1}\). Furthermore, similar calculations show that \[ \mathbf A^{1/2}\mathbf A^{-1/2}=\mathbf A^{-1/2}\mathbf A^{1/2}=\mathbf I_n, \] so that \(\mathbf A^{-1/2}\) is the matrix inverse of \(\mathbf A^{1/2}\).
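And similarly for \(\mathbf A^{-1/2}\) (again just an illustrative sketch under the same assumptions):

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])      # symmetric positive definite
eigvals, Q = np.linalg.eigh(A)

A_half = Q @ np.diag(np.sqrt(eigvals)) @ Q.T             # A^{1/2}
A_inv_half = Q @ np.diag(1.0 / np.sqrt(eigvals)) @ Q.T   # A^{-1/2}

# A^{-1/2} is the matrix square root of A^{-1} ...
assert np.allclose(A_inv_half @ A_inv_half, np.linalg.inv(A))
# ... and the matrix inverse of A^{1/2}
assert np.allclose(A_inv_half @ A_half, np.eye(A.shape[0]))
```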