2.1 Basics
In this section, we recap some basic definitions and notation. Hopefully this material will largely be familiar to you.
2.1.1 Notation
The matrix \({\mathbf A}\) will be referred to in the following equivalent ways: \[\begin{eqnarray*} {\mathbf A}=\stackrel{n\times p}{\mathbf A} &=& \left(\begin{array}{cccc} a_{11}&a_{12}&\dots&a_{1p}\\ a_{21}&a_{22}&\dots&a_{2p}\\ \vdots&\vdots&&\vdots\\ a_{n1}&a_{n2}&\dots&a_{np} \end{array} \right) \\ &=&[a_{ij}: i=1, \ldots , n; j=1, \ldots , p]\\ &=&(a_{ij})\\ &=& \left( \begin{array}{c}\mathbf a_1^\top\\ \vdots\\ \mathbf a_n^\top\end{array}\right) \end{eqnarray*}\] where the \(a_{ij}\) are the individual entries, and \(\mathbf a_i^\top=(a_{i1}, a_{i2}, \ldots, a_{ip})\) is the \(i^{th}\) row.
A matrix of order \(1\times 1\) is called a scalar.
A matrix of order \(n\times 1\) is called a (column) vector.
A matrix of order \(1\times p\) is called a (row) vector.
e.g. \(\stackrel{n\times 1}{\mathbf a}=\left( \begin{array}{c} a_1\\\vdots\\a_n \end{array} \right)\) is a column vector.
The \(n\times n\) identity matrix \({\mathbf I}_n\) has diagonal elements equal to 1 and off-diagonal elements equal to zero.
A diagonal matrix is an \(n \times n\) matrix whose off-diagonal elements are all zero. We sometimes write \(\text{diag}\{a_1,\ldots, a_n\}\) for the diagonal matrix with diagonal entries \(a_1,\ldots,a_n\).
\[\mathbf I_3 = \left(\begin{array}{ccc} 1&0&0\\ 0&1&0\\ 0&0&1\end{array}\right),\quad \text{diag}\{1,2,3\}=\left(\begin{array}{ccc} 1&0&0\\ 0&2&0\\ 0&0&3\end{array}\right)\quad\]
2.1.2 Elementary matrix operations
Addition/Subtraction. If \(\stackrel{n\times p}{\mathbf A}=[a_{ij}]\) and \(\stackrel{n\times p}{\mathbf B}=[b_{ij}]\) are given matrices then \[ {\mathbf A}+{\mathbf B}=[a_{ij}+b_{ij}] \qquad \text{and} \qquad {\mathbf A}-{\mathbf B}=[a_{ij}-b_{ij}].\]
Scalar Multiplication. If \(\lambda\) is a scalar and \({\mathbf A}=[a_{ij}]\) then \[\lambda {\mathbf A}=[\lambda a_{ij}].\]
Matrix Multiplication. If \(\stackrel{n\times p}{\mathbf A}\) and \(\stackrel{p\times q}{\mathbf B}\) are matrices then \(\mathbf A\mathbf B=\stackrel{n\times q}{\mathbf C}=[c_{ij}]\) where \[c_{ij}=\sum _{k=1}^p a_{ik}b_{kj}, \qquad i=1,\dots,n, \qquad j=1,\dots ,q.\]
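As a small illustrative sketch (the matrices below are arbitrary choices, not part of the notes), the three operations can be checked numerically in Python with numpy; `A @ C` computes the product with entries \(c_{ij}=\sum_k a_{ik}c_{kj}\).

```python
import numpy as np

# Arbitrary small matrices chosen to illustrate the definitions above.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])          # 2 x 3
B = np.array([[0.5, 1.0, 1.5],
              [2.0, 2.5, 3.0]])          # 2 x 3
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])               # 3 x 2

print(A + B)      # entrywise sum [a_ij + b_ij]
print(3.0 * A)    # scalar multiple [lambda * a_ij]
print(A @ C)      # 2 x 2 product, c_ij = sum_k a_ik c_kj
```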
Matrix Transpose. If \(\stackrel{m \times n}{\mathbf A}=[a_{ij}: i=1, \ldots , m; j=1, \ldots , n]\), then the transpose of \(\mathbf A\), written \(\mathbf A^\top\), is given by the \(n \times m\) matrix \[ \mathbf A^\top =[a_{ji}: j=1, \ldots , n; i=1, \ldots, m]. \] Note from the definitions that \((\mathbf A\mathbf B)^\top={\mathbf B}^\top {\mathbf A}^\top\).
Matrix Inverse. The inverse of a matrix \(\stackrel{n\times n}{\mathbf A}\) (if it exists) is a matrix \(\stackrel{n\times n}{\mathbf B}\) such that \({\mathbf A}\mathbf B=\mathbf B\mathbf A={\mathbf I}_n.\) We denote the inverse by \({\mathbf A}^{-1}\). Note that if \({\mathbf A}_1\) and \({\mathbf A}_2\) are both invertible, then \(({\mathbf A}_1 {\mathbf A}_2)^{-1}={\mathbf A}_2^{-1}{\mathbf A}_1^{-1}\).
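The two reversal rules \((\mathbf A\mathbf B)^\top=\mathbf B^\top\mathbf A^\top\) and \((\mathbf A_1\mathbf A_2)^{-1}=\mathbf A_2^{-1}\mathbf A_1^{-1}\) can be verified numerically; the sketch below uses random matrices (my own choice, and generic random matrices are invertible with probability one).

```python
import numpy as np

rng = np.random.default_rng(0)
A1 = rng.standard_normal((3, 3))   # generic square matrices are invertible
A2 = rng.standard_normal((3, 3))   # with probability one

# (A1 A2)^T = A2^T A1^T
print(np.allclose((A1 @ A2).T, A2.T @ A1.T))

# (A1 A2)^{-1} = A2^{-1} A1^{-1}
print(np.allclose(np.linalg.inv(A1 @ A2),
                  np.linalg.inv(A2) @ np.linalg.inv(A1)))
```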
Trace. The trace of a matrix \(\stackrel{n\times n}{\mathbf A}\) is given by \[ \text{tr}({\mathbf A})=\sum _{i=1}^n a_{ii}.\]
Lemma 2.1 For any matrices \(\mathbf A\) (\(n \times m\)) and \(\mathbf B\) (\(m \times n\)), \[ \text{tr}(\mathbf A\mathbf B) = \text{tr}(\mathbf B\mathbf A). \]
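A quick numerical check of Lemma 2.1 (the matrix sizes below are an arbitrary choice for illustration): even though \(\mathbf A\mathbf B\) and \(\mathbf B\mathbf A\) have different dimensions, their traces agree.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 2))    # n x m
B = rng.standard_normal((2, 4))    # m x n

# AB is 4 x 4 while BA is only 2 x 2, yet the traces coincide (Lemma 2.1).
print(np.trace(A @ B), np.trace(B @ A))
```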
Determinant. The determinant of a square matrix \(\stackrel{n\times n}{\mathbf A}\) is defined as \[ \text{det}({\mathbf A})=\sum (-1)^{|\tau |} a_{1\tau(1)}\dots a_{n\tau (n)} \] where the summation is taken over all permutations \(\tau\) of \(\{1,2,\dots ,n\}\), and \(|\tau |=0\) or \(1\) according to whether \(\tau\) can be written as a product of an even or odd number of transpositions.
E.g. If \({\mathbf A}=\left[ \begin{array}{cc} a_{11}&a_{12}\\ a_{21}&a_{22} \end{array} \right]\), then \(\text{det}({\mathbf A})=a_{11}a_{22}-a_{12}a_{21}\).
Proposition 2.1 Matrix \(\stackrel{n\times n}{\mathbf A}\) is invertible if and only if \(\det(\mathbf A)\not = 0\). If \(\mathbf A^{-1}\) exists then \[\det(\mathbf A)=\frac{1}{\det(\mathbf A^{-1})}\]
Proposition 2.2 For any matrices \(\stackrel{n\times n}{\mathbf A}\), \(\stackrel{n\times n}{\mathbf B}\), \(\stackrel{n\times n}{\mathbf C}\) such that \({\mathbf C}={\mathbf{AB}}\), \[ \text{det}({\mathbf C})=\text{det}({\mathbf A}) \cdot \text{det}({\mathbf B}).\]
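The two propositions are easy to check numerically on small matrices (the particular \(2\times 2\) matrices below are my own illustrative choices).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[0.0, 1.0],
              [-1.0, 2.0]])

# Proposition 2.2: det(AB) = det(A) det(B)
print(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# Proposition 2.1: det(A) = 1 / det(A^{-1})   (A is invertible since det(A) = 5)
print(np.linalg.det(A), 1.0 / np.linalg.det(np.linalg.inv(A)))
```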
2.1.3 Special matrices
Definition 2.1 An \(n\times n\) matrix \(\mathbf A\) is symmetric if \[\mathbf A= \mathbf A^\top.\] An \(n\times n\) symmetric matrix \(\mathbf A\) is positive-definite if \[\mathbf x^\top \mathbf A\mathbf x>0 \mbox{ for all } \mathbf x\in \mathbb{R}^n, \mathbf x\not = {\boldsymbol 0}\] and is positive semi-definite if \[\mathbf x^\top \mathbf A\mathbf x\geq 0 \mbox{ for all } \mathbf x\in \mathbb{R}^n.\]
An \(n\times n\) matrix \(\mathbf A\) is idempotent if \(\mathbf A^2=\mathbf A\).
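As a hedged illustration (the matrices \(\mathbf A\) and \(\mathbf X\) below are my own choices, not from the notes), positive definiteness of a symmetric matrix can be checked via its eigenvalues, and the orthogonal projection \(\mathbf H=\mathbf X(\mathbf X^\top\mathbf X)^{-1}\mathbf X^\top\) is a standard example of an idempotent matrix.

```python
import numpy as np

# Positive definiteness of a symmetric matrix: all eigenvalues strictly positive.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(np.all(np.linalg.eigvalsh(A) > 0))     # True: eigenvalues are 1 and 3

# Idempotence of the projection H = X (X^T X)^{-1} X^T onto the column space of X.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
H = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(H @ H, H))                 # True
```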
2.1.4 Vector differentiation
Consider a real-valued function \(f: \mathbb{R}^p \rightarrow \mathbb{R}\) of a vector variable \(\mathbf x=(x_1, \ldots , x_p)^\top\). Sometimes we will want to differentiate \(f\). We define the partial derivative of \(f(\mathbf x)\) with respect to \(\mathbf x\) to be the vector of partial derivatives, i.e. \[\begin{equation} \frac{\partial f}{\partial \mathbf x}(\mathbf x)=\left [ \begin{array}{c} \frac{\partial f}{\partial x_1}(\mathbf x)\\ \vdots\\ \frac{\partial f}{\partial x_p}(\mathbf x) \end{array} \right ] \tag{2.1} \end{equation}\] The following examples can be worked out directly from the definition (2.1), using the chain rule in some cases.
Example 2.1 If \(f(\mathbf x)=\mathbf a^\top \mathbf x\) where \(\mathbf a\in \mathbb{R}^p\) is a constant vector, then \[ \frac{\partial f}{\partial \mathbf x}(\mathbf x)=\mathbf a. \]
Example 2.2 If \(f(\mathbf x)=(\mathbf x-\mathbf a)^\top \mathbf A(\mathbf x-\mathbf a)\), where \(\mathbf a\in \mathbb{R}^p\) is a fixed vector and \(\mathbf A\) is a constant symmetric \(p \times p\) matrix, then \[ \frac{\partial f}{\partial \mathbf x}(\mathbf x)=2\mathbf A(\mathbf x-\mathbf a). \]
Example 2.3 Suppose that \(g: \, \mathbb{R} \rightarrow \mathbb{R}\) is a differentiable function with derivative \(g^\prime\). Then, using the chain rule for partial derivatives, \[ \frac{\partial g(\mathbf a^\top \mathbf x)}{\partial \mathbf x}=g^{\prime}(\mathbf a^\top\mathbf x)\frac{\partial}{\partial \mathbf x}\left \{\mathbf a^\top \mathbf x\right \}=g^{\prime}(\mathbf a^\top\mathbf x) \mathbf a. \]
Example 2.4 If \(f\) is defined as in Example 2.2 and \(g\) is as in Example 2.3 then, using the chain rule again, \[ \frac{\partial }{\partial \mathbf x} g\{f(\mathbf x)\}=g^{\prime} \{f(\mathbf x)\}\frac{\partial f}{\partial \mathbf x}(\mathbf x) =2 g^{\prime}\{(\mathbf x- \mathbf a)^\top \mathbf A(\mathbf x- \mathbf a)\}\mathbf A(\mathbf x-\mathbf a). \]
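A simple way to sanity-check results such as Example 2.2 is to compare the closed-form gradient with a finite-difference approximation; the sketch below does this for an arbitrary symmetric \(\mathbf A\), \(\mathbf a\) and \(\mathbf x\) of my own choosing.

```python
import numpy as np

# Finite-difference check of Example 2.2: f(x) = (x - a)^T A (x - a)
# has gradient 2 A (x - a) when A is symmetric.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])                 # symmetric
a = np.array([1.0, -1.0])
x = np.array([0.5, 2.0])

f = lambda x: (x - a) @ A @ (x - a)

eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(2)])   # central differences, coordinate-wise
print(numeric)             # approximately equal to
print(2 * A @ (x - a))     # the closed-form gradient
```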
If we wish to find a maximum or minimum of \(f(\mathbf x)\) we should search for stationary points of \(f\), i.e. solutions to the system of equations \[ \frac{\partial f}{\partial \mathbf x}(\mathbf x)\equiv \left [ \begin{array}{c} \frac{\partial f}{\partial x_1}(\mathbf x)\\ \vdots\\ \frac{\partial f}{\partial x_p}(\mathbf x) \end{array} \right ]={\mathbf 0}_p. \] ::: {.definition #hessian} The Hessian matrix of \(f\) is the \(p \times p\) matrix of second derivatives, \[ \frac{\partial^2f}{\partial \mathbf x\partial \mathbf x^\top}(\mathbf x) =\left \{ \frac{\partial^2 f(\mathbf x)}{\partial x_j \partial x_k}\right \}_{j,k=1}^p. \] :::
The nature of a stationary point is determined by the Hessian:

- If the Hessian is positive (negative) definite at a stationary point \(\mathbf x\), then the stationary point is a minimum (maximum).
- If the Hessian has both positive and negative eigenvalues at \(\mathbf x\), then the stationary point is a saddle point.
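As a small numerical sketch (the matrix \(\mathbf A\) and vector \(\mathbf a\) are illustrative choices, not from the notes), for the quadratic \(f\) of Example 2.2 with \(\mathbf A\) positive definite, the stationary-point equations \(2\mathbf A(\mathbf x-\mathbf a)=\mathbf 0\) give \(\mathbf x=\mathbf a\), the Hessian is the constant matrix \(2\mathbf A\), and the eigenvalue test confirms a minimum.

```python
import numpy as np

# Quadratic from Example 2.2 with a symmetric positive-definite A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
a = np.array([1.0, -1.0])

# Solve the stationary-point equations 2 A (x - a) = 0 for x.
x_star = np.linalg.solve(2 * A, 2 * A @ a)
print(x_star)                                    # equals a

# The Hessian of f is the constant matrix 2A; positive definite => minimum.
hessian = 2 * A
print(np.all(np.linalg.eigvalsh(hessian) > 0))   # True
```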