2.1 Basics
In this section, we recap some basic definitions and notation. Hopefully this material will largely be familiar to you.
2.1.1 Notation
The matrix \({\mathbf A}\) will be referred to in the following equivalent ways: \[\begin{eqnarray*} {\mathbf A}=\stackrel{n\times p}{\mathbf A} &=& \left(\begin{array}{cccc} a_{11}&a_{12}&\dots&a_{1p}\\ a_{21}&a_{22}&\dots&a_{2p}\\ \vdots&\vdots&&\vdots\\ a_{n1}&a_{n2}&\dots&a_{np} \end{array} \right) \\ &=&[a_{ij}: i=1, \ldots , n; j=1, \ldots , p]\\ &=&(a_{ij})\\ &=& \left( \begin{array}{c}\mathbf a_1^\top\\ \vdots\\ \mathbf a_n^\top\end{array}\right) \end{eqnarray*}\] where the \(a_{ij}\) are the individual entries, and \(\mathbf a_i^\top=(a_{i1}, a_{i2}, \ldots, a_{ip})\) is the \(i^{th}\) row.
A matrix of order \(1\times 1\) is called a scalar.
A matrix of order \(n\times 1\) is called a (column) vector.
A matrix of order \(1\times p\) is called a (row) vector.
E.g. \(\stackrel{n\times 1}{\mathbf a}=\left( \begin{array}{c} a_1\\\vdots\\a_n \end{array} \right)\) is a column vector.
The \(n\times n\) identity matrix \({\mathbf I}_n\) has diagonal elements equal to 1 and off-diagonal elements equal to zero.
A diagonal matrix is an \(n \times n\) matrix whose off-diagonal elements are zero. Sometimes we denote a diagonal matrix by \(\text{diag}\{a_1,\ldots, a_n\}\).
\[\mathbf I_3 = \left(\begin{array}{ccc} 1&0&0\\ 0&1&0\\ 0&0&1\end{array}\right),\quad \text{diag}\{1,2,3\}=\left(\begin{array}{ccc} 1&0&0\\ 0&2&0\\ 0&0&3\end{array}\right).\]
2.1.2 Elementary matrix operations
Addition/Subtraction. If \(\stackrel{n\times p}{\mathbf A}=[a_{ij}]\) and \(\stackrel{n\times p}{\mathbf B}=[b_{ij}]\) are given matrices then \[ {\mathbf A}+{\mathbf B}=[a_{ij}+b_{ij}] \qquad \text{and} \qquad {\mathbf A}-{\mathbf B}=[a_{ij}-b_{ij}].\]
Scalar Multiplication. If \(\lambda\) is a scalar and \({\mathbf A}=[a_{ij}]\) then \[\lambda {\mathbf A}=[\lambda a_{ij}].\]
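E.g. if \(\mathbf A=\left(\begin{array}{cc} 1&2\\ 3&4 \end{array}\right)\) and \(\mathbf B=\left(\begin{array}{cc} 5&6\\ 7&8 \end{array}\right)\), then \[ \mathbf A+\mathbf B=\left(\begin{array}{cc} 6&8\\ 10&12 \end{array}\right) \qquad \text{and} \qquad 2\mathbf A=\left(\begin{array}{cc} 2&4\\ 6&8 \end{array}\right).\]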
Matrix Multiplication. If \(\stackrel{n\times p}{\mathbf A}\) and \(\stackrel{p\times q}{\mathbf B}\) are matrices then \(\mathbf A\mathbf B=\stackrel{n\times q}{\mathbf C}=[c_{ij}]\) where \[c_{ij}=\sum _{k=1}^p a_{ik}b_{kj}, \qquad i=1,\dots,n, \qquad j=1,\dots ,q.\]
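E.g. with \(\mathbf A\) and \(\mathbf B\) as above, \[ \mathbf A\mathbf B=\left(\begin{array}{cc} 1\cdot 5+2\cdot 7 & 1\cdot 6+2\cdot 8\\ 3\cdot 5+4\cdot 7 & 3\cdot 6+4\cdot 8 \end{array}\right) =\left(\begin{array}{cc} 19&22\\ 43&50 \end{array}\right).\] Note that matrix multiplication is not commutative in general: here \(\mathbf B\mathbf A=\left(\begin{array}{cc} 23&34\\ 31&46 \end{array}\right)\not=\mathbf A\mathbf B\).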
Matrix Transpose. If \(\stackrel{m \times n}{\mathbf A}=[a_{ij}: i=1, \ldots , m; j=1, \ldots , n]\), then the transpose of \(\mathbf A\), written \(\mathbf A^\top\), is given by the \(n \times m\) matrix \[ \mathbf A^\top =[a_{ji}: j=1, \ldots , n; i=1, \ldots, m]. \] Note from the definitions that \((\mathbf A\mathbf B)^\top={\mathbf B}^\top {\mathbf A}^\top\).
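E.g. \(\left(\begin{array}{ccc} 1&2&3\\ 4&5&6 \end{array}\right)^\top=\left(\begin{array}{cc} 1&4\\ 2&5\\ 3&6 \end{array}\right)\).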
Matrix Inverse. The inverse of a matrix \(\stackrel{n\times n}{\mathbf A}\) (if it exists) is a matrix \(\stackrel{n\times n}{\mathbf B}\) such that \({\mathbf A}\mathbf B=\mathbf B\mathbf A={\mathbf I}_n.\) We denote the inverse by \({\mathbf A}^{-1}\). Note that if \({\mathbf A}_1\) and \({\mathbf A}_2\) are both invertible, then \(({\mathbf A}_1 {\mathbf A}_2)^{-1}={\mathbf A}_2^{-1}{\mathbf A}_1^{-1}\).
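For a \(2\times 2\) matrix the inverse can be written down explicitly: provided \(a_{11}a_{22}-a_{12}a_{21}\not=0\) (this quantity is the determinant, defined below), \[ \left(\begin{array}{cc} a_{11}&a_{12}\\ a_{21}&a_{22} \end{array}\right)^{-1} =\frac{1}{a_{11}a_{22}-a_{12}a_{21}} \left(\begin{array}{cc} a_{22}&-a_{12}\\ -a_{21}&a_{11} \end{array}\right).\] E.g. with \(\mathbf A\) as above, \(\mathbf A^{-1}=-\frac{1}{2}\left(\begin{array}{cc} 4&-2\\ -3&1 \end{array}\right)=\left(\begin{array}{cc} -2&1\\ 3/2&-1/2 \end{array}\right)\), which can be checked by verifying that \(\mathbf A\mathbf A^{-1}=\mathbf I_2\).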
Trace. The trace of a matrix \(\stackrel{n\times n}{\mathbf A}\) is given by \[ \text{tr}({\mathbf A})=\sum _{i=1}^n a_{ii}.\]
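E.g. \(\text{tr}\left(\begin{array}{cc} 1&2\\ 3&4 \end{array}\right)=1+4=5\), and \(\text{tr}(\mathbf I_n)=n\).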
Determinant. The determinant of a square matrix \(\stackrel{n\times n}{\mathbf A}\) is defined as \[ \text{det}({\mathbf A})=\sum (-1)^{|\tau |} a_{1\tau(1)}\dots a_{n\tau (n)} \] where the summation is taken over all permutations \(\tau\) of \(\{1,2,\dots ,n\}\), and we define \(|\tau |=0\) or \(1\) depending on whether \(\tau\) can be written as the product of an even or odd number of transpositions.
E.g. If \({\mathbf A}=\left( \begin{array}{cc} a_{11}&a_{12}\\ a_{21}&a_{22} \end{array} \right)\), then \(\text{det}({\mathbf A})=a_{11}a_{22}-a_{12}a_{21}\).
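Similarly \(\text{det}\left(\begin{array}{cc} 1&2\\ 3&4 \end{array}\right)=1\cdot 4-2\cdot 3=-2\), and for a diagonal matrix the determinant is the product of the diagonal entries, e.g. \(\text{det}(\text{diag}\{1,2,3\})=1\cdot 2\cdot 3=6\). A square matrix is invertible if and only if its determinant is non-zero.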
2.1.3 Special matrices
Definition 2.1 An \(n\times n\) matrix \(\mathbf A\) is symmetric if \[\mathbf A= \mathbf A^\top.\] An \(n\times n\) symmetric matrix \(\mathbf A\) is positive-definite if \[\mathbf x^\top \mathbf A\mathbf x>0 \mbox{ for all } \mathbf x\in \mathbb{R}^n, \mathbf x\not = \boldsymbol 0\] and is positive semi-definite if \[\mathbf x^\top \mathbf A\mathbf x\geq 0 \mbox{ for all } \mathbf x\in \mathbb{R}^n.\]
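E.g. \(\mathbf I_n\) is positive-definite, since \(\mathbf x^\top \mathbf I_n \mathbf x=\sum_{i=1}^n x_i^2>0\) whenever \(\mathbf x\not= \boldsymbol 0\). In contrast, \(\text{diag}\{1,0\}\) is positive semi-definite but not positive-definite, since \(\mathbf x^\top\, \text{diag}\{1,0\}\, \mathbf x=x_1^2\geq 0\), with equality at \(\mathbf x=(0,1)^\top\not= \boldsymbol 0\).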
\(\mathbf A\) is idempotent if \(\mathbf A^2=\mathbf A\).
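E.g. \(\mathbf A=\frac{1}{2}\left(\begin{array}{cc} 1&1\\ 1&1 \end{array}\right)\) is idempotent, since \(\mathbf A^2=\frac{1}{4}\left(\begin{array}{cc} 2&2\\ 2&2 \end{array}\right)=\mathbf A\); so is \(\mathbf I_n\), trivially.
2.1.4 Vector Differentiation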
Consider a real-valued function \(f: \mathbb{R}^p \rightarrow \mathbb{R}\) of a vector variable \(\mathbf x=(x_1, \ldots , x_p)^\top\). Sometimes we will want to differentiate \(f\). We define the partial derivative of \(f(\mathbf x)\) with respect to \(\mathbf x\) to be the vector of partial derivatives, i.e. \[\begin{equation} \frac{\partial f}{\partial \mathbf x}(\mathbf x)=\left [ \begin{array}{c} \frac{\partial f}{\partial x_1}(\mathbf x)\\ \vdots\\ \frac{\partial f}{\partial x_p}(\mathbf x) \end{array} \right ] \tag{2.1} \end{equation}\] The following examples can be worked out directly from the definition (2.1), using the chain rule in some cases.
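E.g. if \(\mathbf a\) is a constant \(p\times 1\) vector and \(f(\mathbf x)=\mathbf a^\top \mathbf x\), then \[\frac{\partial f}{\partial \mathbf x}(\mathbf x)=\mathbf a,\] and if \(\mathbf A\) is a constant \(p\times p\) matrix and \(f(\mathbf x)=\mathbf x^\top \mathbf A\mathbf x\), then \[\frac{\partial f}{\partial \mathbf x}(\mathbf x)=(\mathbf A+\mathbf A^\top)\mathbf x,\] which reduces to \(2\mathbf A\mathbf x\) when \(\mathbf A\) is symmetric.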
The nature of a stationary point (a point \(\mathbf x\) at which \(\frac{\partial f}{\partial \mathbf x}(\mathbf x)=\boldsymbol 0\)) is determined by the Hessian, the \(p\times p\) matrix of second-order partial derivatives \(\left[\frac{\partial ^2 f}{\partial x_i \partial x_j}(\mathbf x)\right]\):
If the Hessian is positive (negative) definite at a stationary point \(\mathbf x\), then the stationary point is a local minimum (maximum).
If the Hessian has both positive and negative eigenvalues at \(\mathbf x\) then the stationary point will be a saddle point.
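E.g. \(f(\mathbf x)=x_1^2+x_2^2\) has a stationary point at \(\mathbf x=\boldsymbol 0\), where the Hessian is \(2\mathbf I_2\), which is positive definite, so the origin is a local minimum. In contrast, \(f(\mathbf x)=x_1^2-x_2^2\) also has a stationary point at the origin, but there the Hessian \(\text{diag}\{2,-2\}\) has eigenvalues of both signs, so the origin is a saddle point.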