2.1 Basics

In this section, we recap some basic definitions and notation. Hopefully this material will largely be familiar to you.

2.1.1 Notation

The matrix {\mathbf A} will be referred to in the following equivalent ways: \begin{eqnarray*} {\mathbf A}=\stackrel{n\times p}{\mathbf A} &=& \left(\begin{array}{cccc} a_{11}&a_{12}&\dots&a_{1p}\\ a_{21}&a_{22}&\dots&a_{2p}\\ \vdots&\vdots&&\vdots\\ a_{n1}&a_{n2}&\dots&a_{np} \end{array} \right) \\ &=&[a_{ij}: i=1, \ldots , n; j=1, \ldots , p]\\ &=&(a_{ij})\\ &=& \left( \begin{array}{c}\mathbf a_1^\top\\ \vdots\\ \mathbf a_n^\top\end{array}\right) \end{eqnarray*} where the a_{ij} are the individual entries, and \mathbf a_i^\top=(a_{i1}, a_{i2}, \ldots, a_{ip}) is the i^{th} row.
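For a concrete handle on this notation, here is a minimal NumPy sketch (the particular 3\times 2 matrix is an arbitrary example; note that NumPy indexes from 0, so a_{ij} corresponds to A[i-1, j-1]):

```python
import numpy as np

# A 3x2 matrix A = (a_ij) with n = 3 rows and p = 2 columns.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

n, p = A.shape      # n = 3, p = 2
a_12 = A[0, 1]      # the entry a_{12} (0-based indexing)
row_2 = A[1, :]     # the second row a_2^T = (a_21, a_22)
```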

A matrix of order 1\times 1 is called a scalar.

A matrix of order n\times 1 is called a (column) vector.

A matrix of order 1\times p is called a (row) vector.

E.g. \stackrel{n\times 1}{\mathbf a}=\left( \begin{array}{c} a_1\\\vdots\\a_n \end{array} \right) is a column vector.

The n\times n identity matrix {\mathbf I}_n has diagonal elements equal to 1 and off-diagonal elements equal to zero.

A diagonal matrix is an n \times n matrix whose off-diagonal elements are zero. Sometimes we denote a diagonal matrix by \text{diag}\{a_1,\ldots, a_n\}.

\mathbf I_3 = \left(\begin{array}{ccc} 1&0&0\\ 0&1&0\\ 0&0&1\end{array}\right),\quad \text{diag}\{1,2,3\}=\left(\begin{array}{ccc} 1&0&0\\ 0&2&0\\ 0&0&3\end{array}\right)
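In NumPy, for instance, these two matrices can be built directly with np.eye and np.diag:

```python
import numpy as np

I3 = np.eye(3)                 # the 3x3 identity matrix I_3
D = np.diag([1.0, 2.0, 3.0])   # diag{1, 2, 3}
assert np.allclose(I3 @ D, D)  # multiplying by the identity changes nothing
```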

2.1.2 Elementary matrix operations

  1. Addition/Subtraction. If \stackrel{n\times p}{\mathbf A}=[a_{ij}] and \stackrel{n\times p}{\mathbf B}=[b_{ij}] are given matrices then {\mathbf A}+{\mathbf B}=[a_{ij}+b_{ij}] \qquad \text{and} \qquad {\mathbf A}-{\mathbf B}=[a_{ij}-b_{ij}].

  2. Scalar Multiplication. If \lambda is a scalar and {\mathbf A}=[a_{ij}] then \lambda {\mathbf A}=[\lambda a_{ij}].

  3. Matrix Multiplication. If \stackrel{n\times p}{\mathbf A} and \stackrel{p\times q}{\mathbf B} are matrices then \mathbf A\mathbf B=\stackrel{n\times q}{\mathbf C}=[c_{ij}] where c_{ij}=\sum _{k=1}^p a_{ik}b_{kj}, \qquad i=1,\dots,n, \qquad j=1,\dots ,q.

  4. Matrix Transpose. If \stackrel{m \times n}{\mathbf A}=[a_{ij}: i=1, \ldots , m; j=1, \ldots , n], then the transpose of \mathbf A, written \mathbf A^\top, is given by the n \times m matrix \mathbf A^\top =[a_{ji}: j=1, \ldots , n; i=1, \ldots, m]. Note from the definitions that (\mathbf A\mathbf B)^\top={\mathbf B}^\top {\mathbf A}^\top.

  5. Matrix Inverse. The inverse of a matrix \stackrel{n\times n}{\mathbf A} (if it exists) is a matrix \stackrel{n\times n}{\mathbf B} such that {\mathbf A}\mathbf B=\mathbf B\mathbf A={\mathbf I}_n. We denote the inverse by {\mathbf A}^{-1}. Note that if {\mathbf A}_1 and {\mathbf A}_2 are both invertible, then ({\mathbf A}_1 {\mathbf A}_2)^{-1}={\mathbf A}_2^{-1}{\mathbf A}_1^{-1}.

  6. Trace. The trace of a matrix \stackrel{n\times n}{\mathbf A} is given by \text{tr}({\mathbf A})=\sum _{i=1}^n a_{ii}.
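As a quick numerical check of these six operations, here is a minimal NumPy sketch (the matrices \mathbf A and \mathbf B are arbitrary invertible examples):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 1.0]])

S = A + B                   # addition (matrices must have the same order)
M = 2.5 * A                 # scalar multiplication
C = A @ B                   # matrix multiplication, c_ij = sum_k a_ik * b_kj
A_inv = np.linalg.inv(A)    # inverse (exists here since det(A) = -2 != 0)
trace_A = np.trace(A)       # tr(A) = a_11 + a_22

# The two reversal rules noted above:
assert np.allclose((A @ B).T, B.T @ A.T)                              # (AB)^T = B^T A^T
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))               # (AB)^{-1} = B^{-1} A^{-1}
```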

Lemma 2.1 For any matrices \mathbf A (n \times m) and \mathbf B (m \times n), \text{tr}(\mathbf A\mathbf B) = \text{tr}(\mathbf B\mathbf A).
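This follows directly by writing out the double sum and exchanging the order of summation: \text{tr}(\mathbf A\mathbf B)=\sum_{i=1}^n\sum_{k=1}^m a_{ik}b_{ki}=\sum_{k=1}^m\sum_{i=1}^n b_{ki}a_{ik}=\text{tr}(\mathbf B\mathbf A).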

  7. Determinant. The determinant of a square matrix \stackrel{n\times n}{\mathbf A} is defined as \text{det}({\mathbf A})=\sum (-1)^{|\tau |} a_{1\tau(1)}\dots a_{n\tau (n)} where the summation is taken over all permutations \tau of \{1,2,\dots ,n\}, and we define |\tau |=0 or 1 depending on whether \tau can be written as an even or odd number of transpositions.

E.g. If {\mathbf A}=\left[ \begin{array}{cc} a_{11}&a_{12}\\ a_{21}&a_{22} \end{array} \right], then \text{det}({\mathbf A})=a_{11}a_{22}-a_{12}a_{21}.

Proposition 2.1 A matrix \stackrel{n\times n}{\mathbf A} is invertible if and only if \det(\mathbf A)\not = 0. If \mathbf A^{-1} exists, then \det(\mathbf A)=\frac{1}{\det(\mathbf A^{-1})}.

Proposition 2.2 For any matrices \stackrel{n\times n}{\mathbf A}, \stackrel{n\times n}{\mathbf B}, \stackrel{n\times n}{\mathbf C} such that {\mathbf C}={\mathbf{AB}}, \text{det}({\mathbf C})=\text{det}({\mathbf A}) \cdot \text{det}({\mathbf B}).
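A numerical sanity check of Propositions 2.1 and 2.2 (a minimal NumPy sketch with arbitrary invertible matrices):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # det(A) = 5
B = np.array([[1.0, 4.0], [0.0, 2.0]])   # det(B) = 2

# Proposition 2.1: det(A) = 1 / det(A^{-1})
assert np.isclose(np.linalg.det(A), 1.0 / np.linalg.det(np.linalg.inv(A)))

# Proposition 2.2: det(AB) = det(A) * det(B)
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
```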

2.1.3 Special matrices

Definition 2.1 An n\times n matrix \mathbf A is symmetric if \mathbf A= \mathbf A^\top. An n\times n symmetric matrix \mathbf A is positive-definite if \mathbf x^\top \mathbf A\mathbf x>0 \mbox{ for all } \mathbf x\in \mathbb{R}^n, \mathbf x\not = {\boldsymbol 0} and is positive semi-definite if \mathbf x^\top \mathbf A\mathbf x\geq 0 \mbox{ for all } \mathbf x\in \mathbb{R}^n.

An n\times n matrix \mathbf A is idempotent if \mathbf A^2=\mathbf A.
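To illustrate these definitions numerically, here is a minimal NumPy sketch. It uses the fact that a symmetric matrix is positive-definite exactly when all of its eigenvalues are positive, and (as one standard idempotent example) the projection matrix \mathbf H=\mathbf X(\mathbf X^\top\mathbf X)^{-1}\mathbf X^\top, assuming \mathbf X has full column rank; the particular matrices are arbitrary choices.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
is_symmetric = np.allclose(A, A.T)

# For a symmetric matrix, positive-definite <=> all eigenvalues > 0.
eigvals = np.linalg.eigvalsh(A)          # eigenvalues are 1 and 3 here
is_pos_def = bool(np.all(eigvals > 0))

# An idempotent example: the projection matrix H = X (X^T X)^{-1} X^T.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
H = X @ np.linalg.inv(X.T @ X) @ X.T
is_idempotent = np.allclose(H @ H, H)
```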

2.1.4 Vector Differentiation

Consider a real-valued function f: \mathbb{R}^p \rightarrow \mathbb{R} of a vector variable \mathbf x=(x_1, \ldots , x_p)^\top. Sometimes we will want to differentiate f. We define the partial derivative of f(\mathbf x) with respect to \mathbf x to be the vector of partial derivatives, i.e. \begin{equation} \frac{\partial f}{\partial \mathbf x}(\mathbf x)=\left [ \begin{array}{c} \frac{\partial f}{\partial x_1}(\mathbf x)\\ \vdots\\ \frac{\partial f}{\partial x_p}(\mathbf x) \end{array} \right ] \tag{2.1} \end{equation} The following examples can be worked out directly from the definition (2.1), using the chain rule in some cases.

Example 2.1 If f(\mathbf x)=\mathbf a^\top \mathbf x where \mathbf a\in \mathbb{R}^p is a constant vector, then \frac{\partial f}{\partial \mathbf x}(\mathbf x)=\mathbf a.

Example 2.2 If f(\mathbf x)=(\mathbf x-\mathbf a)^\top \mathbf A(\mathbf x-\mathbf a), where \mathbf a\in \mathbb{R}^p is a fixed vector and \mathbf A is a constant symmetric p \times p matrix, then \frac{\partial f}{\partial \mathbf x}(\mathbf x)=2\mathbf A(\mathbf x-\mathbf a).
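As a sanity check, the gradient formula in Example 2.2 can be compared against central finite differences (a minimal NumPy sketch; the matrix \mathbf A, the vector \mathbf a, the point \mathbf x and the step size h are arbitrary choices):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # symmetric
a = np.array([1.0, -1.0])
x = np.array([0.5, 2.0])

def f(x):
    return (x - a) @ A @ (x - a)

grad_exact = 2 * A @ (x - a)

# Central finite differences in each coordinate direction.
h = 1e-6
grad_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                    for e in np.eye(2)])

assert np.allclose(grad_exact, grad_fd, atol=1e-5)
```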

Example 2.3 Suppose that g: \, \mathbb{R} \rightarrow \mathbb{R} is a differentiable function with derivative g^\prime. Then, using the chain rule for partial derivatives, \frac{\partial g(\mathbf a^\top \mathbf x)}{\partial \mathbf x}=g^{\prime}(\mathbf a^\top\mathbf x)\frac{\partial}{\partial \mathbf x}\left \{\mathbf a^\top \mathbf x\right \}=g^{\prime}(\mathbf a^\top\mathbf x) \mathbf a.

Example 2.4 If f is defined as in Example 2.2 and g is as in Example 2.3 then, using the chain rule again, \frac{\partial }{\partial \mathbf x} g\{f(\mathbf x)\}=g^{\prime} \{f(\mathbf x)\}\frac{\partial f}{\partial \mathbf x}(\mathbf x) =2 g^{\prime}\{(\mathbf x- \mathbf a)^\top \mathbf A(\mathbf x- \mathbf a)\}\mathbf A(\mathbf x-\mathbf a).
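The same finite-difference check applies to Example 2.4, taking g=\exp so that g^\prime=\exp as well (again a sketch with arbitrary numerical values):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # symmetric
a = np.array([1.0, -1.0])
x = np.array([0.2, 0.4])

def f(x):
    return (x - a) @ A @ (x - a)

def g_of_f(x):
    return np.exp(f(x))                  # g = exp, so g' = exp too

# Example 2.4: gradient of g(f(x)) is 2 g'(f(x)) A (x - a).
grad_exact = 2 * np.exp(f(x)) * A @ (x - a)

h = 1e-6
grad_fd = np.array([(g_of_f(x + h * e) - g_of_f(x - h * e)) / (2 * h)
                    for e in np.eye(2)])

assert np.allclose(grad_exact, grad_fd, atol=1e-4)
```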

If we wish to find a maximum or minimum of f(\mathbf x), we should search for stationary points of f, i.e. solutions to the system of equations \frac{\partial f}{\partial \mathbf x}(\mathbf x)\equiv \left [ \begin{array}{c} \frac{\partial f}{\partial x_1}(\mathbf x)\\ \vdots\\ \frac{\partial f}{\partial x_p}(\mathbf x) \end{array} \right ]={\mathbf 0}_p.

Definition 2.2 The Hessian matrix of f is the p \times p matrix of second derivatives, \frac{\partial^2f}{\partial \mathbf x\partial \mathbf x^\top}(\mathbf x) =\left \{ \frac{\partial^2 f(\mathbf x)}{\partial x_j \partial x_k}\right \}_{j,k=1}^p.

The nature of a stationary point is determined by the Hessian:

If the Hessian is positive (negative) definite at a stationary point \mathbf x, then the stationary point is a minimum (maximum).

If the Hessian has both positive and negative eigenvalues at \mathbf x, then the stationary point is a saddle point.
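For the quadratic form of Example 2.2, differentiating the gradient 2\mathbf A(\mathbf x-\mathbf a) once more shows that the Hessian is 2\mathbf A everywhere, so the stationary point \mathbf x=\mathbf a is a minimum whenever \mathbf A is positive-definite. A minimal NumPy sketch of this classification (with an arbitrary positive-definite \mathbf A):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # symmetric, positive-definite
a = np.array([1.0, -1.0])

# f(x) = (x - a)^T A (x - a): gradient 2 A (x - a), Hessian 2 A.
hessian = 2 * A
x_star = a                               # gradient is zero only at x = a (A invertible)

eigvals = np.linalg.eigvalsh(hessian)
if np.all(eigvals > 0):
    kind = "minimum"
elif np.all(eigvals < 0):
    kind = "maximum"
elif np.any(eigvals > 0) and np.any(eigvals < 0):
    kind = "saddle point"
else:
    kind = "inconclusive"
# Here both eigenvalues are positive, so x_star = a is a minimum.
```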