5.2 The Wishart distribution

In univariate statistics the \(\chi^2\) distribution plays an important role in inference related to the univariate normal, e.g. in the definition of Student’s \(t\)-distribution.
The Wishart distribution is a multivariate generalisation of the univariate \(\chi^2\) distribution, and it plays an analogous role in multivariate statistics.

In this section we introduce the Wishart distribution and show that for MVN random variables, the sample covariance matrix \(\bS\) has a Wishart distribution.

Definition 5.3 Let \(\bx_1, \ldots, \bx_n\) be an IID random sample from \(N_p (\bzero, \bSigma)\). Then \[\bM = \sum_{i=1}^n \bx_i \bx_i^\top \in \mathbb{R}^{p\times p}\] is said to have a Wishart distribution with \(n\) degrees of freedom and scale matrix \(\bSigma\). We write this as \[\bM \sim W_p(\bSigma, n)\] and refer to \(W_p(\bI_p,n)\) as a standard Wishart distribution.

Note:

  • \(W_p(\bSigma,n)\) is a probability distribution on the set of \(p \times p\) symmetric, non-negative definite matrices.

  • Recall that if \(z_1, \ldots, z_n \sim N(0, 1)\) independently, then \[\sum_{i=1}^n z_i^2 \sim \chi^2_n.\] The Wishart distribution arises from the same kind of construction: it is the sum of the “squares” (outer products) \(\bx_i \bx_i^\top\) of zero-mean multivariate normal random vectors.

  • In particular, note that when \(p=1\), \(W_1(1,n)\) is the \(\chi_n^2\) distribution and \(W_1(\sigma^2,n)\) is the \(\sigma^2 \chi_n^2\) distribution.

  • If \(\bX\) is the usual \(n \times p\) matrix with rows \(\bx_i^\top\), then \[\bM = \bX^\top \bX.\]
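
Definition 5.3 also gives a direct recipe for simulating a single Wishart draw by hand, via \(\bM = \bX^\top \bX\). Below is a minimal R sketch; the values of \(p\), \(n\) and \(\bSigma\) are illustrative choices, not from the text.

# Simulate one draw M ~ W_p(Sigma, n) from Definition 5.3: M = X^T X,
# where the rows of X are IID N_p(0, Sigma).
p <- 2; n <- 5
Sigma <- matrix(c(1, 0.5, 0.5, 2), p, p)
R <- chol(Sigma)                        # Sigma = R^T R, with R upper triangular
X <- matrix(rnorm(n * p), n, p) %*% R   # rows of X are N_p(0, Sigma)
M <- t(X) %*% X                         # a single draw from W_p(Sigma, n)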

We can sample from the Wishart distribution in R using the rWishart command. For example, setting \(\bSigma =\bI_2\) and using 2 degrees of freedom, we can generate 4 random samples \(\bM_1, \ldots, \bM_4 \sim W_2(\bI_2, 2)\) as follows:

out <- rWishart(n = 4, df = 2, Sigma = diag(2))  # a 2 x 2 x 4 array of draws

We can visualise these matrices by plotting the ellipses \(\bx^\top \bM_i \bx=c\) for some constant \(c\), which makes the variability in the random matrices apparent.
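
One way to produce such a plot in base R, taking \(c=1\) (a sketch, not necessarily how the original figure was produced): eigendecompose each \(\bM_i\) and map the unit circle through \({\mathbf V} {\mathbf \Lambda}^{-1/2}\), since \(\bx = {\mathbf V} {\mathbf \Lambda}^{-1/2}\bu\) with \(\bu^\top\bu=1\) satisfies \(\bx^\top \bM_i \bx = 1\).

theta <- seq(0, 2 * pi, length.out = 200)
circle <- rbind(cos(theta), sin(theta))   # points u with u^T u = 1
plot(NULL, xlim = c(-4, 4), ylim = c(-4, 4), asp = 1,
     xlab = expression(x[1]), ylab = expression(x[2]))
for (i in 1:4) {
  e <- eigen(out[, , i])                  # M_i = V diag(lambda) V^T
  lines(t(e$vectors %*% diag(1 / sqrt(e$values)) %*% circle), col = i)
}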

Proposition 5.6 Let \(\bM \sim W_p(\bSigma, n)\). Then \[\BE\bM = n \bSigma,\] and if the \(ij^{th}\) element of \(\bSigma\) is \(\sigma_{ij}\) and the \(ij^{th}\) element of \(\bM\) is \(m_{ij}\), then \[\var(m_{ij}) = n \left(\sigma_{ij}^2+\sigma_{ii}\sigma_{jj} \right).\]
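
These moments are easy to check by Monte Carlo. A quick sketch, with illustrative values of \(n\) and \(\bSigma\):

n <- 5
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
draws <- rWishart(1e5, df = n, Sigma = Sigma)  # a 2 x 2 x 1e5 array
apply(draws, c(1, 2), mean)   # should be close to n * Sigma
var(draws[1, 2, ])            # should be close to n * (0.5^2 + 1 * 2) = 11.25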

5.2.1 Properties

We now use the definition of \(W_p(\bSigma, n)\) to prove some important results.

Proposition 5.7 If \(\bM \sim W_p(\bSigma,n)\) and \(\bA\) is a fixed \(q \times p\) matrix, then \[ \bA \bM \bA^\top \sim W_q \lb \bA \bSigma \bA^\top, n \rb.\]

Proof. From the definition, let \(\bM = \sum_{i=1}^n \bx_i \bx_i^\top\), where \(\bx_i \sim N_p(\bzero,\bSigma)\). Then \[\begin{align*} \bA \bM \bA^\top &= \bA \lb \sum_{i=1}^n \bx_i \bx_i^\top \rb \bA^\top\\ &= \sum_{i=1}^n (\bA \bx_i)(\bA \bx_i)^\top = \sum_{i=1}^n \by_i \by_i^\top \end{align*}\] where \(\by_i = \bA \bx_i \sim N_q(\bzero,\bA \bSigma \bA^\top)\), by Proposition 5.1. Now we apply the definition of the Wishart distribution to \(\by_1,\ldots,\by_n\) and, hence, \(\sum_{i=1}^n \by_i \by_i^\top \sim W_q\lb \bA \bSigma \bA^\top, n \rb\).
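
A quick numerical sanity check of Proposition 5.7 (all values illustrative), comparing the Monte Carlo mean of \(\bA \bM \bA^\top\) with \(n \bA \bSigma \bA^\top\), which is what Proposition 5.6 gives for \(W_q(\bA \bSigma \bA^\top, n)\):

A <- matrix(c(1, 0, 1, -1, 2, 0), 2, 3)  # fixed q x p matrix, q = 2, p = 3
Sigma <- diag(3); n <- 4
draws <- rWishart(1e5, df = n, Sigma = Sigma)
AMA <- apply(draws, 3, function(M) A %*% M %*% t(A))  # columns are vec(A M A^T)
matrix(rowMeans(AMA), 2, 2)   # Monte Carlo estimate of E(A M A^T)
n * A %*% Sigma %*% t(A)      # theoretical value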

Proposition 5.8 If \(\bM \sim W_p(\bSigma,n)\) and \(\ba\) is a fixed \(p \times 1\) vector then \[ \ba^\top \bM \ba \sim \lb \ba^\top \bSigma \ba \rb \chi_n^2.\]

Note that an alternative way to write this is as \[\frac{ \ba^\top \bM \ba }{ \ba^\top \bSigma \ba } \sim \chi_n^2.\]

Proof. Applying Proposition 5.7 with \(\bA = \ba^\top\), we see \(\ba^\top \bM \ba \sim W_1( \ba^\top \bSigma \ba, n)\).

If we let \(z_1, \ldots, z_n\) be IID \(N(0,1)\) and set \(\sigma = (\ba^\top \bSigma \ba)^{1/2}\), then \(\sigma z_i \sim N(0, \ba^\top \bSigma \ba)\). Thus, by the definition of the Wishart distribution, \[\sum_{i=1}^n (\sigma z_i)^2 \sim W_1(\ba^\top \bSigma \ba, n).\] But \[\sum_{i=1}^n (\sigma z_i)^2 = \sigma^2 \sum_{i=1}^n z_i^2 \sim (\ba^\top \bSigma \ba)\chi^2_n \quad \mbox{by the definition of } \chi^2,\] which gives the result.
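
Proposition 5.8 is also easy to verify by simulation. A sketch with illustrative \(\ba\), \(\bSigma\) and \(n\), comparing the standardised quadratic form with \(\chi^2_n\) quantiles:

a <- c(1, 2); Sigma <- matrix(c(2, 1, 1, 3), 2, 2); n <- 6
draws <- rWishart(1e4, df = n, Sigma = Sigma)
q <- apply(draws, 3, function(M) drop(t(a) %*% M %*% a)) /
  drop(t(a) %*% Sigma %*% a)
qqplot(qchisq(ppoints(length(q)), df = n), q)  # points should lie near y = x
abline(0, 1)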

Proposition 5.9 If \(\bM_1 \sim W_p(\bSigma,n_1)\) and \(\bM_2 \sim W_p(\bSigma,n_2)\) are independent then \[\bM_1 + \bM_2 \sim W_p(\bSigma,n_1 + n_2).\]

Proof. From the definition, let \(\bM_1 = \sum_{i=1}^{n_1} \bx_i \bx_i^\top\) and \(\bM_2 = \sum_{i=n_1+1}^{n_1+n_2} \bx_i \bx_i^\top\), where \(\bx_1, \ldots, \bx_{n_1+n_2}\) are IID \(N_p(\bzero,\bSigma)\); the independence of \(\bM_1\) and \(\bM_2\) allows us to construct them from disjoint parts of a single IID sample. Then \(\bM_1+\bM_2 = \sum_{i=1}^{n_1+n_2} \bx_i \bx_i^\top \sim W_p(\bSigma,n_1 + n_2)\) by the definition of the Wishart distribution.
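
Combining Propositions 5.8 and 5.9 gives a simple simulation check (all values illustrative): a standardised quadratic form in \(\bM_1 + \bM_2\) should be \(\chi^2_{n_1+n_2}\).

Sigma <- diag(2); n1 <- 3; n2 <- 4; a <- c(1, 1)
M1 <- rWishart(1e4, df = n1, Sigma = Sigma)
M2 <- rWishart(1e4, df = n2, Sigma = Sigma)
q <- sapply(1:1e4, function(i) drop(t(a) %*% (M1[, , i] + M2[, , i]) %*% a)) /
  drop(t(a) %*% Sigma %*% a)
c(mean(q), var(q))  # chi^2_7 has mean 7 and variance 14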

5.2.2 Cochran’s theorem

Our next result is known as Cochran’s theorem; we will use it to show that sample covariance matrices have a scaled Wishart distribution.

First though, recall the definition of projection matrices from Section 1.3.3: \(\bP\) is a projection matrix if \(\bP^2=\bP\). In this section we also take \(\bP\) to be symmetric, so that the Spectral Decomposition Theorem applies in the proof below.

Theorem 5.1 (Cochran’s Theorem) Suppose \(\stackrel{n \times n}{\mathbf P}\) is a projection matrix of rank \(r\). Assume that \(\bX\) is an \(n \times p\) data matrix with IID rows that have a common \(N_p({\mathbf 0}_p, \bSigma)\) distribution, where \(\bSigma\) has full rank \(p\). Note the identity \[\begin{equation} \bX^\top \bX = \bX^\top {\mathbf P} \bX + \bX^\top ({\mathbf I}_n -{\mathbf P})\bX. \tag{5.3} \end{equation}\] Then \[\begin{equation} \bX^\top {\mathbf P} \bX \sim W_p(\bSigma, r), \qquad \bX^\top ({\mathbf I}_n -{\mathbf P})\bX \sim W_p(\bSigma, n-r), \tag{5.4} \end{equation}\] and \(\bX^\top {\mathbf P} \bX\) and \(\bX^\top ({\mathbf I}_n -{\mathbf P})\bX\) are independent.
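
Before proving the theorem, a numerical illustration of (5.4) may help. This sketch uses illustrative \(n\), \(p\), \(r\) and \(\bSigma\), and builds a rank-\(r\) symmetric projection from \(r\) orthonormal columns:

n <- 10; p <- 2; r <- 3
Sigma <- matrix(c(1, 0.3, 0.3, 1), 2, 2)
V <- qr.Q(qr(matrix(rnorm(n * r), n, r)))  # n x r matrix with orthonormal columns
P <- V %*% t(V)                            # symmetric projection of rank r
reps <- replicate(1e4, {
  X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)  # rows are N_p(0, Sigma)
  t(X) %*% P %*% X
})
apply(reps, c(1, 2), mean)  # should be close to r * Sigma, as W_p(Sigma, r) predicts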



We’ll prove this result below. Let’s first understand why it is useful.

Proposition 5.10 If \(\bx_1,\ldots,\bx_n\) is an IID sample from \(N_p(\bmu,\bSigma)\), then \[ n \bS = \sum_{i=1}^n (\bx_i - \bar{\bx})(\bx_i - \bar{\bx})^\top \sim W_p(\bSigma,n-1).\]

Proof. Let \(\bP= {\mathbf H}\equiv \bI_n - n^{-1}{\mathbf 1}_n {\mathbf 1}_n^\top\), the \(n \times n\) centering matrix, where \({\mathbf 1}_n\) is the \(n \times 1\) vector of ones.

\(\bH\) is a projection matrix (Property 1 of Section 1.4). Clearly \(\bI_n - \bH = n^{-1} {\mathbf 1}_n {\mathbf 1}_n^\top\) has rank \(1\), and since the ranks of the complementary projections \(\bH\) and \(\bI_n - \bH\) sum to \(n\), \(\bH\) must have rank \(n-1\). Therefore, using Cochran’s Theorem (5.1), \[ \bX^\top \bH \bX \sim W_p(\bSigma, n-1). \] But \[\bX^\top \bH \bX =n\bS\] (Property 6 in Section 1.4), and consequently \(n\bS \sim W_p(\bSigma, n-1)\), as required.



Thus, sample covariance matrices have a scaled Wishart distribution. This result will be key in the next section, as it will allow us to compute the sampling distribution of a test statistic that we will then use in hypothesis tests.
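
A quick simulation check of Proposition 5.10 (the values of \(n\), \(\bmu\) and \(\bSigma\) are illustrative); note that the result does not depend on \(\bmu\), since centering removes the mean:

n <- 8; mu <- c(1, -1); Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
reps <- replicate(1e4, {
  X <- matrix(rnorm(n * 2), n, 2) %*% chol(Sigma) + rep(mu, each = n)
  Xc <- scale(X, center = TRUE, scale = FALSE)  # subtract column means: H X
  t(Xc) %*% Xc                                  # this is nS = X^T H X
})
apply(reps, c(1, 2), mean)  # should be close to (n - 1) * Sigma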

We will now prove Cochran’s theorem.

Proof (non-examinable).

We first prove the result for the case \(\bSigma = {\mathbf I}_p\).

Using the Spectral Decomposition Theorem, and noting that the eigenvalues of projection matrices must be either \(0\) or \(1\), we can write \[ {\mathbf P}=\sum_{j=1}^r \bv_j \bv_j^\top \qquad \hbox{and} \qquad (\bI_n-{\mathbf P})=\sum_{j=r+1}^n \bv_j \bv_j^\top \] where \(\bv_1, \ldots , \bv_n \in \mathbb{R}^n\) are mutually orthogonal unit vectors. Then \[\begin{align} \bX^\top \bP \bX &= \bX^\top \left (\sum_{j=1}^r \bv_j \bv_j^\top \right) \bX \nonumber \\ & =\sum_{j=1}^r \bX^\top \bv_j \bv_j^\top \bX =\sum_{j=1}^r \by _j \by_j^\top, \tag{5.5} \end{align}\] and similarly, \[\begin{equation} \bX^\top (\bI_n -\bP) \bX =\sum_{j=r+1}^n \by _j \by_j^\top, \tag{5.6} \end{equation}\] where \(\by_j=\bX^\top \bv_j\) is a \(p \times 1\) vector.

Claim The \(\by_j\) are IID multivariate normal random vectors: \[\by_j \sim N_p({\mathbf 0}_p, \bI_p).\]

If the claim is true, then it immediately follows from the definition of the Wishart distribution that (5.5) has a \(W_p(\bI_p,r)\) distribution and (5.6) has a \(W_p(\bI_p, n-r)\) distribution. Moreover, they are independent because the \(\by_j\) are all independent.

Then to prove the general case with covariance matrix \(\bSigma\), note that if \(\bx_i\sim N_p(\bzero, \bSigma)\), then we can write \(\bx_i=\bSigma^{1/2}\bz_i\) where \(\bz_i \sim N_p(\bzero, \bI_p)\).

Thus \[\begin{align*} \bX^\top \bP \bX &= \bSigma^{1/2} \bZ^\top\bP\bZ \bSigma^{1/2}\\ &\sim \bSigma^{1/2} W_p(\bI_p, r) \bSigma^{1/2} \quad \mbox{by the result above}\\ &\sim W_p(\bSigma, r) \end{align*}\] where the final line follows from Proposition 5.7. Here \(\bX\) and \(\bZ\) are the matrices with rows \(\bx_i^\top\) and \(\bz_i^\top\) respectively, so that \(\bX = \bZ \bSigma^{1/2}\) (recall that \(\bSigma^{1/2}\) is symmetric). The same argument applies to \(\bX^\top (\bI_n - \bP) \bX\).

To complete the proof it only remains to prove the claim that \(\by_j \sim N_p({\mathbf 0}_p, \bI_p).\)

We can immediately see that the \(\by_j\) must be MVN of dimension \(p\), and that they have mean vector \(\bzero_p\). To see the covariance and independence parts, note that the \(k^{th}\) element of \(\by_j\) is \[y_{jk} = \sum_{i=1}^n x_{ik}v_{ji}\] and so the \((k, l)^{th}\) element of the covariance matrix between \(\by_j\) and \(\by_{j'}\) is

\[\begin{align*} \BE(y_{jk} y_{j'l}) &= \BE(\sum_{i=1}^n x_{ik}v_{ji} \sum_{i'=1}^n x_{i'l}v_{j'i'})\\ &=\sum_{i=1}^n\sum_{i'=1}^n v_{ji} \BE(x_{ik}x_{i'l})v_{j'i'}\\ &=\begin{cases} 0 &\mbox{ if } k\not = l \mbox{ as } x_{ik} \mbox{ independent of } x_{il} \\ \sum_{i=1}^n v_{ji} v_{j'i} &\mbox{ if } k=l\mbox{ as }x_{ik} \mbox{ is independent of } x_{i'k} \mbox{ for }i\not=i'. \end{cases}\\ \end{align*}\]

Finally \[\begin{align*} \sum_{i=1}^n v_{ji} v_{j'i}&= \bv_j^\top \bv_{j'}\\ &=\begin{cases} 1 &\mbox{if } j=j'\\ 0 &\mbox{otherwise}. \end{cases} \end{align*}\]

Thus \(\cov(\by_j, \by_{j'}) = \bzero_{p\times p}\) for \(j\not = j'\) and \(\var(\by_j) = \bI_p\). This proves the claim, once we recall that for multivariate normal random variables, being uncorrelated implies independence.
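
The claim can also be checked numerically. A sketch with illustrative \(n\) and \(p\), taking \(\bSigma = \bI_p\) as in the first part of the proof:

n <- 6; p <- 2
V <- qr.Q(qr(matrix(rnorm(n * n), n, n)))  # orthonormal v_1, ..., v_n as columns
Y <- replicate(1e4, t(matrix(rnorm(n * p), n, p)) %*% V)  # Y[, j, ] holds draws of y_j
cov(t(Y[, 1, ]))                # var(y_1): should be close to I_p
cov(t(Y[, 1, ]), t(Y[, 2, ]))   # cov(y_1, y_2): should be close to the zero matrix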