7.2 The Wishart distribution

In univariate statistics the \chi^2 distribution plays an important role in inference related to the univariate normal, e.g. in the definition of Student’s t-distribution.
The Wishart distribution is a multivariate generalisation of the univariate \chi^2 distribution, and it plays an analogous role in multivariate statistics.

In this section we introduce the Wishart distribution and show that for MVN random variables, the sample covariance matrix \mathbf S has a Wishart distribution.

Definition 7.3 Let \mathbf x_1, \ldots, \mathbf x_n be an IID random sample from N_p ({\boldsymbol 0}, \boldsymbol{\Sigma}). Then \mathbf M= \sum_{i=1}^n \mathbf x_i \mathbf x_i^\top \in \mathbb{R}^{p\times p} is said to have a Wishart distribution with n degrees of freedom and scale matrix \boldsymbol{\Sigma}. We write this as \mathbf M\sim W_p(\boldsymbol{\Sigma}, n) and refer to W_p(\mathbf I_p,n) as a standard Wishart distribution.

Note:

  • W_p(\boldsymbol{\Sigma},n) is a probability distribution on the set of p \times p symmetric non-negative definite random matrices.

  • Recall that if z_1, \ldots, z_n \sim N(0, 1) independently, then \sum_{i=1}^n z_i^2 \sim \chi^2_n. The Wishart distribution arises from the same kind of construction: it is a sum of outer products \mathbf x_i \mathbf x_i^\top of zero-mean (multivariate) normal random vectors, the matrix analogue of a sum of squares.

  • In particular, note that when p=1, W_1(1,n) is the \chi_n^2 distribution and W_1(\sigma^2,n) is the \sigma^2 \chi_n^2 distribution.

  • If \mathbf X is the usual n \times p matrix with rows \mathbf x_i^\top, then \mathbf M= \mathbf X^\top \mathbf X.

We can sample from the Wishart distribution in R using the rWishart command. For example, setting \boldsymbol{\Sigma}=\mathbf I_2 and using 2 degrees of freedom, we can generate 4 random samples \mathbf M_1, \ldots, \mathbf M_4 \sim W_2(\mathbf I_2, 2) as follows:

# 4 draws from W_2(I_2, 2); the result is a 2 x 2 x 4 array
out <- rWishart(n=4, df=2, Sigma=diag(1,2))

Visualizing these by plotting the ellipses with \mathbf x^\top \mathbf M_i \mathbf x=c for some constant c, we can see the variability in these random matrices:
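
One way to produce such a plot (a minimal sketch in base R, not taken from the notes; the constant c=1 and the plotting limits are arbitrary choices) is to use the eigendecomposition \mathbf M_i = \mathbf V\mathbf D\mathbf V^\top: points of the form \mathbf x= \mathbf V\mathbf D^{-1/2}\mathbf u with \|\mathbf u\|=1 satisfy \mathbf x^\top \mathbf M_i \mathbf x= 1.

theta <- seq(0, 2 * pi, length.out = 200)
unit_circle <- rbind(cos(theta), sin(theta))   # 2 x 200 matrix of points on the unit circle
plot(NULL, xlim = c(-3, 3), ylim = c(-3, 3), asp = 1, xlab = "x1", ylab = "x2")
for (i in 1:4) {
  M <- out[, , i]                              # i-th sampled matrix from the rWishart call above
  e <- eigen(M)                                # M is almost surely positive definite here since df >= p
  ellipse_pts <- e$vectors %*% diag(1 / sqrt(e$values)) %*% unit_circle
  lines(t(ellipse_pts), col = i)               # traces out {x : x' M x = 1}
}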

Proposition 7.6 Let \mathbf M\sim W_p(\boldsymbol{\Sigma}, n). Then {\mathbb{E}}\mathbf M= n \boldsymbol{\Sigma}, and if \sigma_{ij} and m_{ij} denote the ij^{th} elements of \boldsymbol{\Sigma} and \mathbf M respectively, then {\mathbb{V}\operatorname{ar}}(m_{ij}) = n \left(\sigma_{ij}^2+\sigma_{ii}\sigma_{jj} \right).
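
As an informal numerical check of Proposition 7.6 (a simulation sketch; the particular \boldsymbol{\Sigma} and degrees of freedom below are arbitrary choices), we can compare Monte Carlo estimates with the stated formulas:

Sigma <- matrix(c(2, 1, 1, 3), nrow = 2)
n_df <- 5
draws <- rWishart(n = 10000, df = n_df, Sigma = Sigma)   # 2 x 2 x 10000 array
apply(draws, c(1, 2), mean)                              # Monte Carlo estimate of E(M); compare with n_df * Sigma
var(draws[1, 2, ])                                       # Monte Carlo estimate of Var(m_12)
n_df * (Sigma[1, 2]^2 + Sigma[1, 1] * Sigma[2, 2])       # n (sigma_12^2 + sigma_11 sigma_22)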

7.2.1 Properties

We now use the definition of W_p(\boldsymbol{\Sigma}, n) to prove some important results.

Proposition 7.7 If \mathbf M\sim W_p(\boldsymbol{\Sigma},n) and \mathbf A is a fixed q \times p matrix, then \mathbf A\mathbf M\mathbf A^\top \sim W_q \left(\mathbf A\boldsymbol{\Sigma}\mathbf A^\top, n \right).

Proof. From the definition, let \mathbf M= \sum_{i=1}^n \mathbf x_i \mathbf x_i^\top, where \mathbf x_i \sim N_p({\boldsymbol 0},\boldsymbol{\Sigma}). Then \begin{align*} \mathbf A\mathbf M\mathbf A^\top &= \mathbf A\left(\sum_{i=1}^n \mathbf x_i \mathbf x_i^\top \right)\mathbf A^\top\\ &= \sum_{i=1}^n (\mathbf A\mathbf x_i)(\mathbf A\mathbf x_i)^\top = \sum_{i=1}^n \mathbf y_i \mathbf y_i^\top \end{align*} where \mathbf y_i = \mathbf A\mathbf x_i \sim N_q({\boldsymbol 0},\mathbf A\boldsymbol{\Sigma}\mathbf A^\top), by Proposition 7.1. Now we apply the definition of the Wishart distribution to \mathbf y_1,\ldots,\mathbf y_n and, hence, \sum_{i=1}^n \mathbf y_i \mathbf y_i^\top \sim W_q\left(\mathbf A\boldsymbol{\Sigma}\mathbf A^\top, n \right).
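
As a quick numerical illustration of Proposition 7.7 (a sketch only; \mathbf A, \boldsymbol{\Sigma} and the degrees of freedom are arbitrary choices), the Monte Carlo mean of \mathbf A\mathbf M\mathbf A^\top should be close to n\,\mathbf A\boldsymbol{\Sigma}\mathbf A^\top, the mean of W_q(\mathbf A\boldsymbol{\Sigma}\mathbf A^\top, n) given by Proposition 7.6:

A <- matrix(c(1, 0, 2, -1), nrow = 2)                    # an arbitrary fixed 2 x 2 matrix
Sigma <- matrix(c(2, 1, 1, 3), nrow = 2)
n_df <- 5
draws <- rWishart(n = 10000, df = n_df, Sigma = Sigma)
AMA <- apply(draws, 3, function(M) A %*% M %*% t(A))     # each column is the vectorised A M A'
matrix(rowMeans(AMA), nrow = 2)                          # Monte Carlo estimate of E(A M A')
n_df * A %*% Sigma %*% t(A)                              # theoretical mean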

Proposition 7.8 If \mathbf M\sim W_p(\boldsymbol{\Sigma},n) and \mathbf a is a fixed p \times 1 vector then \mathbf a^\top \mathbf M\mathbf a\sim \left(\mathbf a^\top \boldsymbol{\Sigma}\mathbf a\right)\chi_n^2.

Note that an alternative way to write this is as \frac{ \mathbf a^\top \mathbf M\mathbf a}{ \mathbf a^\top \boldsymbol{\Sigma}\mathbf a} \sim \chi_n^2.

Proof. Applying Proposition 7.7 with \mathbf A= \mathbf a^\top, we see \mathbf a^\top \mathbf M\mathbf a\sim W_1( \mathbf a^\top \boldsymbol{\Sigma}\mathbf a, n).

If we let z_1, \ldots, z_n be IID N(0,1) random variables and set \sigma = (\mathbf a^\top \boldsymbol{\Sigma}\mathbf a)^{\frac{1}{2}}, then \sigma z_i \sim N(0, \mathbf a^\top \boldsymbol{\Sigma}\mathbf a), and so by the definition of the Wishart distribution \sum_{i=1}^n (\sigma z_i)^2 \sim W_1(\mathbf a^\top \boldsymbol{\Sigma}\mathbf a, n). Thus \mathbf a^\top \mathbf M\mathbf a has the same distribution as \sum_{i=1}^n (\sigma z_i)^2 = \sigma^2 \sum_{i=1}^n z_i^2 \sim (\mathbf a^\top \boldsymbol{\Sigma}\mathbf a)\chi^2_n, by the definition of the \chi^2_n distribution.
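
A small simulation sketch of Proposition 7.8 (illustrative only; \mathbf a, \boldsymbol{\Sigma} and the degrees of freedom are arbitrary choices): the scaled quadratic forms \mathbf a^\top \mathbf M\mathbf a/(\mathbf a^\top \boldsymbol{\Sigma}\mathbf a) should look like draws from \chi^2_n, which a QQ-plot can assess.

a <- c(1, -2)
Sigma <- matrix(c(2, 1, 1, 3), nrow = 2)
n_df <- 5
draws <- rWishart(n = 10000, df = n_df, Sigma = Sigma)
q <- apply(draws, 3, function(M) drop(t(a) %*% M %*% a)) / drop(t(a) %*% Sigma %*% a)
qqplot(qchisq(ppoints(length(q)), df = n_df), q,
       xlab = "Chi-squared quantiles", ylab = "Sample quantiles")
abline(0, 1)                                             # points should lie close to this line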

Proposition 7.9 If \mathbf M_1 \sim W_p(\boldsymbol{\Sigma},n_1) and \mathbf M_2 \sim W_p(\boldsymbol{\Sigma},n_2) are independent then \mathbf M_1 + \mathbf M_2 \sim W_p(\boldsymbol{\Sigma},n_1 + n_2).

Proof. From the definition, and using the independence of \mathbf M_1 and \mathbf M_2, we may write \mathbf M_1 = \sum_{i=1}^{n_1} \mathbf x_i \mathbf x_i^\top and \mathbf M_2 = \sum_{i=n_1+1}^{n_1+n_2} \mathbf x_i \mathbf x_i^\top, where \mathbf x_1, \ldots, \mathbf x_{n_1+n_2} are IID N_p({\boldsymbol 0},\boldsymbol{\Sigma}). Then \mathbf M_1+\mathbf M_2 = \sum_{i=1}^{n_1+n_2} \mathbf x_i \mathbf x_i^\top \sim W_p(\boldsymbol{\Sigma},n_1 + n_2) by the definition of the Wishart distribution.
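
For example (a simulation sketch with arbitrary choices of \boldsymbol{\Sigma} and degrees of freedom), adding independent W_2(\boldsymbol{\Sigma}, 3) and W_2(\boldsymbol{\Sigma}, 4) draws gives matrices whose Monte Carlo mean is close to 7\boldsymbol{\Sigma}, the mean of W_2(\boldsymbol{\Sigma}, 7):

Sigma <- matrix(c(2, 1, 1, 3), nrow = 2)
M1 <- rWishart(n = 10000, df = 3, Sigma = Sigma)
M2 <- rWishart(n = 10000, df = 4, Sigma = Sigma)   # generated independently of M1
apply(M1 + M2, c(1, 2), mean)                      # should be close to 7 * Sigma
7 * Sigma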

7.2.2 Cochran’s theorem

Our next result is known as Cochran’s theorem. We use Cochran’s theorem to show that sample covariance matrices have a scaled Wishart distribution.

First though, recall the definition of projection matrices from Section 2.3.3. Namely, that \mathbf P is a projection matrix if \mathbf P^2=\mathbf P.

Theorem 7.1 (Cochran’s Theorem) Suppose \stackrel{n \times n}{\mathbf P} is a projection matrix of rank r. Assume that \mathbf X is an n \times p data matrix with IID rows that have a common N_p({\mathbf 0}_p, \boldsymbol{\Sigma}) distribution, where \boldsymbol{\Sigma} has full rank p. Note the identity \begin{equation} \mathbf X^\top \mathbf X= \mathbf X^\top {\mathbf P} \mathbf X+ \mathbf X^\top ({\mathbf I}_n -{\mathbf P})\mathbf X. \tag{7.3} \end{equation} Then \begin{equation} \mathbf X^\top {\mathbf P} \mathbf X\sim W_p(\boldsymbol{\Sigma}, r), \qquad \mathbf X^\top ({\mathbf I}_n -{\mathbf P})\mathbf X\sim W_p(\boldsymbol{\Sigma}, n-r), \tag{7.4} \end{equation} and \mathbf X^\top {\mathbf P} \mathbf X and \mathbf X^\top ({\mathbf I}_n -{\mathbf P})\mathbf X are independent.



We’ll prove this result below. Let’s first understand why it is useful.

Proposition 7.10 If \mathbf x_1,\ldots,\mathbf x_n is an IID sample from N_p({\boldsymbol{\mu}},\boldsymbol{\Sigma}), then n \mathbf S= \sum_{i=1}^n (\mathbf x_i - \bar{\mathbf x})(\mathbf x_i - \bar{\mathbf x})^\top \sim W_p(\boldsymbol{\Sigma},n-1).

Proof. Let \mathbf P= {\mathbf H}\equiv \mathbf I_n - n^{-1}{\mathbf 1}_n {\mathbf 1}_n^\top, the n \times n centering matrix, where {\mathbf 1}_n is the n \times 1 vector of ones.

\mathbf H is a projection matrix (property 1. of 2.4), and clearly \mathbf I_n - \mathbf H= n^{-1} {\mathbf 1}_n {\mathbf 1}_n^\top has rank 1, so \mathbf H must have rank n-1. Moreover, although the rows of \mathbf X have mean \boldsymbol{\mu} rather than {\boldsymbol 0}_p, the fact that \mathbf H is symmetric and idempotent with \mathbf H{\mathbf 1}_n = {\mathbf 0}_n gives \mathbf X^\top \mathbf H\mathbf X= (\mathbf H\mathbf X)^\top (\mathbf H\mathbf X) = (\mathbf X- {\mathbf 1}_n \boldsymbol{\mu}^\top)^\top \mathbf H(\mathbf X- {\mathbf 1}_n \boldsymbol{\mu}^\top), and the rows of \mathbf X- {\mathbf 1}_n \boldsymbol{\mu}^\top are IID N_p({\boldsymbol 0}_p,\boldsymbol{\Sigma}), so Cochran’s theorem applies. Therefore, using Theorem 7.1, \mathbf X^\top \mathbf H\mathbf X\sim W_p(\boldsymbol{\Sigma}, n-1). But \mathbf X^\top \mathbf H\mathbf X=n\mathbf S (Property 6. in Section 2.4), and consequently n\mathbf S\sim W_p(\boldsymbol{\Sigma}, n-1), as required.
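
As an informal check of Proposition 7.10 (a simulation sketch; it assumes the MASS package for mvrnorm, and the particular \boldsymbol{\mu}, \boldsymbol{\Sigma} and n are arbitrary choices), we can compute n\mathbf S for many samples and compare the Monte Carlo mean with (n-1)\boldsymbol{\Sigma}, which equals \mathbb{E}(n\mathbf S) by Proposition 7.6:

library(MASS)                        # for mvrnorm
Sigma <- matrix(c(2, 1, 1, 3), nrow = 2)
mu <- c(1, -1)
n <- 20
nS <- replicate(5000, {
  X <- mvrnorm(n, mu = mu, Sigma = Sigma)
  xbar <- colMeans(X)
  crossprod(sweep(X, 2, xbar))       # = sum_i (x_i - xbar)(x_i - xbar)' = n * S
})
apply(nS, c(1, 2), mean)             # should be close to (n - 1) * Sigma
(n - 1) * Sigma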



Thus, sample covariance matrices have a scaled Wishart distribution. This result will be key in the next section, as it will allow us to compute the sampling distribution of a test statistic that we will then use in hypothesis tests.

We will now prove Cochran’s theorem.

Proof. Non-examinable

We first prove the result for the case \boldsymbol{\Sigma}= {\mathbf I}_p.

Using the Spectral Decomposition Theorem 3.3 and noting that the eigenvalues of projection matrices must be either 0 or 1, we can write {\mathbf P}=\sum_{j=1}^r \mathbf v_j \mathbf v_j^\top \qquad \hbox{and} \qquad (\mathbf I_n-{\mathbf P})=\sum_{j=r+1}^n \mathbf v_j \mathbf v_j^\top where \mathbf v_1, \ldots , \mathbf v_n \in \mathbb{R}^n are mutually orthogonal unit vectors. Then \begin{align} \mathbf X^\top \mathbf P\mathbf X&= \mathbf X^\top \left (\sum_{j=1}^r \mathbf v_j \mathbf v_j^\top \right) \mathbf X\nonumber \\ & =\sum_{j=1}^r \mathbf X^\top \mathbf v_j \mathbf v_j^\top \mathbf X=\sum_{j=1}^r \mathbf y_j \mathbf y_j^\top, \tag{7.5} \end{align} and similarly, \begin{equation} \mathbf X^\top (\mathbf I_n -\mathbf P) \mathbf X=\sum_{j=r+1}^n \mathbf y_j \mathbf y_j^\top, \tag{7.6} \end{equation} where \mathbf y_j=\mathbf X^\top \mathbf v_j is a p \times 1 vector.

Claim The \mathbf y_j are IID multivariate normal random variables: \mathbf y_j \sim N_p({\mathbf 0}_p, \mathbf I_p).

If the claim is true, then it immediately follows from the definition of the Wishart distribution that (7.5) has a Wishart W_p(\mathbf I_p,r) distribution and (7.6) has a Wishart W_p(\mathbf I_p, n-r) distribution. Moreover, they are independent because the \mathbf y_j are all independent.

Then to prove the general case with covariance matrix \boldsymbol{\Sigma}, note that if \mathbf x_i\sim N_p({\boldsymbol 0}, \boldsymbol{\Sigma}), then we can write \mathbf x_i=\boldsymbol{\Sigma}^{1/2}\mathbf z_i where \mathbf z_i \sim N_p({\boldsymbol 0}, \mathbf I_p).

Thus, writing \mathbf X and \mathbf Z for the matrices with rows \mathbf x_i^\top and \mathbf z_i^\top respectively, so that \mathbf X= \mathbf Z\boldsymbol{\Sigma}^{1/2}, we have \begin{align*} \mathbf X^\top \mathbf P\mathbf X&= \boldsymbol{\Sigma}^{1/2} \mathbf Z^\top\mathbf P\mathbf Z\boldsymbol{\Sigma}^{1/2}\\ &\sim \boldsymbol{\Sigma}^{1/2} W_p(\mathbf I_p, r) \boldsymbol{\Sigma}^{1/2} \quad\mbox{ by the result above}\\ &\sim W_p(\boldsymbol{\Sigma}, r) \end{align*} where the final line follows by Proposition 7.7 with \mathbf A= \boldsymbol{\Sigma}^{1/2}. The same argument gives \mathbf X^\top (\mathbf I_n -\mathbf P)\mathbf X\sim W_p(\boldsymbol{\Sigma}, n-r), and independence is preserved because \mathbf X^\top \mathbf P\mathbf X and \mathbf X^\top (\mathbf I_n -\mathbf P)\mathbf X are functions of the independent matrices \mathbf Z^\top \mathbf P\mathbf Z and \mathbf Z^\top (\mathbf I_n -\mathbf P)\mathbf Z.

To complete the proof it only remains to prove the claim that \mathbf y_j \sim N_p({\mathbf 0}_p, \mathbf I_p).

We can immediately see that the \mathbf y_j must be MVN of dimension p, and that they have mean vector {\boldsymbol 0}_p. To see the covariance and independence parts, note that the k^{th} element of \mathbf y_j is y_{jk} = \sum_{i=1}^n x_{ik}v_{ji} and so the k, l^{th} element of the covariance matrix between \mathbf y_j and \mathbf y_{j'} is

\begin{align*} {\mathbb{E}}(y_{jk} y_{j'l}) &= {\mathbb{E}}\left(\sum_{i=1}^n x_{ik}v_{ji} \sum_{i'=1}^n x_{i'l}v_{j'i'}\right)\\ &=\sum_{i=1}^n\sum_{i'=1}^n v_{ji} {\mathbb{E}}(x_{ik}x_{i'l})v_{j'i'}\\ &=\begin{cases} 0 &\mbox{ if } k\not = l, \mbox{ since } x_{ik} \mbox{ and } x_{i'l} \mbox{ are independent with mean zero (recall } \boldsymbol{\Sigma}=\mathbf I_p\mbox{),} \\ \sum_{i=1}^n v_{ji} v_{j'i} &\mbox{ if } k=l, \mbox{ since } {\mathbb{E}}(x_{ik}x_{i'k}) = 1 \mbox{ if } i=i' \mbox{ and } 0 \mbox{ otherwise}. \end{cases} \end{align*}

Finally \begin{align*} \sum_{i=1}^n v_{ji} v_{j'i}&= \mathbf v_j^\top \mathbf v_{j'}\\ &=\begin{cases} 1 &\mbox{if } j=j'\\ 0 &\mbox{otherwise}. \end{cases} \end{align*}

Thus {\mathbb{C}\operatorname{ov}}(\mathbf y_j, \mathbf y_{j'}) = {\boldsymbol 0}_{p\times p} for j\not = j' and {\mathbb{V}\operatorname{ar}}(\mathbf y_j) = \mathbf I_p. Since the \mathbf y_j are jointly multivariate normal (being linear functions of \mathbf X), and zero correlation implies independence for jointly multivariate normal random vectors, this proves the claim.