8.5 Exercises

  1. Prove that the Bayes classifier can be written as given in Proposition 8.3.

  2. Prove that \(\mathbf W^{-1}\mathbf B\) and \(\mathbf W^{-\frac{1}{2}}\mathbf B\mathbf W^{-\frac{1}{2}}\) have the same eigenvalues and find an expression relating their eigenvectors.

  3. Prove Proposition 8.1. Recall that a region \(\mathcal{R}\) is convex (and hence connected) if, for all \(\mathbf x_1, \mathbf x_2 \in \mathcal{R}\), we have \[\lambda \mathbf x_1+(1-\lambda) \mathbf x_2 \in \mathcal{R} \mbox{ for all } \lambda \in [0,1].\]
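
Before attempting the proof in Exercise 2, it can help to check the eigenvalue claim numerically. The following is a minimal sketch in plain Python, assuming \(2\times 2\) matrices stored as nested lists; the matrices `W` and `B` are hypothetical values chosen only for illustration, and all helper names are made up for this example. It does not address the eigenvector part of the question.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(a):
    """Determinant of a 2x2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inverse(a):
    """Inverse of a non-singular 2x2 matrix."""
    d = det(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def sqrt_spd(a):
    """Symmetric square root of a 2x2 positive-definite matrix, using
    A^(1/2) = (A + s I) / t with s = sqrt(det A) and t = sqrt(tr A + 2 s)."""
    s = math.sqrt(det(a))
    t = math.sqrt(a[0][0] + a[1][1] + 2 * s)
    return [[(a[0][0] + s) / t, a[0][1] / t],
            [a[1][0] / t, (a[1][1] + s) / t]]

def eigenvalues(a):
    """Eigenvalues of a 2x2 matrix with real spectrum, largest first."""
    tr = a[0][0] + a[1][1]
    disc = math.sqrt(max(tr * tr - 4 * det(a), 0.0))
    return [(tr + disc) / 2, (tr - disc) / 2]

# Hypothetical matrices, chosen only for illustration.
W = [[5.0, 2.0], [2.0, 1.0]]  # symmetric positive definite
B = [[2.0, 1.0], [1.0, 3.0]]  # symmetric

W_inv_half = inverse(sqrt_spd(W))
ev_a = eigenvalues(matmul(inverse(W), B))                       # W^{-1} B
ev_b = eigenvalues(matmul(matmul(W_inv_half, B), W_inv_half))   # W^{-1/2} B W^{-1/2}

print(ev_a)
print(ev_b)
```

The two printed lists should agree up to floating-point rounding. A numerical check of this kind is not a proof, but it is a quick way to catch an algebra slip.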

  4. Consider \(g=3\) bivariate normal populations with means \({\boldsymbol{\mu}}_1, {\boldsymbol{\mu}}_2, {\boldsymbol{\mu}}_3\) and a common covariance matrix \(\boldsymbol{\Sigma}\), where \[ {\boldsymbol{\mu}}_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad {\boldsymbol{\mu}}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad {\boldsymbol{\mu}}_3 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \boldsymbol{\Sigma}= \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}.\] Determine the three maximum likelihood discriminant rules and, hence, sketch the ML discriminant regions. Determine the point at which the three boundary lines meet.
  5. The palmerpenguins R package contains size measurements of adult foraging penguins near Palmer Station, Antarctica. We will begin with a subset of these data: just the Adelie (\(n_1=151\)) and Chinstrap (\(n_2=68\)) penguins, and just the bill length and bill depth measurements. The sample mean and sample covariance matrix for each group are

\[\begin{align*} {\boldsymbol{\mu}}_{A}&=\begin{pmatrix}38.8 \\ 18.3\end{pmatrix}, \qquad \boldsymbol{\Sigma}_A = \begin{pmatrix}7.09&1.27 \\ 1.27&1.48\end{pmatrix},\\ {\boldsymbol{\mu}}_{C}&=\begin{pmatrix}48.8 \\ 18.4\end{pmatrix}, \qquad \boldsymbol{\Sigma}_C = \begin{pmatrix}11.2&2.48 \\ 2.48&1.29\end{pmatrix}. \end{align*}\]

     1. Find the sample ML discriminant rule for allocating a new penguin \(\mathbf z=(z_1,z_2)^T\) as either Adelie or Chinstrap, assuming that the population covariance matrices of the two groups are equal. Find the equation of the straight line that separates the two allocation regions, and plot the two regions.

     2. Find the Bayes discriminant rule, estimating the prior probabilities by the observed relative frequencies \(n_1/(n_1+n_2)\) and \(n_2/(n_1+n_2)\). How does this differ from the ML rule?

     3. In addition to the Adelie and Chinstrap penguins, there are also \(n_3=123\) measurements on Gentoo penguins, with sample mean and covariance \[ {\boldsymbol{\mu}}_{G}=\begin{pmatrix}47.5 \\ 15\end{pmatrix}, \qquad \boldsymbol{\Sigma}_G = \begin{pmatrix}9.5&1.95 \\ 1.95&0.963\end{pmatrix}.\] Writing \(\mathbf S_A\), \(\mathbf S_C\) and \(\mathbf S_G\) for the three sample covariance matrices, the within-group covariance matrix is \[ \frac{1}{342}(151 \mathbf S_A + 68 \mathbf S_C + 123 \mathbf S_G) = \begin{pmatrix}9.25&1.9 \\ 1.9&1.24\end{pmatrix}.\] The eigendecomposition of \(\mathbf W^{-1}\mathbf B\) is found in R to be

## eigen() decomposition
## $values
## [1] 9.847607 1.269243
## $vectors
##            [,1]      [,2]
## [1,]  0.3470103 0.3350582
## [2,] -0.9378613 0.9421974

Calculate Fisher’s linear discriminant function using just a single LDA variable.

     4. What would Fisher’s discriminant rule be if you used both LDA variables?
  6. Try Q2 from the 2020-21 paper.
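
For the exercise with \(g=3\) bivariate normal populations, a hand-drawn sketch of the ML discriminant regions can be checked numerically. When the populations share a covariance matrix, the ML rule allocates \(\mathbf z\) to the population whose mean is nearest in Mahalanobis distance. The sketch below, in plain Python, evaluates that rule at any point; the function names are hypothetical, and the code is intended only as a way of testing a worked answer, not as a substitute for deriving the rules.

```python
def inverse_2x2(m):
    """Inverse of a non-singular 2x2 matrix stored as nested lists."""
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def mahalanobis_sq(z, mu, sigma_inv):
    """Squared Mahalanobis distance (z - mu)^T Sigma^{-1} (z - mu)."""
    d = [z[0] - mu[0], z[1] - mu[1]]
    v = [sigma_inv[0][0] * d[0] + sigma_inv[0][1] * d[1],
         sigma_inv[1][0] * d[0] + sigma_inv[1][1] * d[1]]
    return d[0] * v[0] + d[1] * v[1]

def ml_allocate(z, means, sigma):
    """Return the (1-based) index of the population with the smallest
    squared Mahalanobis distance to z; ties go to the lowest index."""
    sigma_inv = inverse_2x2(sigma)
    dists = [mahalanobis_sq(z, mu, sigma_inv) for mu in means]
    return 1 + dists.index(min(dists))

# Means and common covariance matrix taken from the exercise.
means = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
sigma = [[5.0, 2.0], [2.0, 1.0]]

# Each population mean should be allocated to its own population.
for mu in means:
    print(mu, ml_allocate(mu, means, sigma))
```

Evaluating `ml_allocate` at a few points on either side of each boundary line from your sketch is a quick way to confirm that the regions are labelled correctly.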