9.6 Exercises

  1. Consider the data points

\[\mathbf x_1 =\begin{pmatrix}10\\5\end{pmatrix}, \quad \mathbf x_2 =\begin{pmatrix}7\\3\end{pmatrix}, \quad \mathbf x_3 =\begin{pmatrix}4\\8\end{pmatrix}, \quad \mathbf x_4 =\begin{pmatrix}5\\12\end{pmatrix}\]

Starting with the encoder \[\boldsymbol \delta= \begin{pmatrix} 1\\2\\1\\2\end{pmatrix},\] whose \(i\)th entry is the cluster label assigned to \(\mathbf x_i\), apply (by hand) the greedy K-means clustering algorithm until convergence.
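To check a hand calculation, the iteration can be sketched in a few lines of Python. This is a minimal sketch, not the text's own implementation: it assumes the usual alternation (compute each cluster mean, then reassign every point to its nearest mean in squared Euclidean distance, and stop when the encoder no longer changes). The function name `kmeans_from_encoder` is hypothetical.

```python
def kmeans_from_encoder(points, labels, max_iter=100):
    """Alternate mean and assignment steps, starting from a given encoder.

    Assumption: squared Euclidean distance, and a cluster that loses all
    of its points is simply dropped.
    """
    labels = list(labels)
    centroids = {}
    for _ in range(max_iter):
        ks = sorted(set(labels))
        # Mean step: centroid of each current cluster.
        centroids = {}
        for k in ks:
            members = [p for p, l in zip(points, labels) if l == k]
            centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
        # Assignment step: move each point to its nearest centroid.
        new_labels = [
            min(ks, key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        if new_labels == labels:  # encoder unchanged, so we have converged
            break
        labels = new_labels
    return labels, centroids


points = [(10, 5), (7, 3), (4, 8), (5, 12)]  # x_1, ..., x_4
delta0 = [1, 2, 1, 2]                        # starting encoder
labels, centroids = kmeans_from_encoder(points, delta0)
print(labels)
print(centroids)
```

Printing the intermediate `labels` and `centroids` inside the loop gives the quantities to compare with each step of the hand calculation.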

  1. In the table below, information is given on the presence or absence (denoted by \(1\) and \(0\), respectively) of six unspecified attributes, denoted A1, …, A6, in Lions, Giraffes, Sheep and Humans.

     \[ \begin{array}{cccccccc}
     \textrm{Individual} &&\textrm{A1}&\textrm{A2}&\textrm{A3}&\textrm{A4}&\textrm{A5}&\textrm{A6}\\
     &&&&&&&\\
     \textrm{Lion}&&1&1&0&0&1&1\\
     \textrm{Giraffe}&&1&1&1&0&0&1\\
     \textrm{Sheep}&&1&0&0&1&0&1\\
     \textrm{Human}&&0&0&0&0&1&0
     \end{array} \]

     1. Use the information in the table to calculate a similarity matrix based on the simple matching coefficient given by Equation (6.8).

     2. Using a suitable transformation, convert the similarity matrix into a dissimilarity matrix, \(\mathbf D\) say.

     3. Apply the single linkage method to the matrix \(\mathbf D\). Summarise your results graphically in a dendrogram.

     4. Apply the complete linkage method to the matrix \(\mathbf D\). Summarise your results graphically in a dendrogram.

     5. Suppose exactly two clusters are required. What would your clusters be under single linkage and under complete linkage?

     6. Repeat part 5, but this time with three clusters rather than two.

     7. Briefly (i.e. in one or two sentences) summarise your findings.
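The hand calculations in this exercise can be checked with a short Python sketch. It rests on two assumptions not stated in the exercise: that Equation (6.8) is the usual simple matching coefficient (number of attributes on which two individuals agree, divided by the number of attributes), and that the transformation used is \(d = 1 - s\), which is one common choice among several. The helper names `smc` and `agglomerate` are hypothetical. Ties between equally close clusters are broken by list order here, so any tie should still be examined by hand.

```python
from fractions import Fraction
from itertools import combinations

data = {
    "Lion":    (1, 1, 0, 0, 1, 1),
    "Giraffe": (1, 1, 1, 0, 0, 1),
    "Sheep":   (1, 0, 0, 1, 0, 1),
    "Human":   (0, 0, 0, 0, 1, 0),
}
names = list(data)


def smc(a, b):
    """Simple matching coefficient (assumed form): proportion of agreements."""
    return Fraction(sum(x == y for x, y in zip(a, b)), len(a))


# Dissimilarities under the assumed transformation d = 1 - s.
D = {frozenset((p, q)): 1 - smc(data[p], data[q])
     for p, q in combinations(names, 2)}


def agglomerate(names, D, linkage):
    """Agglomerative clustering; linkage=min is single, linkage=max is complete.

    Returns the merges as (cluster A, cluster B, merge height).
    """
    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        def dist(pair):
            A, B = pair
            return linkage(D[frozenset((a, b))] for a in A for b in B)
        # Ties are resolved by the order of `clusters` (an arbitrary choice).
        A, B = min(combinations(clusters, 2), key=dist)
        merges.append((sorted(A), sorted(B), dist((A, B))))
        clusters = [c for c in clusters if c not in (A, B)] + [A | B]
    return merges


single = agglomerate(names, D, min)
complete = agglomerate(names, D, max)
for label, merges in (("single", single), ("complete", complete)):
    for A, B, height in merges:
        print(label, A, B, height)
```

The merge heights are the levels at which branches join in the corresponding dendrogram. Exact fractions are used so that ties show up as exact equalities rather than being hidden by rounding.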

  1. Attempt question 4b from the 2017-18 exam paper.

  1. Prove Equation (9.2).