## 6.5 Computer Tasks

##### Task 1

The `eurodist` dataset in R gives the road distances (in km) between 21 European cities. Note that this is stored as a `dist` object, as returned by the `dist` command, i.e. only the lower triangle of the distance matrix is stored. `cmdscale` will take this directly as input.

Perform multidimensional scaling on this data, and find a two-dimensional set of points which has interpoint distances approximately equal to the data.

Plot these coordinates and label them with the city names. Does your plot look like the map of Europe?
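As a possible starting point, here is a minimal sketch of the scaling and plotting steps. The object names (`mds2`, `cities`, `x`, `y`) are just illustrative choices, and flipping the second axis is optional: an MDS configuration is only determined up to rotation and reflection.

```r
# Classical MDS on the built-in road distances; k = 2 asks for 2-d coordinates.
mds2 <- cmdscale(eurodist, k = 2)

# cmdscale keeps the labels of the dist object as row names.
cities <- rownames(mds2)

# Reflect the second axis so that north tends to be at the top for these data.
x <- mds2[, 1]
y <- -mds2[, 2]

# asp = 1 keeps distances comparable in both directions.
plot(x, y, type = "n", asp = 1,
     xlab = "Coordinate 1", ylab = "Coordinate 2")
text(x, y, labels = cities, cex = 0.7)
```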

Is the distance matrix `eurodist` a Euclidean distance matrix, and how do you know? If it is not Euclidean, why do you think that might be?

Create the Euclidean distance matrix from your set of 2-dimensional points. What is the Frobenius norm of the difference between this matrix and the original distance matrix?

Use `cmdscale` to create a set of points in 3-dimensional space and recompute the distance matrix.
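One way to approach these questions is sketched below. It relies on the fact that a distance matrix is Euclidean exactly when the doubly centred matrix \(\mathbf B = -\tfrac12 \mathbf H \mathbf D^{(2)} \mathbf H\) is positive semi-definite, where \(\mathbf D^{(2)}\) is the matrix of squared distances; `cmdscale(..., eig = TRUE)` returns the eigenvalues of this matrix. The names `fit2`, `fit3`, `frob2` and `frob3` are illustrative.

```r
D <- as.matrix(eurodist)   # full symmetric 21 x 21 matrix

# eig = TRUE also returns the eigenvalues of the doubly centred matrix B.
fit2 <- cmdscale(eurodist, k = 2, eig = TRUE)
round(fit2$eig)            # clearly negative values mean B is not PSD

# Distances between the fitted 2-d points, and the Frobenius norm of the
# discrepancy (each pair of cities is counted twice in the full matrix).
D2 <- as.matrix(dist(fit2$points))
frob2 <- norm(D - D2, type = "F")

# The same calculation for a 3-d configuration.
fit3 <- cmdscale(eurodist, k = 3, eig = TRUE)
D3 <- as.matrix(dist(fit3$points))
frob3 <- norm(D - D3, type = "F")

c(two_dim = frob2, three_dim = frob3)
```

Comparing the two norms shows how much the extra dimension changes the fit.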

##### Task 2

Consider the synthetic data of 9 binary attributes on 11 cases.

```
df=structure(list(a = c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0), b = c(0,
0, 0, 0, 0, 0, 1, 0, 1, 0, 1), c = c(0, 0, 0, 0, 1, 0, 0, 1,
0, 1, 0), d = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0), e = c(0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 1), f = c(0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 1), g = c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0), h = c(1, 0, 0,
0, 0, 0, 0, 1, 1, 0, 0), i = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1,
0)), class = "data.frame", row.names = c(NA, -11L), .Names = c("a",
"b", "c", "d", "e", "f", "g", "h", "i"))
df
```

```
## a b c d e f g h i
## 1 0 0 0 1 0 0 0 1 0
## 2 0 0 0 0 0 0 1 0 0
## 3 0 0 0 0 1 1 1 0 0
## 4 0 0 0 0 0 0 1 0 0
## 5 1 0 1 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 1
## 7 0 1 0 1 0 0 0 0 0
## 8 1 0 1 0 0 0 0 1 0
## 9 0 1 0 0 0 0 0 1 1
## 10 0 0 1 1 0 0 0 0 1
## 11 0 1 0 0 1 1 0 0 0
```

Compute the Jaccard index and simple matching coefficient (SMC) similarity matrices for these data.

Perform classical MDS for both similarity matrices, producing a plot of the coordinates (in 2d). Are the results similar?
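The sketch below shows one way to carry this out. It rebuilds the data as a matrix `X` so that the block runs on its own. Converting a similarity \(s_{ij}\) to a distance via \(d_{ij} = \sqrt{2(1 - s_{ij})}\) is an assumption made here (it is the standard transformation when \(s_{ii} = 1\)); use whichever transformation the notes specify if it differs.

```r
# The 11 x 9 binary data from above, rebuilt as a matrix.
X <- cbind(
  a = c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0),
  b = c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1),
  c = c(0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0),
  d = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0),
  e = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1),
  f = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1),
  g = c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
  h = c(1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0),
  i = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0)
)
p <- ncol(X)

n11 <- X %*% t(X)               # attributes where both cases are 1
n00 <- (1 - X) %*% t(1 - X)     # attributes where both cases are 0

smc     <- (n11 + n00) / p      # matches (1-1 or 0-0) over all attributes
jaccard <- n11 / (p - n00)      # 1-1 matches over attributes not 0-0

# Assumed similarity-to-distance transformation: d = sqrt(2 * (1 - s)).
mds_smc <- cmdscale(as.dist(sqrt(2 * (1 - smc))), k = 2)
mds_jac <- cmdscale(as.dist(sqrt(2 * (1 - jaccard))), k = 2)

# Side-by-side plots, labelling each point by its case number.
op <- par(mfrow = c(1, 2))
plot(mds_smc, type = "n", asp = 1, main = "SMC", xlab = "", ylab = "")
text(mds_smc, labels = seq_len(nrow(X)))
plot(mds_jac, type = "n", asp = 1, main = "Jaccard", xlab = "", ylab = "")
text(mds_jac, labels = seq_len(nrow(X)))
par(op)
```

Note that the Jaccard index is undefined for a pair of cases that are both all zero; that does not occur in these data, since every case has at least one attribute equal to 1.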

##### Task 3

In this question we will look at data from 1888 on the fertility and socio-economic status of 47 French-speaking provinces in Switzerland.

```
## Fertility Agriculture Examination Education Catholic
## Courtelary 80.2 17.0 15 12 9.96
## Delemont 83.1 45.1 6 9 84.84
## Franches-Mnt 92.5 39.7 5 5 93.40
## Moutier 85.8 36.5 12 7 33.77
## Neuveville 76.9 43.5 17 15 5.16
## Porrentruy 76.1 35.3 9 7 90.57
## Infant.Mortality
## Courtelary 22.2
## Delemont 22.2
## Franches-Mnt 20.2
## Moutier 20.3
## Neuveville 20.6
## Porrentruy 26.6
```

We will use MDS to find which provinces are similar to each other.

Compute the Euclidean distance matrix for these data.

Use MDS to create a 2-dimensional representation of the data and plot these points, labelling them with the province name.
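A minimal sketch of these two steps follows, assuming the data are R's built-in `swiss` data frame (which matches the output shown above). The names `D_swiss` and `mds_swiss` are illustrative.

```r
# Euclidean distances between provinces, using the raw (unscaled) variables.
D_swiss <- dist(swiss)

# Two-dimensional classical MDS configuration.
mds_swiss <- cmdscale(D_swiss, k = 2)

# Plot an empty frame, then label each point with its province name.
plot(mds_swiss, type = "n", asp = 1,
     xlab = "Coordinate 1", ylab = "Coordinate 2")
text(mds_swiss, labels = rownames(swiss), cex = 0.7)
```

Because the variables are on different scales, those with the largest spread dominate the raw Euclidean distances. Replacing `dist(swiss)` with `dist(scale(swiss))` is one alternative worth comparing.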

Use MDS to create a 3-dimensional representation of the data. You can plot this using the `plot3d` command from the `rgl` package. See, for example, here.
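The 3-dimensional version might look like the sketch below, again assuming the built-in `swiss` data. The plotting part only runs in an interactive session with `rgl` installed, since the point of `plot3d` is a window you can rotate with the mouse. The name `mds_swiss3` is illustrative.

```r
# Three-dimensional classical MDS configuration of the provinces.
mds_swiss3 <- cmdscale(dist(swiss), k = 3)

# Only attempt the interactive plot if rgl is available and we can interact.
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(mds_swiss3,
              xlab = "Coordinate 1", ylab = "Coordinate 2",
              zlab = "Coordinate 3", type = "s", size = 1)
  # Add the province names next to the points.
  rgl::text3d(mds_swiss3, texts = rownames(swiss), adj = c(-0.2, 0.5))
}
```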

MDS can also be used to reveal a hidden pattern in a correlation matrix. Find the correlation matrix, \(\mathbf R\), of the `swiss` data. Perform MDS using \(1-\mathbf R\) as the distance matrix and plot the results. Positively correlated covariates should appear close together on the same side of the plot.
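A short sketch of this idea, assuming the built-in `swiss` data frame. Here the objects being scaled are the six variables rather than the 47 provinces, so the plot has one point per variable. The names `R` and `mds_cor` are illustrative.

```r
# Correlation matrix of the six variables.
R <- cor(swiss)

# Treat 1 - R as a dissimilarity: 0 for perfectly positively correlated
# variables, 2 for perfectly negatively correlated ones.
mds_cor <- cmdscale(as.dist(1 - R), k = 2)

# One labelled point per variable.
plot(mds_cor, type = "n", asp = 1,
     xlab = "Coordinate 1", ylab = "Coordinate 2")
text(mds_cor, labels = colnames(swiss), cex = 0.8)
```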

##### Task 4 (if you have time…)

Try MDS on the MNIST data, this time looking for a 3d representation. Colour the points by their digit label, and create some interactive 3d plots. Does this find useful structure in the data? Is it more informative than the 2d plots we created in the notes?

Read the description of more advanced methods here. Pick one, find an R package that implements it, and try it on the MNIST data.

**Warning:** The MNIST dataset is large, and so computations can take a long time if you use the full dataset. Thus I usually work with a selection of just 1000 images, which is enough to find interesting patterns in most cases.