5.5 Exercises
- A sales company surveyed \(50\) of its employees to determine the factors that influence sales performance. Two collections of variables were measured. The first set relates to sales performance:
- Sales Growth
- Sales Profitability
- New Account Sales
The second set of variables consists of test scores measuring intelligence:
- Creativity
- Mechanical Reasoning
- Abstract Reasoning
- Mathematics
You can download the data set sales.csv from Moodle. The following analysis is carried out in R.
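library(CCA)
# Read the survey data
dat <- read.csv(file = 'sales2.csv', sep = ',', header = TRUE)
# X: the three sales performance variables; Y: the remaining (test score) variables
X <- dat |> dplyr::select('growth', 'profit', 'new')
Y <- dat |> dplyr::select(-'growth', -'profit', -'new')
# Canonical correlation analysis of X against Y
cc.out <- cc(X, Y)
print(cc.out$cor)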
## [1] 0.9944827 0.8781065 0.3836057
The estimated canonical coefficient vectors, first for the sales variables and then for the test scores, are
##               [,1]       [,2]       [,3]
## growth -0.06237788 -0.1740703  0.3771529
## profit -0.02092564  0.2421641 -0.1035150
## new    -0.07825817 -0.2382940 -0.3834151
##               [,1]        [,2]        [,3]
## create -0.06974814 -0.19239132 -0.24655659
## mech   -0.03073830  0.20157438  0.14189528
## abs    -0.08956418 -0.49576326  0.28022405
## math   -0.06282997  0.06831607 -0.01133259
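The following gives the correlations between the original variables and the transformed (canonical) variables. The first two matrices correlate each set of variables with its own canonical variables (sales variables, then test scores). The last two correlate each set with the canonical variables of the other set; each of their columns equals the corresponding column above multiplied by the matching canonical correlation.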
##              [,1]          [,2]         [,3]
## growth -0.9798776  0.0006477883  0.199598477
## profit -0.9464085  0.3228847489 -0.007504408
## new    -0.9518620 -0.1863009724 -0.243414776
##              [,1]       [,2]        [,3]
## create -0.6383313 -0.2156981 -0.65140953
## mech   -0.7211626  0.2375644  0.06773775
## abs    -0.6472493 -0.5013329  0.57422365
## math   -0.9440859  0.1975329  0.09422619
##              [,1]          [,2]         [,3]
## growth -0.9744713  0.0005688272  0.076567107
## profit -0.9411869  0.2835272081 -0.002878734
## new    -0.9466102 -0.1635921013 -0.093375287
##              [,1]       [,2]        [,3]
## create -0.6348095 -0.1894059 -0.24988439
## mech   -0.7171837  0.2086069  0.02598458
## abs    -0.6436782 -0.4402237  0.22027544
## math   -0.9388771  0.1734549  0.03614570
(a) Describe the first pair of canonical variables, give their correlation, and provide an interpretation.
(b) Describe the second pair of canonical variables, and provide an interpretation.
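To see how the correlation matrices in this exercise arise, the following is a minimal sketch on simulated stand-in data (not the sales survey). It assumes base R's `cancor` in place of `CCA::cc`, and the data-generating mechanism and the names `latent`, `eta`, `psi`, `load_X` and `cross_X` are purely illustrative. It forms the canonical variables by hand, then compares the correlations of the original variables with their own and with the other set's canonical variables.

```r
# Simulated stand-in data (hypothetical; NOT the sales survey)
set.seed(1)
n <- 50
latent <- rnorm(n)
X <- cbind(growth = latent + rnorm(n),
           profit = latent + rnorm(n),
           new    = latent + rnorm(n))
Y <- cbind(create = latent + rnorm(n),
           mech   = rnorm(n),
           abs    = latent + rnorm(n),
           math   = latent + rnorm(n))

fit <- cancor(X, Y)      # base R canonical correlation analysis
rho <- fit$cor           # canonical correlations
r <- length(rho)

# Canonical variables: centre each block, then apply the coefficient vectors
Xc <- scale(X, center = fit$xcenter, scale = FALSE)
Yc <- scale(Y, center = fit$ycenter, scale = FALSE)
eta <- Xc %*% fit$xcoef[, 1:r]
psi <- Yc %*% fit$ycoef[, 1:r]

# Correlation within each pair of canonical variables
paired_cor <- diag(cor(eta, psi))

# Correlations of the X variables with their own canonical variables,
# and with the canonical variables of the other set
load_X  <- cor(X, eta)
cross_X <- cor(X, psi)

# Own-set correlations scaled column-wise by the canonical correlations
scaled_load_X <- sweep(load_X, 2, rho, `*`)

print(rho)
print(paired_cor)
print(max(abs(cross_X - scaled_load_X)))
```

Comparing `cross_X` with `scaled_load_X` is the same check that can be made by hand between the first and third matrices of the printed output.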
- Attempt exam question 1 part (b) from the 2017-18 exam paper.
- Suppose that \(\mathbf z= (\mathbf x^\top \mathbf y^\top)^\top\) is a random vector, where both \(\mathbf x\) and \(\mathbf y\) are sub-vectors of dimension \(p\), so that \(\mathbf z\) is \((2p)\times 1\). Define
\[{\mathbb{V}\operatorname{ar}}(\mathbf z)=\boldsymbol{\Sigma}_{\mathbf z\mathbf z}=\begin{pmatrix} \boldsymbol{\Sigma}_{\mathbf x\mathbf x} & \boldsymbol{\Sigma}_{\mathbf x\mathbf y}\\\boldsymbol{\Sigma}_{\mathbf y\mathbf x} & \boldsymbol{\Sigma}_{\mathbf y\mathbf y} \end{pmatrix}.\]
(a) Suppose that \(\mathbf y= \mathbf T\mathbf x\) where \(\mathbf T\) is a fixed matrix. Find \(\boldsymbol{\Sigma}_{\mathbf x\mathbf y}\) and \(\boldsymbol{\Sigma}_{\mathbf y\mathbf y}\) in terms of \(\boldsymbol{\Sigma}_{\mathbf x\mathbf x}\) and \(\mathbf T\).
(b) Assuming now that \(\mathbf T\) is an orthogonal matrix and \(\boldsymbol{\Sigma}_{\mathbf x\mathbf x}\) is of full rank, determine the singular values of the matrix \(\mathbf Q=\boldsymbol{\Sigma}_{\mathbf x\mathbf x}^{-1/2}\boldsymbol{\Sigma}_{\mathbf x\mathbf y}\boldsymbol{\Sigma}_{\mathbf y\mathbf y}^{-1/2}\), and hence write down the canonical correlation coefficients.
(c) Suppose now that \(\mathbf T\) is non-singular but not orthogonal. Comment on whether the answer to part (b) changes.
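Once you have an answer on paper, a numerical check can be useful. The sketch below is one possible way to do this, under stated assumptions: it simulates draws of \(\mathbf x\), sets \(\mathbf y= \mathbf T\mathbf x\) for a randomly generated orthogonal \(\mathbf T\), and computes the singular values of \(\mathbf Q\) from sample covariance matrices. The helper `inv_sqrt`, the dimension `p = 3` and the sample size are illustrative choices, not part of the exercise.

```r
# Inverse symmetric square root of a positive definite matrix (illustrative helper)
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
}

set.seed(42)
p <- 3
n <- 200

# Rows of x are draws of a random vector with a full-rank covariance matrix
x <- matrix(rnorm(n * p), n, p) %*% matrix(rnorm(p * p), p, p)

# A random orthogonal matrix, taken from a QR decomposition
Tmat <- qr.Q(qr(matrix(rnorm(p * p), p, p)))

# Each row of y is T applied to the corresponding row of x
y <- x %*% t(Tmat)

# Sample versions of the covariance blocks
Sxx <- cov(x)
Syy <- cov(y)
Sxy <- cov(x, y)

# Q as defined in part (b), and its singular values
Q <- inv_sqrt(Sxx) %*% Sxy %*% inv_sqrt(Syy)
sv <- svd(Q)$d
print(sv)
```

Replacing `Tmat` with any non-singular matrix, for example `matrix(rnorm(p * p), p, p)`, gives a way to explore part (c).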
- We will now prove Proposition 5.3 by induction.
The case for \(k=1\) was proved in Section 5.1 in Proposition 3.9. Assume the result is true for \(k\). Consider the objective
\[\mathcal{L} = \mathbf a^\top \mathbf Q\mathbf b+ \sum_{i=1}^k \gamma_i\mathbf a^\top \mathbf a_i + \sum_{i=1}^k \mu_i\mathbf b^\top \mathbf b_i + \frac{\lambda_1}{2}(1-\mathbf a^\top\mathbf a)+ \frac{\lambda_2}{2}(1-\mathbf b^\top\mathbf b)\]
where \(\lambda_1, \lambda_2, \mu_i, \gamma_i\) are Lagrange multipliers.
By differentiating with respect to \(\mathbf a\) and \(\mathbf b\) and setting the derivatives to zero, show that \[\begin{align} \mathbf Q\mathbf b+ \sum_{i=1}^k\gamma_i \mathbf a_i - \lambda_1 \mathbf a&= 0 \tag{5.14}\\ \mathbf Q^\top\mathbf a+ \sum_{i=1}^k\mu_i \mathbf b_i - \lambda_2 \mathbf b&= 0. \tag{5.15} \end{align}\]
By left multiplying the equations above by \(\mathbf a^\top\) and \(\mathbf b^\top\) respectively, show that \[\lambda_1=\lambda_2 = \mathbf a^\top \mathbf Q\mathbf b.\]
By left multiplying (5.14) by \(\mathbf a_i^\top\), show that \(\gamma_i=0\) for \(i=1, \ldots, k\). Show similarly that \(\mu_i =0\) for \(i=1, \ldots, k\).
Finally, by copying the proof of Proposition 3.9, prove Proposition 5.3.
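As a sanity check on the stationarity conditions (5.14) and (5.15), the following sketch takes an arbitrary random matrix as a stand-in for \(\mathbf Q\) and uses its \((k+1)\)th pair of singular vectors as the candidate \(\mathbf a\) and \(\mathbf b\), with \(\gamma_i = \mu_i = 0\). The matrix dimensions and the choice `k = 1` are assumptions made only for illustration; this is a numerical illustration, not a proof.

```r
set.seed(7)
Q <- matrix(rnorm(12), 3, 4)   # arbitrary stand-in for Q
s <- svd(Q)
k <- 1

# Candidate solution: the (k+1)th left and right singular vectors
a <- s$u[, k + 1]
b <- s$v[, k + 1]

# Common value of the two multipliers lambda_1 and lambda_2
lambda <- drop(t(a) %*% Q %*% b)

# Residuals of (5.14) and (5.15) with gamma_i = mu_i = 0
res_a <- Q %*% b - lambda * a
res_b <- t(Q) %*% a - lambda * b

# Constraints: unit length, and orthogonality to the first k singular vectors
orth_a <- crossprod(s$u[, 1:k, drop = FALSE], a)
orth_b <- crossprod(s$v[, 1:k, drop = FALSE], b)

print(lambda)
print(s$d)
print(c(max(abs(res_a)), max(abs(res_b)), max(abs(orth_a)), max(abs(orth_b))))
```

Comparing `lambda` with the entries of `s$d` shows which singular value the multiplier corresponds to.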
- Show that the canonical correlation variables \(\eta_k\) and \(\psi_k\) have mean zero. Prove Proposition 5.4, which gives the variance and covariance of the canonical correlation variables.
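The properties in this last exercise can also be inspected empirically. The sketch below uses simulated data and base R's `cancor`, both of which are assumptions made for illustration. Because `cancor` scales its canonical variables to unit sum of squares, the code multiplies by `sqrt(n - 1)` to put them on a unit sample variance scale. It then prints their sample means and their sample variance and covariance matrices for comparison with what you derive.

```r
set.seed(3)
n <- 100

# Simulated data (hypothetical): y shares some signal with x
x <- matrix(rnorm(n * 3), n, 3)
y <- cbind(x[, 1] + rnorm(n), x[, 2] + rnorm(n), rnorm(n), rnorm(n))

fit <- cancor(x, y)
r <- length(fit$cor)

# Sample canonical variables, rescaled to unit sample variance
eta <- scale(x, center = fit$xcenter, scale = FALSE) %*% fit$xcoef[, 1:r] * sqrt(n - 1)
psi <- scale(y, center = fit$ycenter, scale = FALSE) %*% fit$ycoef[, 1:r] * sqrt(n - 1)

# Sample means of the canonical variables
print(colMeans(eta))
print(colMeans(psi))

# Sample variance and covariance matrices
print(round(cov(eta), 6))
print(round(cov(psi), 6))
print(round(cov(eta, psi), 6))

# Canonical correlations, for comparison with the matrices above
print(fit$cor)
```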