Chapter 5 Canonical Correlation Analysis (CCA)
The videos for this chapter are available at the following links:
- 5.1: Introduction to CCA
- 5.1.1: The first pair of CC variables
- 5.1.2: Example: Premier league data
- 5.2: The full set of CC variables
- 5.3: Properties of CCA
Suppose we observe a random sample of \(n\) bivariate observations \[ \mathbf z_1=(x_1,y_1)^\top , \ldots , \mathbf z_n=(x_n,y_n)^\top. \] If we want to explore possible dependence between the \(x_i\)’s and the \(y_i\)’s, one of the first steps would be to draw a scatterplot of the \(x_i\)’s against the \(y_i\)’s and calculate the correlation coefficient. Recall that the sample correlation coefficient is defined by \[\begin{align} r={\mathbb{C}\operatorname{or}}(x,y)&=\frac{S_{xy}}{\sqrt{S_{xx}}\sqrt{S_{yy}}}\\ &=\frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\left(\sum_{i=1}^n (x_i-\bar{x})^2 \right)^{1/2} \left(\sum_{i=1}^n (y_i-\bar{y})^2 \right)^{1/2}} \tag{5.1} \end{align}\] where \(\bar{x}=n^{-1}\sum_{i=1}^n x_i\) and \(\bar{y}=n^{-1}\sum_{i=1}^n y_i\) are the sample means, so that \(S_{xy}=\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})\), \(S_{xx}=\sum_{i=1}^n (x_i-\bar{x})^2\) and \(S_{yy}=\sum_{i=1}^n (y_i-\bar{y})^2\).
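As a small check on equation (5.1), the sketch below computes \(r\) directly from the centred sums \(S_{xy}\), \(S_{xx}\) and \(S_{yy}\) and compares it with NumPy's built-in `np.corrcoef`. The helper name `sample_correlation` and the five data points are made up purely for illustration. Any \(1/n\) or \(1/(n-1)\) factors would cancel in the ratio, which is why (5.1) can be written with raw sums.

```python
import numpy as np

def sample_correlation(x, y):
    """Sample correlation coefficient r, computed directly from equation (5.1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()          # x_i - x-bar
    dy = y - y.mean()          # y_i - y-bar
    s_xy = np.sum(dx * dy)     # S_xy: sum of centred cross-products
    s_xx = np.sum(dx ** 2)     # S_xx: sum of centred squares of x
    s_yy = np.sum(dy ** 2)     # S_yy: sum of centred squares of y
    return s_xy / (np.sqrt(s_xx) * np.sqrt(s_yy))

# Hypothetical toy data, chosen only to make the arithmetic easy to follow.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r = sample_correlation(x, y)
print(f"{r:.4f}")                       # prints 0.7746, i.e. 6 / sqrt(10 * 6)
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))
```

Here \(S_{xy}=6\), \(S_{xx}=10\) and \(S_{yy}=6\), so \(r=6/\sqrt{60}=\sqrt{0.6}\).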
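The sample correlation is a scale-free measure of the strength of the linear dependence between the \(x_i\)’s and the \(y_i\)’s: it always lies in \([-1,1]\), and it is unchanged if either variable is shifted by a constant or multiplied by a positive constant.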
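To make "scale-free" concrete, the sketch below simulates some hypothetical correlated data and shows that an affine change of units (shift plus positive rescaling) leaves \(r\) unchanged, while a negative scale factor on one variable only flips its sign. The simulated data and scale factors are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(size=50)   # synthetic, linearly related data

r = np.corrcoef(x, y)[0, 1]

# Change units: shift and rescale each variable by positive factors.
r_rescaled = np.corrcoef(100 * x + 3, 2.5 * y - 7)[0, 1]

# A negative scale factor on one variable flips the sign but not the magnitude.
r_flipped = np.corrcoef(-4 * x, y)[0, 1]

print(r, r_rescaled, r_flipped)
```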
In this chapter we investigate the multivariate analogue of this question. Instead of each observation being a pair of scalars, suppose that on each subject we observe two random vectors, \(\mathbf x\in\mathbb R^p\) and \(\mathbf y\in\mathbb R^q\). In other words, for each subject/case \(i\) we observe a pair \((\mathbf x_i,\mathbf y_i)\), giving a sample \(\{\mathbf x_i,\mathbf y_i\}_{i=1}^n.\)
Multivariate data are often easier to understand through low-dimensional projections. So, given a sample \(\{\mathbf x_i, \; \mathbf y_i\}_{i=1}^{n}\), what is a sensible way to assess and describe the strength of the linear dependence between the two vectors?
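One natural way to reduce the question to the scalar case is to project each vector onto a direction, forming scores \(u_i=\mathbf a^\top\mathbf x_i\) and \(v_i=\mathbf b^\top\mathbf y_i\), and then apply (5.1) to the pairs \((u_i,v_i)\). The sketch below stores the sample as two matrices whose rows are paired by subject and evaluates this correlation for two arbitrary choices of \(\mathbf a\) and \(\mathbf b\). The data are synthetic, and the function name `projected_correlation` and the particular directions are hypothetical illustrations; different directions can give very different answers, which is what motivates choosing them carefully.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 200, 3, 2

# Row i of X holds x_i (a p-vector) and row i of Y holds y_i (a q-vector);
# the rows are paired because they come from the same subject/case.
X = rng.normal(size=(n, p))
Y = np.column_stack([X[:, 0] + 0.5 * rng.normal(size=n),   # depends on x's first coordinate
                     rng.normal(size=n)])                   # unrelated to x

def projected_correlation(X, Y, a, b):
    """Correlation between the scalar projections u_i = a^T x_i and v_i = b^T y_i."""
    u = X @ a
    v = Y @ b
    return np.corrcoef(u, v)[0, 1]

# Two arbitrary (not optimised) choices of projection directions.
r1 = projected_correlation(X, Y, np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0]))
r2 = projected_correlation(X, Y, np.array([0.0, 1.0, 1.0]), np.array([0.0, 1.0]))
print(r1, r2)   # r1 is large, r2 is close to zero
```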
Canonical correlation analysis (CCA) answers this question by finding the best low-dimensional linear projections of the random vectors \(\mathbf x\) and \(\mathbf y\). Just as PCA defines the ‘best’ projections as those that maximize variance, CCA defines them as those that maximize the correlation between the projected \(\mathbf x\) and the projected \(\mathbf y\). A key role is played by the singular value decomposition (SVD) introduced in Chapter 3.
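The following is a minimal sketch of one standard way to compute sample canonical correlations with the SVD, assuming the sample covariance matrices \(\mathbf S_{xx}\) and \(\mathbf S_{yy}\) are invertible. The canonical correlations are taken to be the singular values of \(\mathbf S_{xx}^{-1/2}\mathbf S_{xy}\mathbf S_{yy}^{-1/2}\), and the canonical directions are the singular vectors mapped back through \(\mathbf S_{xx}^{-1/2}\) and \(\mathbf S_{yy}^{-1/2}\), which scales the projected scores to unit sample variance. The function names `inv_sqrt` and `cca`, the \(n-1\) divisor and the synthetic data are illustrative assumptions; notation and scaling conventions elsewhere may differ, although the correlations themselves do not depend on the divisor.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(S)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T

def cca(X, Y):
    """Sketch of sample CCA via the SVD of S_xx^{-1/2} S_xy S_yy^{-1/2}.

    X is n x p and Y is n x q, with rows paired by subject.
    Assumes S_xx and S_yy are invertible (n comfortably larger than p and q).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    S_xx = Xc.T @ Xc / (n - 1)
    S_yy = Yc.T @ Yc / (n - 1)
    S_xy = Xc.T @ Yc / (n - 1)

    Sxx_is = inv_sqrt(S_xx)
    Syy_is = inv_sqrt(S_yy)
    K = Sxx_is @ S_xy @ Syy_is                        # p x q matrix
    U, d, Vt = np.linalg.svd(K, full_matrices=False)  # d sorted in decreasing order

    A = Sxx_is @ U       # column j is the direction a_j for x
    B = Syy_is @ Vt.T    # column j is the direction b_j for y
    return d, A, B

# Hypothetical synthetic data: y_1 depends on x_1 + x_2, y_2 is pure noise.
rng = np.random.default_rng(2)
n = 500
X = rng.normal(size=(n, 3))
Y = np.column_stack([X[:, 0] + X[:, 1] + rng.normal(size=n),
                     rng.normal(size=n)])

d, A, B = cca(X, Y)
print("canonical correlations:", np.round(d, 3))

# The first pair of CC variables attains correlation d[0].
eta1 = X @ A[:, 0]
psi1 = Y @ B[:, 0]
print(np.corrcoef(eta1, psi1)[0, 1])

# With p = q = 1, CCA reduces to |r| from equation (5.1).
d_scalar, _, _ = cca(X[:, :1], Y[:, :1])
print(d_scalar[0], abs(np.corrcoef(X[:, 0], Y[:, 0])[0, 1]))
```

The last check illustrates the sense in which CCA generalizes the sample correlation: with one variable on each side, the single singular value is just \(|r|\).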