1.3 Properties
The CC variables have a mean of zero. Their correlation structure is as follows: within each set the CC variables are uncorrelated and have unit variance, the \(j\)th pair of CC variables has correlation \(\sigma_j\), the \(j\)th canonical correlation coefficient, and CC variables belonging to different pairs are uncorrelated.
1.3.1 Connection with linear regression when \(q=1\)
Although CCA is clearly a different technique to linear regression, it turns out that when either \(\dim \mathbf x=p=1\) or \(\dim \mathbf y=q=1\), there is a close connection between the two approaches.
Consider the case \(q=1\) and \(p>1\), so that there is only a single \(y\)-variable but still \(p>1\) \(x\)-variables.
Let’s make the following assumptions:
- The \(\mathbf x_i\) have been centred so that \(\bar{\mathbf x}={\mathbf 0}_p\), the zero vector.
- The covariance matrix for the \(x\)-variables, \(\mathbf S_{\mathbf x\mathbf x}\), has full rank \(p\).
The first assumption means that \[\mathbf S_{\mathbf x\mathbf x}=\frac{1}{n}\mathbf X^\top \mathbf X\quad \mbox{and}\quad \mathbf S_{\mathbf x\mathbf y}=\frac{1}{n}\mathbf X^\top \mathbf y,\] and the second means that \((\mathbf X^\top \mathbf X)^{-1}\) exists.
Since \(q=1\), the matrix we decompose in CCA \[ \mathbf Q=\mathbf S_{\mathbf x\mathbf x}^{-1/2} \mathbf S_{\mathbf x\mathbf y}\mathbf S_{\mathbf y\mathbf y}^{-1/2} \] is a \(p \times 1\) vector. Consequently, its SVD is just \[ \mathbf Q=\sigma_1 \mathbf u_1, \] where \[ \sigma_1=\vert \vert \mathbf Q\vert \vert_F = (\mathbf Q^\top \mathbf Q)^{\frac{1}{2}} \qquad \text{and} \qquad \mathbf u_1=\mathbf Q/\vert \vert \mathbf Q\vert \vert_F. \] Note that since \(q=1\), the only right singular vector is the scalar \(\mathbf v_1=1\). Consequently, the first canonical correlation vector for \(\mathbf x\) is \[\begin{align*} \mathbf a&=\mathbf S_{\mathbf x\mathbf x}^{-1/2}\mathbf u_1 =\mathbf S_{\mathbf x\mathbf x}^{-1/2} \frac{\mathbf Q}{||\mathbf Q||_F}\\ &=\mathbf S_{\mathbf x\mathbf x}^{-1/2} \frac{1}{\vert \vert \mathbf S_{\mathbf x\mathbf x}^{-1/2}\mathbf S_{\mathbf x\mathbf y}\mathbf S_{\mathbf y\mathbf y}^{-1/2}\vert \vert_F}\mathbf S_{\mathbf x\mathbf x}^{-1/2}\mathbf S_{\mathbf x\mathbf y}\mathbf S_{\mathbf y\mathbf y}^{-1/2}\\ &=\frac{1}{\vert \vert \mathbf S_{\mathbf x\mathbf x}^{-1/2}\mathbf S_{\mathbf x\mathbf y}\vert \vert_F}\mathbf S_{\mathbf x\mathbf x}^{-1}\mathbf S_{\mathbf x\mathbf y}\\ &= c (\mathbf X^\top \mathbf X)^{-1}\mathbf X^\top \mathbf y, \end{align*}\] where the third line follows because \(\mathbf S_{\mathbf y\mathbf y}^{-1/2}\) is a positive scalar that cancels from the numerator and denominator, and \(c=\vert \vert \mathbf S_{\mathbf x\mathbf x}^{-1/2}\mathbf S_{\mathbf x\mathbf y}\vert \vert_F^{-1}>0\) is a scalar constant.
Thus, we can see that the first canonical correlation vector \(\mathbf a\) is a (positive) scalar multiple of \[ \hat{\pmb \beta}=\left ( \mathbf X^\top \mathbf X\right )^{-1} \mathbf X^\top \mathbf y, \] the classical least squares estimator. Since the objective in (1.2) is unchanged when \(\mathbf a\) is multiplied by a positive scalar, the least squares estimator \(\hat{\pmb \beta}\) also solves (1.2). However, it does not usually solve the constrained optimisation problem (1.4), because typically \(\hat{\pmb \beta}^\top \mathbf S_{\mathbf x\mathbf x}\hat{\pmb \beta} \not= 1\), so the constraint in Equation (1.4) will not be satisfied.
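To make this connection concrete, here is a minimal numerical sketch, written in Python/NumPy purely for illustration (the simulated data and variable names are invented for the example, not taken from these notes). It builds \(\mathbf Q\) for simulated data with \(q=1\), extracts the first CC vector \(\mathbf a\), and checks that it is proportional to \(\hat{\pmb\beta}\).

```python
# Minimal sketch (Python/NumPy): for q = 1 the first canonical correlation
# vector a is proportional to the least squares estimator beta_hat.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 4

# Simulated, centred x-data and a single y-variable (illustrative only).
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                            # so that S_xy = X^T y / n
y = X @ np.array([1.0, -2.0, 0.5, 0.3]) + rng.standard_normal(n)
y -= y.mean()

# Sample covariance matrices with divisor n, as in the text.
Sxx = X.T @ X / n
Sxy = X.T @ y / n                              # a p-vector, since q = 1
Syy = y @ y / n                                # a scalar

# Q = Sxx^{-1/2} Sxy Syy^{-1/2} is p x 1, so its SVD is sigma_1 u_1.
w, V = np.linalg.eigh(Sxx)
Sxx_inv_half = V @ np.diag(w ** -0.5) @ V.T    # symmetric inverse square root
Q = Sxx_inv_half @ Sxy / np.sqrt(Syy)
sigma1 = np.linalg.norm(Q)                     # first canonical correlation
a = Sxx_inv_half @ (Q / sigma1)                # a = Sxx^{-1/2} u_1

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least squares estimator

print(a / beta_hat)                            # roughly constant: a = c * beta_hat
print(sigma1, np.corrcoef(X @ a, y)[0, 1])     # both equal the first CC
```

The printed ratio `a / beta_hat` should have (approximately) equal entries, with common value equal to the constant \(c\) above.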
1.3.2 Invariance/equivariance properties of CCA
Suppose we apply orthogonal transformations and translations to the \(\mathbf x_i\) and the \(\mathbf y_i\) of the form \[\begin{equation} {\mathbf h}_i={\mathbf T}\mathbf x_i + {\pmb \mu} \qquad \text{and} \qquad {\mathbf k}_i={\mathbf R}\mathbf y_i +{\pmb \nu}, \qquad i=1,\ldots , n, \tag{1.12} \end{equation}\] where \(\mathbf T\) (\(p \times p\)) and \(\mathbf R\) (\(q \times q\)) are orthogonal matrices, and \(\pmb \mu\) (\(p \times 1\)) and \(\pmb \nu\) (\(q \times 1\)) are fixed vectors.
How do these transformations affect the CC analysis?
Firstly, since CCA depends only on sample covariance matrices, it follows that the translation vectors \(\pmb \mu\) and \(\pmb \nu\) have no effect on the analysis.
Secondly, let’s consider the effect of the orthogonal transformations \(\mathbf T\) and \(\mathbf R\). We’ve seen that CCA in the original \(\mathbf x\) and \(\mathbf y\) coordinates depends on \[\begin{equation} \mathbf Q\equiv \mathbf Q_{\mathbf x\mathbf y}=\mathbf S_{\mathbf x\mathbf x}^{-1/2}\mathbf S_{\mathbf x\mathbf y}\mathbf S_{\mathbf y\mathbf y}^{-1/2}. \tag{1.13} \end{equation}\] In the new coordinates we have \[ \tilde{\mathbf S}_{\mathbf h\mathbf h}={\mathbf T} \mathbf S_{\mathbf x\mathbf x}{\mathbf T}^\top, \qquad \tilde{\mathbf S}_{\mathbf k\mathbf k}={\mathbf R}\mathbf S_{\mathbf y\mathbf y}{\mathbf R}^\top, \] \[ \tilde{\mathbf S}_{\mathbf h\mathbf k}={\mathbf T}\mathbf S_{\mathbf x\mathbf y}{\mathbf R}^\top = \tilde{\mathbf S}_{\mathbf k\mathbf h}^\top, \] where here and below, a tilde above a symbol indicates that the corresponding term is defined in terms of the new \(\mathbf h\), \(\mathbf k\) coordinates, rather than the old \(\mathbf x\), \(\mathbf y\) coordinates. Because \(\mathbf T\) and \(\mathbf R\) are orthogonal, \[ \tilde{\mathbf S}_{\mathbf h\mathbf h}^{1/2}={\mathbf T}\mathbf S_{\mathbf x\mathbf x}^{1/2}{\mathbf T}^\top, \qquad \tilde{\mathbf S}_{\mathbf h\mathbf h}^{-1/2}={\mathbf T}\mathbf S_{\mathbf x\mathbf x}^{-1/2}{\mathbf T}^\top \] \[ \tilde{\mathbf S}_{\mathbf k\mathbf k}^{1/2}={\mathbf R}\mathbf S_{\mathbf y\mathbf y}^{1/2}{\mathbf R}^\top \qquad \text{and} \qquad \tilde{\mathbf S}_{\mathbf k\mathbf k}^{-1/2}={\mathbf R}\mathbf S_{\mathbf y\mathbf y}^{-1/2}{\mathbf R}^\top; \] for example, \({\mathbf T}\mathbf S_{\mathbf x\mathbf x}^{1/2}{\mathbf T}^\top\) is symmetric and non-negative definite, and \(({\mathbf T}\mathbf S_{\mathbf x\mathbf x}^{1/2}{\mathbf T}^\top)^2={\mathbf T}\mathbf S_{\mathbf x\mathbf x}^{1/2}{\mathbf T}^\top{\mathbf T}\mathbf S_{\mathbf x\mathbf x}^{1/2}{\mathbf T}^\top={\mathbf T}\mathbf S_{\mathbf x\mathbf x}{\mathbf T}^\top=\tilde{\mathbf S}_{\mathbf h\mathbf h}\), so it is indeed the square root of \(\tilde{\mathbf S}_{\mathbf h\mathbf h}\). The analogue of (1.13) in the new coordinates is given by \[\begin{align*} \tilde{\mathbf Q}_{\mathbf h\mathbf k}&=\tilde{\mathbf S}_{\mathbf h\mathbf h}^{-1/2}\tilde{\mathbf S}_{\mathbf h\mathbf k}\tilde{\mathbf S}_{\mathbf k\mathbf k}^{-1/2}\\ &={\mathbf T} \mathbf S_{\mathbf x\mathbf x}^{-1/2}{\mathbf T}^\top {\mathbf T}\mathbf S_{\mathbf x\mathbf y}{\mathbf R}^\top {\mathbf R}\mathbf S_{\mathbf y\mathbf y}^{-1/2}{\mathbf R}^\top\\ &={\mathbf T}\mathbf S_{\mathbf x\mathbf x}^{-1/2}\mathbf S_{\mathbf x\mathbf y}\mathbf S_{\mathbf y\mathbf y}^{-1/2}{\mathbf R}^\top\\ &={\mathbf T} \mathbf Q_{\mathbf x\mathbf y}{\mathbf R}^\top. \end{align*}\] So, again using the fact that \(\mathbf T\) and \(\mathbf R\) are orthogonal matrices, if \(\mathbf Q_{\mathbf x\mathbf y}\) has SVD \(\sum_{j=1}^t \sigma_j {\mathbf u}_j {\mathbf v}_j^\top\), then \(\tilde{\mathbf Q}_{\mathbf h\mathbf k}\) has SVD \[\begin{align*} \tilde{\mathbf Q}_{\mathbf h\mathbf k}&={\mathbf T }\mathbf Q_{\mathbf x\mathbf y}{\mathbf R}^\top ={\mathbf T} \left ( \sum_{j=1}^t \sigma_j {\mathbf u}_j {\mathbf v}_j^\top \right){\mathbf R}^\top\\ &=\sum_{j=1}^t \sigma_j {\mathbf T}{\mathbf u}_j {\mathbf v}_j^\top {\mathbf R}^\top =\sum_{j=1}^t \sigma_j \left ( {\mathbf T} {\mathbf u}_j \right )\left ({\mathbf R}{\mathbf v}_j \right )^\top =\sum_{j=1}^t \sigma_j \tilde{\mathbf u}_j \tilde{\mathbf v}_j^\top, \end{align*}\] where, for \(j=1, \ldots,t\), the \(\tilde{\mathbf u}_j={\mathbf T}\mathbf u_j\) are mutually orthogonal unit vectors, and the \(\tilde{\mathbf v}_j={\mathbf R}{\mathbf v}_j\) are also mutually orthogonal unit vectors, because \(\mathbf T\) and \(\mathbf R\) preserve inner products.
Consequently, \(\tilde{\mathbf Q}_{\mathbf h\mathbf k}\) has the same singular values as \(\mathbf Q_{\mathbf x\mathbf y}\), namely \(\sigma_1, \ldots , \sigma_t\) in both cases, and so the canonical correlation coefficients are invariant with respect to the transformations (1.12). Moreover, since the optimal linear combinations for the \(j\)th CC in the original coordinates are given by \(\mathbf a_j =\mathbf S_{\mathbf x\mathbf x}^{-1/2}{\mathbf u}_j\) and \(\mathbf b_j=\mathbf S_{\mathbf y\mathbf y}^{-1/2}{\mathbf v}_j\), the optimal linear combinations in the new coordinates are given by \[\begin{align*} \tilde{\mathbf a}_{j}&=\tilde{\mathbf S}_{\mathbf h\mathbf h}^{-1/2}\tilde{\mathbf u}_j =\tilde{\mathbf S}_{\mathbf h\mathbf h}^{-1/2}{\mathbf T}{\mathbf u}_j\\ &={\mathbf T}\mathbf S_{\mathbf x\mathbf x}^{-1/2}{\mathbf T}^\top {\mathbf T}{\mathbf u}_j\\ &={\mathbf T}\mathbf S_{\mathbf x\mathbf x}^{-1/2}{\mathbf u}_j \\ &={\mathbf T}\mathbf a_{j}, \end{align*}\] and a similar argument shows that \(\tilde{\mathbf b}_{j}={\mathbf R}\mathbf b_{j}\). So under the transformations (1.12), the optimal vectors \(\mathbf a_{j}\) and \(\mathbf b_{j}\) transform in an equivariant manner to \(\tilde{\mathbf a}_{j}={\mathbf T}\mathbf a_j\) and \(\tilde{\mathbf b}_{j}={\mathbf R}\mathbf b_j\), respectively, \(j=1, \ldots , t\).
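The following short sketch (again Python/NumPy, with simulated data and helper names invented for illustration) checks both facts numerically: the canonical correlations computed from \((\mathbf h_i, \mathbf k_i)\) match those from \((\mathbf x_i, \mathbf y_i)\), and the CC vectors map to \({\mathbf T}\mathbf a_j\) and \({\mathbf R}\mathbf b_j\), up to the usual sign ambiguity of the SVD.

```python
# Sketch (Python/NumPy): canonical correlations are invariant, and CC vectors
# are equivariant, under orthogonal transformations plus translations (1.12).
import numpy as np

def cca(X, Y):
    """CCs and CC vectors via the SVD of Q = Sxx^{-1/2} Sxy Syy^{-1/2}."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Sxx, Syy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n

    def inv_sqrt(S):                       # symmetric inverse square root
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Sxx_ih, Syy_ih = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, sig, Vt = np.linalg.svd(Sxx_ih @ Sxy @ Syy_ih, full_matrices=False)
    return sig, Sxx_ih @ U, Syy_ih @ Vt.T  # sigma_j; columns a_j; columns b_j

rng = np.random.default_rng(1)
n, p, q = 300, 4, 3
X = rng.standard_normal((n, p))
Y = X[:, :q] * np.array([1.0, 0.6, 0.3]) + 0.5 * rng.standard_normal((n, q))

# Random orthogonal T and R (via QR), and arbitrary translations mu and nu.
T, _ = np.linalg.qr(rng.standard_normal((p, p)))
R, _ = np.linalg.qr(rng.standard_normal((q, q)))
H = X @ T.T + rng.standard_normal(p)       # rows are h_i = T x_i + mu
K = Y @ R.T + rng.standard_normal(q)       # rows are k_i = R y_i + nu

sig, A, B = cca(X, Y)
sig_new, A_new, B_new = cca(H, K)

print(np.allclose(sig, sig_new))                   # invariance of the sigma_j
print(np.allclose(np.abs(A_new), np.abs(T @ A)))   # a_j -> T a_j (up to sign)
print(np.allclose(np.abs(B_new), np.abs(R @ B)))   # b_j -> R b_j (up to sign)
```

All three checks should print `True`.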
If either of \(\mathbf T\) or \(\mathbf R\) in (1.12) is invertible but not orthogonal, the argument above no longer applies: in general \(\tilde{\mathbf S}_{\mathbf h\mathbf h}^{-1/2}\not={\mathbf T}\mathbf S_{\mathbf x\mathbf x}^{-1/2}{\mathbf T}^\top\), so \(\tilde{\mathbf Q}_{\mathbf h\mathbf k}\not={\mathbf T}\mathbf Q_{\mathbf x\mathbf y}{\mathbf R}^\top\), and the CC vectors no longer transform as \(\tilde{\mathbf a}_j={\mathbf T}\mathbf a_j\) and \(\tilde{\mathbf b}_j={\mathbf R}\mathbf b_j\); instead they transform as \(\tilde{\mathbf a}_j=({\mathbf T}^\top)^{-1}\mathbf a_j\) and \(\tilde{\mathbf b}_j=({\mathbf R}^\top)^{-1}\mathbf b_j\), which reduces to the expressions above when \(\mathbf T\) and \(\mathbf R\) are orthogonal. The canonical correlation coefficients themselves remain invariant under any nonsingular \(\mathbf T\) and \(\mathbf R\), since \(\text{corr}(\pmb\alpha^\top \mathbf h, \pmb\beta^\top \mathbf k)=\text{corr}(({\mathbf T}^\top\pmb\alpha)^\top \mathbf x, ({\mathbf R}^\top\pmb\beta)^\top \mathbf y)\), so maximising over \((\pmb\alpha, \pmb\beta)\) in the new coordinates is equivalent to maximising over \((\mathbf a, \mathbf b)=({\mathbf T}^\top\pmb\alpha, {\mathbf R}^\top\pmb\beta)\) in the original coordinates.
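As a quick check of this last point, here is a further small sketch (Python/NumPy again, with simulated data and the helper `canon` invented for illustration) showing that the canonical correlations are unchanged by a generic nonsingular, non-orthogonal transformation of each set of variables.

```python
# Sketch (Python/NumPy): canonical correlations are invariant under any
# nonsingular linear transformation of x and of y, orthogonal or not.
import numpy as np

def canon(X, Y):
    """Canonical correlations: singular values of Sxx^{-1/2} Sxy Syy^{-1/2}."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)

    def inv_sqrt(S):                      # symmetric inverse square root
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Q = inv_sqrt(Xc.T @ Xc / n) @ (Xc.T @ Yc / n) @ inv_sqrt(Yc.T @ Yc / n)
    return np.linalg.svd(Q, compute_uv=False)

rng = np.random.default_rng(2)
n, p, q = 300, 4, 3
X = rng.standard_normal((n, p))
Y = X[:, :q] * np.array([1.0, 0.6, 0.3]) + 0.5 * rng.standard_normal((n, q))

T = rng.standard_normal((p, p))           # nonsingular but not orthogonal
R = rng.standard_normal((q, q))
H, K = X @ T.T + 1.0, Y @ R.T - 2.0       # h_i = T x_i + mu, k_i = R y_i + nu

print(np.allclose(canon(X, Y), canon(H, K)))   # True: the sigma_j are unchanged
```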