10.4 Multi-output Linear Model
In the standard linear model the responses \(y_i\) are univariate. In the multivariate linear model, the data are \(\{\mathbf x_1, \mathbf y_1\}, \ldots, \{\mathbf x_n, \mathbf y_n\}\) with responses \(\mathbf y_i \in \mathbb{R}^q\). Here, the linear model takes the form \[\begin{equation} \mathbf y_i= \mathbf B^\top \mathbf x_i +{\pmb \epsilon}_i, \qquad i=1, \ldots , n, \tag{10.5} \end{equation}\] where \(\mathbf B\) is now a \(p \times q\) parameter matrix and \({\pmb \epsilon}_i\in \mathbb{R}^q\) is a \(q \times 1\) error vector. The model (10.5) may be written in matrix form as \[\begin{equation} \mathbf Y= \mathbf X\mathbf B+\mathbf E, \tag{10.6} \end{equation}\] where \[\mathbf Y= \begin{pmatrix} - & \mathbf y_1^\top &-\\ &\vdots&\\ -&\mathbf y_n^\top&-\end{pmatrix}\] is the \(n \times q\) data matrix for the \(y\)-variables, \(\mathbf X\) is the \(n \times p\) data matrix as defined before, and \[\stackrel{n \times q}{\mathbf E}=\begin{pmatrix} - & \boldsymbol \epsilon_1^\top &-\\ &\vdots&\\ -&\boldsymbol \epsilon_n^\top&-\end{pmatrix}\] is the matrix of errors.
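As a concrete illustration (a minimal sketch only; the dimensions \(n=100\), \(p=3\), \(q=2\) and the entries of \(\mathbf B\) are arbitrary choices, not values from the notes), we can simulate data from (10.6) in R:

```r
set.seed(1)
n <- 100; p <- 3; q <- 2
X <- matrix(rnorm(n * p), n, p)          # n x p design matrix
B <- matrix(c(1, 0, -1, 2, 1, 0), p, q)  # p x q parameter matrix (arbitrary values)
E <- matrix(rnorm(n * q), n, q)          # n x q matrix of errors
Y <- X %*% B + E                         # model (10.6): Y = XB + E
dim(Y)                                   # n x q, here 100 x 2
```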
How should we estimate \(\mathbf B\)? One approach is to choose \(\mathbf B\) to minimize the sum of squared errors: \[\begin{align}\sum_{i=1}^n\sum_{j=1}^q \epsilon_{ij}^2 &= \operatorname{tr}(\mathbf E^\top \mathbf E) \\ &= \operatorname{tr}((\mathbf Y-\mathbf X\mathbf B)^\top(\mathbf Y-\mathbf X\mathbf B))\\ &=||\mathbf Y-\mathbf X\mathbf B||_F^2. \end{align}\]
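These identities are easy to verify numerically; a quick sketch in R, with an arbitrary \(4 \times 2\) error matrix:

```r
set.seed(2)
E <- matrix(rnorm(8), 4, 2)      # arbitrary 4 x 2 error matrix
c(sum(E^2),                      # sum of squared entries
  sum(diag(t(E) %*% E)),         # tr(E^T E)
  norm(E, type = "F")^2)         # squared Frobenius norm; all three agree
```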
Proposition 10.1 Provided \(\mathbf X^\top \mathbf X\) is invertible, the least squares estimator of \(\mathbf B\) is \[\begin{equation} \hat{\mathbf B}= (\mathbf X^\top \mathbf X)^{-1}\mathbf X^\top \mathbf Y. \tag{10.7} \end{equation}\]
You will see how to fit multivariate linear models in R in the computer tasks.
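As a preview (a sketch only; the simulated data are illustrative), `lm()` in R accepts a matrix response and fits a separate least squares regression to each column, so its coefficients agree with (10.7) computed directly via the normal equations:

```r
set.seed(1)
n <- 100; p <- 3; q <- 2
X <- matrix(rnorm(n * p), n, p)
B <- matrix(c(1, 0, -1, 2, 1, 0), p, q)
Y <- X %*% B + matrix(rnorm(n * q), n, q)

B_hat <- solve(t(X) %*% X, t(X) %*% Y)  # (X^T X)^{-1} X^T Y, as in (10.7)

fit <- lm(Y ~ X - 1)                    # matrix response; -1 omits the intercept
all.equal(unname(coef(fit)), B_hat)     # TRUE
```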
Proof. NON-EXAMINABLE The usual proof of this involves differentiating with respect to a matrix, and is horribly messy. However, there is a geometric proof as in the univariate case.
The space of matrices \(M_{n,q} = \{\mathbf Y\in \mathbb{R}^{n \times q} \}\) is a vector space, and we can equip it with the inner product \[\langle \mathbf X, \mathbf Y\rangle_F = \operatorname{tr}(\mathbf X^\top \mathbf Y).\] This is the Frobenius inner product, and it induces the Frobenius norm on \(M_{n,q}\): \[||\mathbf X||_F = \langle \mathbf X, \mathbf X\rangle_F^{\frac{1}{2}}.\]
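As an aside, since \(\operatorname{tr}(\mathbf X^\top \mathbf Y) = \sum_{i,j} x_{ij} y_{ij}\), the Frobenius inner product is just the Euclidean inner product of the matrices viewed as vectors; a quick numerical check in R (the two \(3 \times 2\) matrices are arbitrary):

```r
set.seed(4)
X <- matrix(rnorm(6), 3, 2)           # two arbitrary 3 x 2 matrices
Y <- matrix(rnorm(6), 3, 2)
c(sum(diag(t(X) %*% Y)), sum(X * Y))  # tr(X^T Y) equals sum_ij x_ij y_ij
```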
Consider the matrix analogue of the column space of \(\mathbf X\) \[\mathcal{C}_q(\mathbf X) =\{\mathbf X\mathbf B: \mathbf B\in \mathbb{R}^{p\times q}\}.\] This is a vector subspace of \(M_{n,q}\). As before, we can see that
\[\min_{\mathbf B} ||\mathbf Y-\mathbf X\mathbf B||_F^2=\min_{\mathbf Y' \in \mathcal{C}_q(\mathbf X)} ||\mathbf Y- \mathbf Y'||_F^2.\]
In other words, we want to find the orthogonal projection of \(\mathbf Y\) onto \(\mathcal{C}_q(\mathbf X)\). Let \(\hat{\mathbf Y} = \arg \min_{\mathbf Y' \in \mathcal{C}_q(\mathbf X)} ||\mathbf Y- \mathbf Y'||_F^2\) be the orthogonal projection, and let \(\hat{\mathbf B}\in \mathbb{R}^{p \times q}\) be the matrix such that \(\hat{\mathbf Y}=\mathbf X\hat{\mathbf B}\). Then \[\mathbf Y-\hat{\mathbf Y} \in \mathcal{C}_q(\mathbf X)^\perp\] and so \[\langle \mathbf Z, \mathbf Y-\hat{\mathbf Y}\rangle_F = 0 \quad \mbox{ for all}\quad \mathbf Z\in \mathcal{C}_q(\mathbf X).\] Writing \(\mathbf Z= \mathbf X\mathbf A\) for some \(\mathbf A\in M_{p,q}\), we see that we must have
\[\operatorname{tr}(\mathbf A^\top \mathbf X^\top(\mathbf Y-\mathbf X\hat{\mathbf B}))=0 \quad \mbox{ for all}\quad \mathbf A\in M_{p,q}.\]
Setting \(\mathbf A=\mathbf e_{ij}\), the standard basis matrices for \(M_{p,q}\) (i.e. the \(p\times q\) matrix that is all zeros except for the \(ij\) entry, which is \(1\)), for \(i=1,\ldots, p\) and \(j=1, \ldots,q\), and noting that \(\operatorname{tr}(\mathbf e_{ij}^\top \mathbf M) = M_{ij}\) for any \(\mathbf M\in M_{p,q}\), we see that every entry of \(\mathbf X^\top(\mathbf Y-\mathbf X\hat{\mathbf B})\) is zero, i.e. \[\mathbf X^\top(\mathbf Y-\mathbf X\hat{\mathbf B})={\boldsymbol 0}_{p,q}.\] Rearranging this, using the invertibility of \(\mathbf X^\top \mathbf X\), gives the result.
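The orthogonality condition is also easy to check numerically; a small sketch with arbitrary simulated data:

```r
set.seed(3)
X <- matrix(rnorm(100 * 3), 100, 3)
Y <- matrix(rnorm(100 * 2), 100, 2)
B_hat <- solve(t(X) %*% X, t(X) %*% Y)  # least squares estimate (10.7)
max(abs(t(X) %*% (Y - X %*% B_hat)))    # ~ 1e-14: zero up to rounding error
```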