10.6 Exercises
Read this set of tweets and revisit Exercises 4 and 5 from Chapter 3. Can you see the connection between those exercises and their consequences for linear models?
Show that the ridge regression estimator of \(\boldsymbol \beta\) is \[\hat{\boldsymbol \beta}^{ridge} = (\mathbf X^\top \mathbf X+ \lambda\mathbf I_p)^{-1}\mathbf X^\top \mathbf y.\] Prove that the inverse exists if \(\lambda > 0\).
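A quick numerical sanity check of the formula (a sketch, not a substitute for the proof; the data here are arbitrary simulated values):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 5, 0.5

X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Ridge estimator: (X^T X + lambda I_p)^{-1} X^T y.
# For lambda > 0, every eigenvalue of X^T X + lambda I_p is at least
# lambda > 0 (X^T X is positive semi-definite), so the inverse exists
# even when X^T X itself is singular.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# beta_ridge satisfies the penalised normal equations:
print(np.allclose((X.T @ X + lam * np.eye(p)) @ beta_ridge, X.T @ y))
```

Using `np.linalg.solve` rather than forming the inverse explicitly is the standard numerically stable choice.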
Finally, prove that the estimator can be rewritten as \[\hat{\boldsymbol \beta}^{ridge} = \mathbf X^\top(\mathbf X\mathbf X^\top + \lambda\mathbf I_n)^{-1} \mathbf y.\] When might this be useful?
Let \[\hat{\boldsymbol \beta}_\lambda=\arg \min_{\boldsymbol \beta} ||\mathbf y-\mathbf X\boldsymbol \beta||_2^2+\lambda ||\boldsymbol \beta||.\] Prove that \[||\hat{\boldsymbol \beta}_\lambda|| \leq ||\hat{\boldsymbol \beta}^{ols}||.\] Note that we can prove this for a general norm \(||\cdot||\), not just the \(L_2\) norm.
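A numerical illustration of the shrinkage result for the squared \(L_2\) penalty (where the minimiser has the ridge closed form); the general-norm case is what the exercise asks you to prove:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 5
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# OLS estimator via least squares.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge norm never exceeds the OLS norm, for any lambda > 0.
for lam in [0.1, 1.0, 10.0]:
    beta_lam = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    assert np.linalg.norm(beta_lam) <= np.linalg.norm(beta_ols)
```

The proof hinted at in the exercise compares the objective values at \(\hat{\boldsymbol \beta}_\lambda\) and \(\hat{\boldsymbol \beta}^{ols}\), and uses that OLS minimises the unpenalised residual sum of squares.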
For the normal linear model \[\mathbf y= \mathbf X\boldsymbol \beta+ N({\boldsymbol 0}, \sigma^2 \mathbf I)\] show that \[\begin{align} {\mathbb{V}\operatorname{ar}}(\hat{\boldsymbol \beta}^{ols}) &= \sigma^2 (\mathbf X^\top \mathbf X)^{-1}\\ &= \sigma^2\sum_{i=1}^p \frac{\mathbf v_i \mathbf v_i^\top}{\lambda_i} \end{align}\] where \(\lambda_i\) are the eigenvalues of \(\mathbf X^\top \mathbf X\), and \(\mathbf v_i\) the corresponding unit eigenvectors.
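The spectral form of the variance can be checked numerically with an eigendecomposition of \(\mathbf X^\top \mathbf X\) (simulated design, arbitrary \(\sigma^2\)):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma2 = 50, 4, 2.0
X = rng.normal(size=(n, p))

# Direct form: sigma^2 (X^T X)^{-1}.
var_direct = sigma2 * np.linalg.inv(X.T @ X)

# Spectral form: sigma^2 sum_i v_i v_i^T / lambda_i, with (lambda_i, v_i)
# the eigenpairs of the symmetric matrix X^T X.
lams, V = np.linalg.eigh(X.T @ X)   # lams[i] pairs with column V[:, i]
var_spectral = sigma2 * sum(
    np.outer(V[:, i], V[:, i]) / lams[i] for i in range(p)
)

print(np.allclose(var_direct, var_spectral))
```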
If \(\hat{\boldsymbol \beta}_k\) is the PCR estimator based on the first \(k\) principal components, show that \[{\mathbb{V}\operatorname{ar}}(\hat{\boldsymbol \beta}_k) = \sigma^2 \sum_{i=1}^k \frac{\mathbf v_i \mathbf v_i^\top}{\lambda_i}.\]
Thus, show that \[\mathbf A_k = {\mathbb{V}\operatorname{ar}}(\hat{\boldsymbol \beta}^{ols})-{\mathbb{V}\operatorname{ar}}(\hat{\boldsymbol \beta}_k)\] is a positive semi-definite matrix. Consequently, show that any linear combination of \(\hat{\boldsymbol \beta}_k\), e.g., a prediction \(\mathbf x^\top \hat{\boldsymbol \beta}_k\), has a lower variance compared to the same linear combination using the OLS estimator, i.e., \[{\mathbb{V}\operatorname{ar}}(\mathbf x^\top \hat{\boldsymbol \beta}_k)\leq {\mathbb{V}\operatorname{ar}}(\mathbf x^\top \hat{\boldsymbol \beta}^{ols}).\]
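A numerical check of the claim (a sketch under simulated data; here the PCR variance is built directly from the spectral formula in the previous part, with the eigenvalues sorted in decreasing order so that the "first \(k\)" components are the leading ones):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, k, sigma2 = 50, 6, 3, 1.0
X = rng.normal(size=(n, p))

# Eigendecomposition of X^T X, sorted so the leading PCs come first.
lams, V = np.linalg.eigh(X.T @ X)   # eigh returns ascending eigenvalues
order = np.argsort(lams)[::-1]
lams, V = lams[order], V[:, order]

var_ols = sigma2 * sum(np.outer(V[:, i], V[:, i]) / lams[i] for i in range(p))
var_pcr = sigma2 * sum(np.outer(V[:, i], V[:, i]) / lams[i] for i in range(k))

# A_k = sigma^2 sum_{i > k} v_i v_i^T / lambda_i is positive semi-definite.
A_k = var_ols - var_pcr
assert np.linalg.eigvalsh(A_k).min() > -1e-10   # PSD up to rounding

# Consequently any linear combination has lower variance under PCR.
x = rng.normal(size=p)
assert x @ var_pcr @ x <= x @ var_ols @ x + 1e-10
```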
Attempt Q3c) from the 2019 paper (NB: it is a hard question).