2.2 Vector spaces

It will be useful to talk about vector spaces. These are sets of vectors that can be added together, or multiplied by a scalar. You should be familiar with these from your undergraduate degree. We don’t provide a formal definition here, but you can think of a real vector space \(V\) as a set of vectors such that for any \(\mathbf v_1, \mathbf v_2 \in V\) and \(\alpha_1, \alpha_2 \in \mathbb{R}\), we have \[\alpha_1 \mathbf v_1 + \alpha_2 \mathbf v_2 \in V\] i.e., vector spaces are closed under addition and scalar multiplication.

Example 2.5 Euclidean space in \(p\) dimensions, \(\mathbb{R}^p\), is a vector space. If we add any two vectors in \(\mathbb{R}^p,\) or multiply a vector by a real scalar, then the resulting vector also lies in \(\mathbb{R}^p\).

A subset \(U \subset V\) of a vector space \(V\) is called a vector subspace if \(U\) is also a vector space.

Example 2.6 Let \(V=\mathbb{R}^2\). Then the sets \[U_1 = \left\{\left( \begin{array}{c} a\\ 0 \end{array} \right): a\in \mathbb{R}\right\}, \mbox{ and}\quad U_2 = \left\{a\left( \begin{array}{c} 1 \\ 1 \end{array} \right): a\in \mathbb{R}\right\}\] are both subspaces of \(V\).

2.2.1 Linear independence

Definition 2.3 Vectors \(\stackrel{n\times 1}{\mathbf x}_1 ,\dots , \stackrel{n\times 1}{\mathbf x}_p\) are said to be linearly dependent if there exist scalars \(\lambda _1, \dots ,\lambda _p\) not all zero such that \[ \lambda _1 {\mathbf x}_1+\lambda _2 {\mathbf x}_2+ \dots + \lambda _p {\mathbf x}_p={\mathbf 0}.\] Otherwise, these vectors are said to be linearly independent.

Definition 2.4 Given a set of vectors \(S=\{s_1, \ldots, s_n\}\), the span of \(S\) is the smallest vector space containing \(S\) or equivalently, is the set of all linear combinations of vectors from \(S\) \[\operatorname{span}(S) = \left\{ \sum_{i=1}^k \alpha_i s_i \mid k \in \mathbb{N}, \alpha_i \in \mathbb{R}, s_i \in S\right\}\]

Definition 2.5 A basis of a vector space \(V\) is a set of linearly independent vectors in \(V\) that span \(V\).

Example 2.7 Consider \(V=\mathbb{R}^2\). Then the following are both bases for \(V\): \[B_1=\left\{\left(\begin{array}{c}1\\0\end{array}\right), \left(\begin{array}{c}0\\1\end{array}\right)\right\} \] \[B_2=\left\{\left(\begin{array}{c}1\\1\end{array}\right), \left(\begin{array}{c}1\\2\end{array}\right)\right\} \]

Definition 2.6 The dimension of a vector space is the number of vectors in its basis.

2.2.2 Row and column spaces

We can think about the matrix-vector multiplication \(\mathbf A\mathbf x\) in two ways. The usual way is as the inner product between the rows of \(A\) and \(x\).

\[ \left( \begin{array}{cc} 1 & 2\\ 3&4\\5&6\end{array}\right) \left(\begin{array}{c}x_1\\ x_2\end{array}\right) = \left(\begin{array}{c} x_1+2x_2\\3x_1+4x_2\\5x_1+6x_2\end{array}\right)\]

But a better way to think of \(\mathbf A\mathbf x\) is as a linear combination of the columns of \(A\).

\[ \left( \begin{array}{cc} 1 & 2\\ 3&4\\5&6\end{array}\right) \left(\begin{array}{c}x_1\\ x_2\end{array}\right) = x_1\left(\begin{array}{c}1\\3\\5 \end{array}\right)+x_2\left(\begin{array}{c}2\\4\\6 \end{array}\right)\]

Definition 2.7 The column space of a \(n\times p\) matrix \(\mathbf A\) is the set of all linear combinations of the columns of \(\mathbf A\): \[\mathcal{C}(\mathbf A) = \{\mathbf A\mathbf x: \mathbf x\in \mathbb{R}^p\}\subset \mathbb{R}^n\]

For \[A=\left( \begin{array}{cc} 1 & 2\\ 3&4\\5&6\end{array}\right) \] we can see that the column space is a 2-dimensional plane in \(\mathbb{R}^3\). The matrix \(\mathbf B\) has the same column space as \(\mathbf A\) \[\mathbf B=\left( \begin{array}{cccc} 1 & 2&3 &4\\ 3&4 &7&10\\5&6&11&16\end{array}\right) \]

The number of linearly independent columns of \(\mathbf A\) is called the column rank of \(\mathbf A\), and is equal to the dimension of the column space of \(\mathcal{C}(\mathbf A)\). The column rank of \(\mathbf A\) and \(\mathbf B\) is 2.

The row space of \(\mathbf A\) is defined to be the column space of \(\mathbf A^\top\), and the row rank is the number of linearly independent rows of \(\mathbf A\).

Theorem 2.1 The row rank of a matrix equals the column rank.

Thus we can simply refer to the rank of the matrix.

Proof. The proof of this theorem is very simple. Let \(\mathbf C\) be an \(n \times r\) matrix (where \(r=\operatorname{rank}(\mathbf A)\)) with columns chosen to be a set of \(r\) linearly independent columns from \(A\). Then we know each column of \(\mathbf A\) can be written as a linear combination of the columns of \(\mathbf C\), i.e. \[\mathbf A= \mathbf C\mathbf R.\] The dimension of \(\mathbf R\) must be \(r \times p\). But now we can see that the rows of \(\mathbf A\) are formed by a linear combination of the rows of \(\mathbf R\). Thus the row rank of \(\mathbf A\) is at most \(r\) (=the column rank of \(\mathbf A\)). This holds for any matrix, so is true for \(\mathbf A^\top\): namely \(\operatorname{row-rank}(A^\top)\leq \operatorname{column-rank}(A^\top)\). But the row space of \(\mathbf A^\top\) equals \(\mathcal{C}(\mathbf A)\), thus proving the theorem!

Corollary 2.1 The rank of an \(n\times p\) matrix is at most \(\min(n,p)\).

Example 2.8 \[B = \left( \begin{array}{cccc} 1 & 2\\ 3&4 \\5&6\end{array}\right)\left(\begin{array}{cccc}1&0&1&2\\0&1&1&1\end{array}\right) \]

Example 2.9 \[ D=\left( \begin{array}{ccc} 1 & 2&3\\ 2&4&6 \end{array}\right)= \left( \begin{array}{c} 1 \\ 2 \end{array}\right)\left(\begin{array}{ccc}1&2&3\end{array}\right) \] So the rank of \(D\) is \(1\).

2.2.3 Linear transformations

We can view an \(n\times p\) matrix \(\mathbf A\) as a linear map between two vector spaces: \[\begin{align*} \mathbf A: \;\mathbb{R}^p &\rightarrow \mathbb{R}^n\\ \mathbf x&\mapsto \mathbf A\mathbf x \end{align*}\]

The image of \(\mathbf A\) is precisely the column space of \(\mathbf A\): \[\operatorname{Im}(\mathbf A) = \{\mathbf A\mathbf x: \mathbf x\in \mathbb{R}^p\}=\mathcal{C}(\mathbf A) \subset \mathbb{R}^n\]

The kernel of \(A\) is the set of vectors mapped to zero: \[\operatorname{Ker}(\mathbf A)=\{\mathbf x: \mathbf A\mathbf x=\boldsymbol 0\}\subset \mathbb{R}^p\] and is sometimes called the null-space of \(\mathbf A\) and denoted \(\mathcal{N}(\mathbf A)\).

Theorem 2.2 The rank-nullity theorem says if \(V\) and \(W\) are vector spaces, and \(A: V\rightarrow W\) is a linear map, then \[\dim \operatorname{Im}(A)+\dim \operatorname{Ker}(A) =\dim V\]

If we’re thinking about matrices, then \(\dim \mathcal{C}(\mathbf A)+\dim \mathcal{N}(\mathbf A)=p\), or equivalently that
\(\operatorname{rank}(\mathbf A)+\dim \mathcal{N}(\mathbf A)=p\).

We’ve already said that the row space of \(\mathbf A\) is \(\mathcal{C}(\mathbf A^\top)\). The left-null space is \(\{\mathbf x\in \mathbb{R}^n: \mathbf x^\top \mathbf A=0\}\) or equivalently \(\{x \in \mathbb{R}^n: \mathbf A^\top \mathbf x=0\}=\mathcal{N}(\mathbf A^\top)\). And so by the rank-nullity theorem we must have \[n=\dim \mathcal{C}(\mathbf A^\top) + \dim \mathcal{N}(\mathbf A^\top)= \operatorname{rank}(\mathbf A)+\dim \operatorname{Ker}(\mathbf A^\top).\]

Example 2.10 Consider again the matrix \(D: \mathbb{R}^3\rightarrow \mathbb{R}^2\) \[ D=\left( \begin{array}{ccc} 1 & 2&3\\ 2&4&6 \end{array}\right)= \left( \begin{array}{c} 1 \\ 2 \end{array}\right)\left(\begin{array}{ccc}1&2&3\end{array}\right) \] We have already seen that \[\mathcal{C}(D)=\operatorname{span}\left\{\left(\begin{array}{c}1\\2\end{array}\right)\right\}\] and so \(\dim \mathcal{C}(D)=\operatorname{rank}(D)=1\). The kernel, or null-space, of \(\mathbf D\) is the set of vectors for which \(\mathbf D\mathbf x=\boldsymbol 0\), i.e., \[x_1+2x_2+3x_3=0\] This is a single equation with three unknowns, and so there must be a plane of solutions. We need two linearly independent vectors in this plane to describe it. Convince yourself that \[\mathcal{N}(D) = \operatorname{span}\left\{\left(\begin{array}{c}0\\3\\-2\end{array}\right), \left(\begin{array}{c}2\\-1\\0\end{array}\right)\right\}\] So we have \[\dim \mathcal{C}(D)+\dim \mathcal{N}(D)=1+2=3\] as required by the rank-nullity theorem.

If we consider \(D^\top\), we already know \(\dim \mathcal{C}(D^\top)=1\) (as row-rank=column rank), and the rank-nullity theorem tells us that the dimension of the null space of \(D^\top\) must be \(2-1=1\). This is easy to confirm as \(D^\top x=0\) implies \[x_1+2x_2=0\] which is a line in \(\mathbb{R}^2\) \[\mathcal{N}(D^\top) = \operatorname{span}\left\{ \left(\begin{array}{c}-2\\1\end{array}\right)\right\}\]

Question: When does a square matrix \(\mathbf A\) have an inverse?

Precisely when the kernel of \(\mathbf A\) contains only the zero vector, i.e., has dimension 0. In this case the column space of \(\mathbf A\) is the original space, and \(\mathbf A\) is surjective and so must have an inverse. A simpler way to determine if \(\mathbf A\) has an inverse is to consider its determinant.

Question: Suppose we are given a \(n\times p\) matrix \(\mathbf A\), and a n-vector \(\mathbf y\). When does \[\mathbf A\mathbf x= \mathbf y\] have a solution?

When \(\mathbf y\) is in the column space of \(\mathbf A\), \[\mathbf y\in \mathcal{C}(\mathbf A)\]

Question: When is the answer unique?

Suppose \(\mathbf x\) and \(\mathbf x'\) are both solutions with \(\mathbf x\not =\mathbf x'\). We can write \(\mathbf x'=\mathbf x+\mathbf u\) for some vector \(\mathbf u\) and note that \[\mathbf y=\mathbf A\mathbf x' = \mathbf A\mathbf x+\mathbf A\mathbf u= \mathbf y+\mathbf A\mathbf u\] and so \(\mathbf A\mathbf u=\boldsymbol 0\), i.e., \(\mathbf u\in \mathcal{N}(A)\). So there are multiple solutions when the null-space of \(\mathbf A\) contains more than the zero vector. If the dimension of \(\mathcal{N}(A)\) is one, there is a line of solutions. If the dimension is two, there is a plane of solutions, etc.