${\mathcal M}(m,n)$ will denote the set of $m$ by $n$ matrices. $A \in {\mathcal M}(m,n)$ has elements $A_{ij}$, $i \in \{1, \ldots, m\}$, $j \in \{1, \ldots, n\}$.
We will identify ${\mathcal M}(1,n)$ and ${\mathcal M}(n,1)$ with $\Re^n$, in a deliberate abuse of notation. Whether $x \in \Re^n$ should be considered a column vector (an element of ${\mathcal M}(n,1)$) or a row vector (an element of ${\mathcal M}(1,n)$) should be clear from context or will be called out explicitly. For $a \in {\mathcal M}(1,n)$ or ${\mathcal M}(n,1)$, we generally write only one index, suppressing the index that is always equal to 1 (for instance, we would write $a_i$ in place of $a_{1i}$ or $a_{i1}$, respectively).
It is convenient to write matrices as rectangular arrays, e.g.,
$$ A = (A_{ij}) = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix} . $$The $i$th row of the matrix $A \in {\mathcal M}(m,n)$ is the matrix $B \in {\mathcal M}(1,n)$ with elements $B_j = A_{ij}$, $j=1, \ldots, n$.
The $j$th column of the matrix $A \in {\mathcal M}(m,n)$, is the matrix $C \in {\mathcal M}(m,j)$ with elements $C_i = A_{ij}$, i=1, \ldots, m$.
The transpose of a matrix interchanges the row and column indices of the original matrix. If $A \in {\mathcal M}(m,n)$ then the transpose of $A$, denoted $A'$ or $A^T$, is an element of ${\mathcal M}(m,n)$ (If $A$ is an $m$ by $n$ matrix then $A'$ is an $n$ by $m$ matrix). If $A$ has elements $A_{ij}$, then $A' = A^T$ has elements $A_{ij}^T = A_{ji}$.
Scalar multiplication. If $\alpha \in \Re$ and $A \in {\mathcal M}(m,n)$, the matrix $\alpha A \in {\mathcal M}(m,n)$ has elements $\alpha A_{ij}$, $i \in \{1, \ldots, m\}$, $j \in \{1, \ldots, n\}$. I.e.,
$$ \alpha A = \begin{pmatrix} \alpha A_{11} & \alpha A_{12} & \cdots & \alpha A_{1n} \\ \alpha A_{21} & \alpha A_{22} & \cdots & \alpha A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha A_{m1} & \alpha A_{m2} & \cdots & \alpha A_{mn} \end{pmatrix} . $$Let $-A = -1\cdot A$ be the matrix with elements $(-A)_{ij} = -A_{ij}$:
$$ -A = \begin{pmatrix} -A_{11} & -A_{12} & \cdots & -A_{1n} \\ -A_{21} & -A_{22} & \cdots & -A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -A_{m1} & -A_{m2} & \cdots & -A_{mn} \end{pmatrix} . $$Matrix addition. If $A, B \in {\mathcal M}(m,n)$, then the matrix $A+B \in {\mathcal M}(m,n)$ has elements $A_{ij}+B_{ij}$, $i \in \{1, \ldots, m\}$, $j \in \{1, \ldots, n\}$:
$$ A+B = \begin{pmatrix} A_{11}+B_{11} & A_{12}+B_{12} & \cdots & A_{1n}+B_{1n} \\ A_{21}+B_{21} & A_{22}+B_{22} & \cdots & A_{2n}+B_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1}+B_{m1} & A_{m2}+B_{m2} & \cdots & A_{mn}+B_{mn} \end{pmatrix} . $$Let $0_{mn} \in {\mathcal M}(m,n)$ denote the $m$ by $n$ matrix all of whose elements are zero; i.e., $(0_{mn})_{ij} = 0$, $i \in \{1, \ldots, m\}$, $j \in \{1, \ldots, n\}$:
$$ 0_{mn} = \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix} . $$With these definitions, ${\mathcal M}(m,n)$ is a linear vector space over the reals, with scalar identity element $1$ and vector identity element $0_{mn}$.
Transposition. The transpose of $A \in {\mathcal M}(m,n)$, $A^T = A' \in {\mathcal M}(n,m)$ has elements $(A')_{ij} = A_{ji}$, $i \in \{1, \ldots, n\}$, $j \in \{1, \ldots, m\}$:
$$ A' = \begin{pmatrix} A_{11} & A_{21} & \cdots & A_{m1} \\ A_{12} & A_{22} & \cdots & A_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{mn} \end{pmatrix} . $$If $A = A'$, then $A$ is symmetric.
Let ${\mathbf 1}_{n} \in {\mathcal M}(n,n)$ denote the $n$ by $n$ identity matrix with elements $(1_{n})_{ij} = 1$, $i=j \in \{1, \ldots, n\}$, and $(1_{n})_{ij} = 0$, $i \ne j$:
$$ 1_{n} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & 0\\ 0 & 0 & \cdots & 1 \end{pmatrix} . $$The identity matrix is an example of a diagonal matrix. A matrix $A$ is diagonal if $A_{ij}=0$ whenever $i \ne j$.
Matrix multiplication. If $A \in {\mathcal M}(m,n)$ and $B \in {\mathcal M}(n,k)$, then $AB \in {\mathcal M}(m,k)$ is the matrix with elements $(AB)_{ij} = \sum_{\ell=1}^m A_{i \ell}B_{\ell j}$:
$$ AB = \begin{pmatrix} \sum_{\ell=1}^n A_{1 \ell}B_{\ell 1} & \sum_{\ell=1}^n A_{1 \ell}B_{\ell 2} & \cdots & \sum_{\ell=1}^n A_{1 \ell}B_{\ell k} \\ \sum_{\ell=1}^n A_{2 \ell}B_{\ell 1} & \sum_{\ell=1}^n A_{2 \ell}B_{\ell 2} & \cdots & \sum_{\ell=1}^n A_{2 \ell}B_{\ell k} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{\ell=1}^n A_{m \ell}B_{\ell 1} & \sum_{\ell=1}^n A_{m \ell}B_{\ell 2} & \cdots & \sum_{\ell=1}^n A_{m \ell}B_{\ell k} \end{pmatrix} . $$If $A \in {\mathcal M}(m,n)$, then $A {\mathbf 1}_{n} = {\mathbf 1}_{m} A = A$.
A matrix $A$ is square if it has the same number of rows and columns, that is, if $A \in {\mathcal M}(n,n)$ for some $n$.
Inverses. Given a square matrix $A\in {\mathcal M}(n,n)$, $A^{-1}$ is the inverse of $A$ if $A^{-1}A = {\mathbf 1}_n$.
Not every square matrix has an inverse; the zero matrix ${\mathbf 0}_n$ is an example. If $A$ has an inverse, $A$ is invertible. A square matrix that is not invertible is singular.
Properties of inverses:
Linear Independence and Rank. Let $\{a_j \} \in \Re^n$ be a collection of vectors. Recall that $\{a_j \}$ are linearly independent if $\sum_j \alpha_j a_j = {\mathbf 0}_n$ implies that $\alpha_j = 0$ for all $j$.
The row rank of a matrix $A \in {\mathcal M}(m,n)$ is the maximal number of linearly independent rows it has; equivalently, it is the dimension of the subspace of $\Re^m$ spanned by the rows of $A$. The column rank of $A$ is to the maximal number of linearly independent columns it has; equivalently, it is the dimension of the subspace of $\Re^n$ spanned by the columns of $A$. The row rank and the column rank if any matrix are equal; we denote them ${\mbox{rank}}(A)$.
If $A \in {\mathcal M}(m,n)$ then ${\mbox{rank}}(A) \le \min(m, n)$.
If $A$ and $B$ can be multiplied, ${\mbox{rank}}(AB) \le \min({\mbox{rank}}(A), {\mbox{rank}}(B))$.
If $A$ and $B$ have the same dimensions, ${\mbox{rank}}(A+B) \le {\mbox{rank}}(A) + {\mbox{rank}}(B)$.
A square matrix $A \in {\mathcal M}(n,n)$ is invertible if and only if ${\mbox{rank}}(A) = n$.
Dot (inner) Products and the Euclidean norm.
If $a, b \in \Re^n$ then the dot product of $a$ and $b$ is $$a \cdot b = a^Tb = a'b = a_1b_1 + a_2b_2 + \cdots + a_nb_n.$$
If $a \cdot b = 0$, we say $a$ and $b$ are orthogonal, and we write $a \perp b$.
The Euclidean norm or 2-norm of $a$ is $$ \|a \|_2 = \|a \| = |a| \equiv \sqrt{a \cdot a}.$$ (The subscript 2 is often omitted for the 2-norm.)
For $1 \le p < \infty$, the $p$-norm of $a$ is $$ \|a \|_p \equiv \left ( \sum_{i=1}^n \left |a_i \right |^p \right )^{1/p}.$$
The infinity norm of $a$ is $$ \|a \|_\infty \equiv \max_{i=1}^n \left | a_i \right |.$$
Outer Products.
If $a, b \in \Re^n$ then the outer product of $a$ and $b$ is the matrix $ab \in {\mathcal M}(n,n)$ with elements $(ab)_{ij} = a_ib_j$: $$ ab = \begin{pmatrix} a_1b_1 & a_1b_2 & \cdots & a_1b_n \\ a_2b_2 & a_2b_2 & \cdots & a_2b_n \\ \vdots & \vdots & \ddots & \vdots \\ a_nb_1 & a_nb_2 & \cdots & a_nb_n \end{pmatrix} . $$
all $x \in \Re^n$, $\|Rx\|_2 = \|x\|_2$.
where $\|x\|_G \equiv \sqrt{x'Gx}$.
Let's look at all these entities and operations in R.
# Create a matrix
A <- 1:12; # A is a 1-dimensional array of the integers 1-12
dim(A) <- c(4,3); # Change the dimension of A to be 4 by 3
print("A")
A # the matrix A, which is 4 by 3
# Another way to do the same thing
A <- matrix(
+ c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), # data
nrow=4, # number of rows
ncol=3, # number of columns
byrow = FALSE) ; # read the data by column
A
# Another way
A <- matrix(
seq(1,12), # data
nrow=4, # number of rows
ncol=3, # number of columns
byrow = FALSE) ; # read the data by column
A
[1] "A"
1 | 5 | 9 |
2 | 6 | 10 |
3 | 7 | 11 |
4 | 8 | 12 |
1 | 5 | 9 |
2 | 6 | 10 |
3 | 7 | 11 |
4 | 8 | 12 |
1 | 5 | 9 |
2 | 6 | 10 |
3 | 7 | 11 |
4 | 8 | 12 |
# Transpose A
print("Transpose of A")
t(A) # the transpose of A, which is 3 by 4
A <- t(A); # overwrite A with its transpose: A is now a 3 by 4 array
A
# Another way to make this matrix
A <- matrix(
seq(1,12), # data
nrow=3, # number of rows
ncol=4, # number of columns
byrow = TRUE) ; # read the data by row
A
[1] "Transpose of A"
1 | 2 | 3 | 4 |
5 | 6 | 7 | 8 |
9 | 10 | 11 | 12 |
1 | 2 | 3 | 4 |
5 | 6 | 7 | 8 |
9 | 10 | 11 | 12 |
1 | 2 | 3 | 4 |
5 | 6 | 7 | 8 |
9 | 10 | 11 | 12 |
# Scalar multiplication
alpha <- 3;
A
alpha*A
1 | 2 | 3 | 4 |
5 | 6 | 7 | 8 |
9 | 10 | 11 | 12 |
3 | 6 | 9 | 12 |
15 | 18 | 21 | 24 |
27 | 30 | 33 | 36 |
# R understands matrix negation
-A
-1*A
-1 | -2 | -3 | -4 |
-5 | -6 | -7 | -8 |
-9 | -10 | -11 | -12 |
-1 | -2 | -3 | -4 |
-5 | -6 | -7 | -8 |
-9 | -10 | -11 | -12 |
# in R, you can create an m by n zero matrix as follows:
Z <- matrix(0, 3, 4);
Z
A + Z
0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
1 | 2 | 3 | 4 |
5 | 6 | 7 | 8 |
9 | 10 | 11 | 12 |
# To create an n by n identity matrix in R, make a diagonal matrix of ones:
I = diag(1,3) # a diagonal matrix of 1s, with dimension 3 by 3
I
I2 = diag(c(1,1,1)) # a diagonal matrix with the vector (1,1,1) along the diagonal
I2
1 | 0 | 0 |
0 | 1 | 0 |
0 | 0 | 1 |
1 | 0 | 0 |
0 | 1 | 0 |
0 | 0 | 1 |
# R supports scalar addition; this is not a formal operation in linear algebra
alpha + A;
4 | 5 | 6 | 7 |
8 | 9 | 10 | 11 |
12 | 13 | 14 | 15 |
# Matrix addition
B <- 24:13; # B is an array with the integers 24 to 13 in descending order
dim(B) = c(4,3); # B is now a 4 by 3 array
B <- t(B); # B is now a 3 by 4 array
A # 1:12
B # 24:13
A+B # a 3 by 4 array; all entries are 25
1 | 2 | 3 | 4 |
5 | 6 | 7 | 8 |
9 | 10 | 11 | 12 |
24 | 23 | 22 | 21 |
20 | 19 | 18 | 17 |
16 | 15 | 14 | 13 |
25 | 25 | 25 | 25 |
25 | 25 | 25 | 25 |
25 | 25 | 25 | 25 |
# Matrix multiplication
C <- 1:4;
C
A%*%C # a 3 by 2 matrix
cat("the (3,1) entry should be", 9*1+10*2+11*3+12*4) # Check the (3,1) entry
30 |
70 |
110 |
the (3,1) entry should be 110
# now multiplying a matrix by a matrix The matrix multiplication operator is %*%
A %*% t(B) # a 3 by 3 matrix
t(A) %*% B # a 4 by 4 matrix
220 | 180 | 140 |
580 | 476 | 372 |
940 | 772 | 604 |
268 | 253 | 238 | 223 |
328 | 310 | 292 | 274 |
388 | 367 | 346 | 325 |
448 | 424 | 400 | 376 |
# Multiplication by the identity matrix
I3 <- diag(1,3); # I3 is the 3 by 3 identity matrix
I3 %*% A # multiplying from the left by the 3 by 3 identity matrix gives us back A
#
I4 <- diag(1,4); # I4 is the 4 by 4 identity matrix
A %*% I4 # multiplying from the right by the 4 by 4 identity matrix gives us back A
1 | 2 | 3 | 4 |
5 | 6 | 7 | 8 |
9 | 10 | 11 | 12 |
1 | 2 | 3 | 4 |
5 | 6 | 7 | 8 |
9 | 10 | 11 | 12 |
# R also does elementwise multiplication; again, this is not a formal operation in linear algebra
A*C # R copies C repeatedly to fill out the dimension of A,
# then multiplies elements of A by the corresponding element
1 | 8 | 9 | 8 |
10 | 6 | 28 | 24 |
27 | 20 | 11 | 48 |
# The dot product
a <- 1:5;
b <- c(rep(1,3), rep(2,2)); # new function: define a vector by repeating values
b
t(a) %*% b
# another way to do it
sum(a*b)
24 |
# the 2-norm
norm(as.matrix(a), type="2") # the "norm" function operates on matrices--not lists.
# The function "as.matrix()" turns the list a into a matrix.
# alternative 1
sqrt(sum(a*a))
# alternative 2
sqrt(t(a) %*% a)
# check:
sqrt(1^2+2^2+3^2+4^2+5^2)
7.416198 |
# the 1-norm
norm(as.matrix(a), type="o")
# alternative
sum(abs(a))
# check:
1+2+3+4+5
The notation $X \sim P$ means that the random variable $X$ has the probability distribution $P$. Recall that $U[a, b]$ denotes the uniform distribution on the interval $[a, b]$.
Recall that the R function runif() generates uniformly distributed pseudo-random numbers between 0 and 1. To generate pseudo-random numbers that are uniformly distributed on the interval $[a, b]$, we can re-scale numbers from the interval $[0,1]$ as follows:
If $X \sim U[0,1]$, then $a+(b-a)X \sim U[a,b]$.
Example: to generate pseudo-random numbers on the interval $[-1, 2]$, we can generate pseudo-random numbers on the interval $[0,1]$, multiply them by 3, and subtract 1 from the result.
The R function runif() takes options min and max to set the range in this way.
# Here's how to plot a histogram in R. You will need this for the exercise below.
x <- runif(1000, min=-1, max=2) # generate pseudorandom numbers uniformly distributed between -1 and 2.
hist(x) # plot a histogram of the results
- Prove algebraically that $A(\theta)$ is an orthogonal matrix.
- Write an R script that checks whether $A(\theta)$ is orthogonal by computing $A(\theta)A'(\theta)$ numerically for 1,000 values of $\theta$ selected at random, uniformly from $[0, 2\pi]$:
- For each randomly selected value of $\theta$, calculate $\sum_{i=1}^2 \sum_{j=1}^2 \left ( (A(\theta)A'(\theta))_{ij} - ({\mathbf 1}_2)_{ij} \right )^2$. (Note: this is the square of the Frobenius matrix norm of the difference between $AA'$ and $\mathbf 1$. You can code it from scratch or use the R function <tt>norm()</tt> by setting <tt>type="F"</tt> in the call to <tt>norm</tt>.)
- Find the maximum value of those 1,000 sums of squares.
- Plot a histogram of those 1,000 sums of squares.
If $A \in {\mathcal M}(n,n)$, the trace of $A$ is the sum of the diagonal elements of $A$: $$ {\mbox{trace}}(A) = \sum_{i=1}^n A_{ii}.$$
The determinant of a square matrix is a real number; how to compute the determinant of a matrix is described below.
Properties of determinants:
The determinant of a matrix $A \in {\mathcal M}(2,2)$ is $$ |A| = A_{11}A_{22} - A_{12}A_{21}. $$
The determinant of a matrix $A \in {\mathcal M}(3,3)$ is $$ |A| = A_{11}(A_{22}A_{33} - A_{23}A_{32})
\left |A \right | \equiv \sum_{\pi \in S_n} \mbox{sgn}(\pi) \prod_{i=1}^n A_{i,\pi_i}. $$
If $A \in {\mathcal M}(n,n)$ is invertible, its inverse is $$ A^{-1} = \frac{\mbox{adj}(A)}{|A|}, $$ where $\mbox{adj}(A)$ is the adjugate of $A$, the transpose of the matrix of co-factors. The $(i,j)$ element of the co-factor matrix of $A$ is the determinant of the matrix that results from deleting the $i$th row and $j$th column of $A$, multiplied by $(-1)^{i+j}$.
For instance, if $A \in {\mathcal M}(2,2)$, its adjugate is $$ \mbox{adj}(A) \equiv \begin{pmatrix} A_{22} & -A_{12} \\ -A_{21} & A_{11} \end{pmatrix} . $$ If $A \in {\mathcal M}(3,3)$, its adjugate is $$ \mbox{adj}(A) \equiv \begin{pmatrix}
\end{vmatrix} &
\end{vmatrix} &
\end{vmatrix} \
\end{vmatrix} &
\end{vmatrix} &
\end{vmatrix} \
\end{vmatrix} &
\end{vmatrix} &
\end{vmatrix} \end{pmatrix} . $$
Let $$ A = \begin{pmatrix} 1 & 2 \\ 6 & 7 \end{pmatrix} . $$ Then $$ \mbox{adj}(A) = \begin{pmatrix} 7 & -2 \\ -6 & 1 \end{pmatrix} , $$ and $|A| = 1\cdot7 - 6\cdot2 = 7-12 = -5 \ne 0$, so $A$ is invertible. $$ A^{-1} = \frac{\mbox{adj}(A)}{|A|} = -\frac{1}{5} \begin{pmatrix} 7 & -2 \\ -6 & 1 \end{pmatrix} . $$
Let's check the result: $$ AA^{-1} = \begin{pmatrix} 1 & 2 \\ 6 & 7 \end{pmatrix} \cdot -\frac{1}{5} \begin{pmatrix} 7 & -2 \\ -6 & 1 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} -7+12 & 2-2 \\ -42 + 42 & 12-7 \end{pmatrix} = \frac{1}{5} \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix} = {\mathbf 1}_2. $$
Let's compute another: take $$ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 4 & 5 \end{pmatrix} . $$ Then $$ \left |A \right | = 1 (4 \cdot 5 - 4 \cdot 6) -2(2 \cdot 5 - 3 \cdot 6) + 3(2 \cdot 4 - 3 \cdot 4) = -4 + 16 - 12 = 0. $$ Hence, $A$ is not full rank and is not invertible. (The second row is a multiple of the first row, so the rows are linearly dependent.)
# Trace in R
A <- array(1:5,dim=c(5,5))
A
sum(diag(A)) # diag() extracts the diagonal of a matrix; sum() adds the elements of a list
1 | 1 | 1 | 1 | 1 |
2 | 2 | 2 | 2 | 2 |
3 | 3 | 3 | 3 | 3 |
4 | 4 | 4 | 4 | 4 |
5 | 5 | 5 | 5 | 5 |
# Determinant
A <- array(1:5,dim=c(5,5))
A # the rows and columns are multiples of each other, so A is singular (not invertible)
det(A) # 0
# The set of singular matrices has Lebesgue measure 0, so a random matrix is nonsingular with probability 1
A <- array(runif(25), dim=c(5,5)) # 5 by 5 matrix with uniformly distributed elements
A
det(A)
1 | 1 | 1 | 1 | 1 |
2 | 2 | 2 | 2 | 2 |
3 | 3 | 3 | 3 | 3 |
4 | 4 | 4 | 4 | 4 |
5 | 5 | 5 | 5 | 5 |
0.44494921 | 0.52317376 | 0.03632915 | 0.32502890 | 0.11407382 |
0.1517755 | 0.8576347 | 0.6506324 | 0.7805578 | 0.1062984 |
0.7462715 | 0.4619980 | 0.1625752 | 0.6339368 | 0.6016028 |
0.60668472 | 0.64195808 | 0.09207884 | 0.71420528 | 0.62334836 |
0.79192360 | 0.03372087 | 0.35195751 | 0.07350370 | 0.58952493 |
# Inverse
solve(A) # this is almost never the right thing to do
1.6532796 | -0.3517214 | 3.0738478 | -3.0826031 | -0.1338509 |
1.86271860 | 0.08329378 | -5.52354965 | 3.84048152 | 1.20043061 |
-0.7561739 | 1.1796194 | -0.2336376 | -0.8476521 | 1.0683307 |
-1.5079462 | 0.3122524 | 6.2728416 | -3.5774260 | -2.3831924 |
-1.6879742 | -0.2754759 | -4.4558594 | 4.8733703 | 1.4667515 |
Let $A \in {\mathcal M}(n,n)$ be a square matrix. An eigenvector of $A$ is a non-zero vector $v \in \Re^n$ such that, for some scalar $\lambda \in \Re$, $\lambda \ne 0$, $Av = \lambda v$. That is, the effect of multiplying $v$ by the matrix $A$ is to "stretch or shrink" $v$ by the scalar $\lambda$, without changing the direction of $v$ (except perhaps to change its sign). The scalar $\lambda$ is the eigenvalue associated with the eigenvector $v$.
If $A$ is a diagonal matrix, its eigenvectors are the coordinate vectors $e_1 = (1, 0, \ldots, 0)$, $e_2 = (0, 1, \ldots, 0)$, $e_n = (0, 0, \ldots, 1)$, and the associated eigenvalues are the diagonal elements of $A$. That is, the eigenvalue associated with the eigenvector $e_j$ is $A_{jj}$.
If $Av = \lambda v$, then $(A- \lambda {\mathbf 1}_n)v = 0$. For that to occur with $v \ne {\mathbf 0}_n$, it is necessary that the determinant of $A - \lambda {\mathbf 1}_n$ be zero. That condition, $\left |A - \lambda {\mathbf 1}_n \right | = 0$ means that any eigenvalue must be a root of an $n$th degree polynomial in $\lambda$ called the characteristic polynomial of the matrix $A$. (The leading term in the polynomial is $(-1)^n \lambda^n$.) It follows that $A$ has at most $n$ real eigenvalues and always has $n$ complex eigenvalues (counting multiple roots as often as they occur).
Suppose that $A$ has the $n$ complex eigenvalues $\{\lambda_i \}$ (counting multiples as often as they occur).
Then:
$\{\lambda_i^{-1}\}$.
Let $V \in {\mathcal M}(n,n)$ be the matrix whose columns are the eigenvectors of a matrix $A$, and let $\Lambda$ be the diagonal matrix of the corresponding eigenvalues of $A$. Then $A = V \Lambda V^{-1}$. This factorization is the eigen-decomposition of $A$.
Suppose $A$ is positive semi-definite. Then $A$ can be written $$ A = R \Lambda R', $$ where $R$ is an orthogonal matrix and $\Lambda$ is the diagonal matrix whose diagonal elements are the eigenvalues of $A$, all of which are non-negative. The columns of $R$ are the eigenvectors of $A$. If $A$ is positive definite, then the diagonal elements of $\Lambda$ are strictly positive.
Suppose that $\Lambda$ is a diagonal matrix with all non-negative elements.
Define the matrix $\Lambda^{1/2}$ by $(\Lambda^{1/2})_{ij} = \sqrt{\Lambda_{ij}}$: $$ \Lambda^{1/2} \equiv \begin{pmatrix} \sqrt{\Lambda_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{\Lambda_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & 0 \\ 0 & 0 & \cdots & \sqrt{\Lambda_{nn}} \end{pmatrix} . $$
Then $\Lambda^{1/2}$ is the matrix square root of $\Lambda$, in the sense that $\Lambda^{1/2}\Lambda^{1/2} = \Lambda$.
Suppose $A \in {\mathcal M}(n,n)$ is positive semi-definite, with the eigen-decomposition $R\Lambda R'$, $R$ orthogonal, $\Lambda$ diagonal with non-negative elements. Define the matrix $$ A^{1/2} \equiv R \Lambda^{1/2} R'. $$ Then $A^{1/2}$ is the matrix square root of $A$, in the sense that $A^{1/2}A^{1/2} = A$.
Suppose that $\Lambda \in {\mathcal M}(n,n)$ is a diagonal matrix with strictly positive diagonal elements (i.e., $\Lambda$ is a diagonal, positive definite matrix). Then the inverse of $\Lambda$ has elements $$ (\Lambda^{-1})_{ij} = \left \{ \begin{array}{ll} 0, & i \ne j \\ \Lambda_{ij}^{-1}, & i=j. \end{array} \right . $$
We can verify this by direct multiplication: $$ (\Lambda \Lambda^{-1})_{ij} = \sum_{\ell=1}^n \Lambda_{i\ell} (\Lambda^{-1})_{\ell j} = \left \{ \begin{array}{ll} 0, & i \ne j \\ 1, & i=j. \end{array} \right . $$
Suppose that $A \in {\mathcal M}(n,n)$ is positive definite, with the eigen-decomposition $R\Lambda R'$. Then $A^{-1} = R \Lambda^{-1} R'$.
the matrix with elements $(\Lambda^{1/2})_{ij} = \sqrt{\Lambda_{ij}}$ is its matrix square root.
$A^{1/2} \equiv R\Lambda^{1/2}R'$ is the matrix square root of $A$.
# Eigen decompositions in R
A <- array(c(1,1,1,1,2,3,0,1,1), dim=c(3,3));
eigen(A)
# eigen() is an example of a function that returns more than one variable. Here's how you access them:
decomp <- eigen(A);
decomp$values
decomp$vectors
-0.2268408 | -0.8167856 | 0.3520763 |
-0.6013762 | 0.2237099 | -0.4848805 |
-0.7660874 | 0.5318037 | 0.8005830 |
-0.2268408 | -0.8167856 | 0.3520763 |
-0.6013762 | 0.2237099 | -0.4848805 |
-0.7660874 | 0.5318037 | 0.8005830 |
# let's check whether these really are eigenvectors!
ev1 <- decomp$vectors[,1]; # the first column of the eigenvector matrix: an eigenvector
ev1
A %*% ev1 / decomp$values[1] # this should give us back the first eigenvector
-0.2268408 |
-0.6013762 |
-0.7660874 |
# check that the eigen-decomposition of A actually holds
A
decomp$vectors %*% diag(decomp$values) %*% solve(decomp$vectors)
1 | 1 | 0 |
1 | 2 | 1 |
1 | 3 | 1 |
1.000000e+00 | 1.000000e+00 | -5.551115e-16 |
1 | 2 | 1 |
1 | 3 | 1 |
# check that the sum of the eigenvalues is the trace
sum(diag(A))
sum(decomp$values)
# check that the product of the eigenvalues is the determinant
det(A)
prod(decomp$values)
# check whether we can invert A using its eigen-decomposition
Ainv <- decomp$vectors %*% diag(1/decomp$values) %*% solve(decomp$vectors);
Ainv
A %*% Ainv # this should be the identity matrix
1 | 1 | -1 |
1.026956e-15 | -1.000000e+00 | 1.000000e+00 |
-1 | 2 | -1 |
1.000000e+00 | -6.661338e-16 | 1.221245e-15 |
-1.110223e-16 | 1.000000e+00 | -4.440892e-16 |
8.881784e-16 | 1.110223e-15 | 1.000000e+00 |
# this matrix is not positive-definite, so the eigenvector matrix is not orthogonal: its
# transpose is not its inverse. Let's verify that:
t(decomp$vectors)
solve(decomp$vectors)
-0.2268408 | -0.6013762 | -0.7660874 |
-0.8167856 | 0.2237099 | 0.5318037 |
0.3520763 | -0.4848805 | 0.8005830 |
-0.5152664 | -0.9918796 | -0.3741398 |
-1.0057615 | -0.1039075 | 0.3793761 |
0.1750332 | -0.8801187 | 0.6390625 |
# Now let's try it with a positive-definite matrix. We can make a positive definite matrix from a nonsingular matrix
Apd <- t(A) %*% A;
Apd
decomp <- eigen(Apd);
decomp$values # the eigenvalues should be non-negative
decomp$vectors %*% t(decomp$vectors) # and the matrix of eigenvectors should be orthogonal
3 | 6 | 2 |
6 | 14 | 5 |
2 | 5 | 2 |
1.000000e+00 | -1.110223e-16 | -1.665335e-16 |
-1.110223e-16 | 1.000000e+00 | -5.551115e-17 |
-1.387779e-16 | -5.551115e-17 | 1.000000e+00 |
# the eigenvector matrix should be orthogonal. Let's check:
t(decomp$vectors)
solve(decomp$vectors) # should be equal to the transpose
0.3797058 | 0.8709960 | 0.3117524 |
0.8226923 | -0.1638011 | -0.5443773 |
0.4230850 | -0.4631795 | 0.7787579 |
0.3797058 | 0.8709960 | 0.3117524 |
0.8226923 | -0.1638011 | -0.5443773 |
0.4230850 | -0.4631795 | 0.7787579 |
# lets make the matrix square root of Apd
ApdHalf <- decomp$vectors %*% diag(sqrt(decomp$values)) %*% t(decomp$vectors);
ApdHalf
ApdHalf %*% ApdHalf
1.148516 | 1.259494 | 0.307545 |
1.259494 | 3.345303 | 1.105722 |
0.3075450 | 1.1057220 | 0.8263141 |
3 | 6 | 2 |
6 | 14 | 5 |
2 | 5 | 2 |
We will often need to solve linear equations of the form $Ax = b$, where $A \in {\mathcal M}(m,n)$ and $b \in {\mathcal M}(1,m)$ are known, and $x \in {\mathcal M}(n,1)$ is the unknown vector we seek.
If the rows of $A$ are linearly independent and $n \ge m$, there is at least one solution. If $n < m$, there may be no solution.
For the moment, suppose $A$ is an invertible square matrix with dimension $n$. Then, formally, we can find $x$ by multiplying both sides by the inverse of $A$:
$$ A^{-1} A x = A^{-1} b. $$Since $A^{-1}A = {\mathbf 1}_n$, this becomes $$ {\mathbf 1}_n x = x = A^{-1} b. $$
However, this is not a good way to find $x$: it is unnecessarily numerically unstable (ill conditioned) compared with other ways of solving the linear system.
Better methods include matrix factorizations or decompositions (LU, QR, Cholesky), Gaussian elimination, etc. We won't study those methods in detail, but will present Gaussian elimination briefly.
Gaussian elimination. Gaussian elimination is a method that can be used to find the rank of a matrix and to solve a linear system. Gaussian elimination involves three operations:
All of those operations preserves the linear system: a vector $x$ solves the original system if and only if it solves the transformed system.
The goal of Gaussian elimination is to use those operations to transform $A$ into upper triangular form, in which all elements "below" the diagonal are zero. More specifically, it transforms to row eschelon form, in which the first non-zero element of each row is to the right of the first non-zero element of the row above it. It is trivial to solve a linear system (using back substitution) once it is in upper-triangular form, as we shall see.
Here's an illustration. Let $$ A = \begin{pmatrix} 2 & 4 & 4 \\ 6 & 1 & 5 \\ 3 & 4 & 3 \end{pmatrix} $$ and $$ b = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}. $$
We start by forming the "augmented" matrix that consists of appending $b$ as an additional column of $A$: $$ \begin{pmatrix} 2 & 4 & 4 & 2\\ 6 & 1 & 5 & 1\\ 3 & 4 & 3 & 1 \end{pmatrix} $$ Divide the first row by 2 $$ \begin{pmatrix} 1 & 2 & 2 & 1\\ 6 & 1 & 5 & 1\\ 3 & 4 & 3 & 1 \end{pmatrix}. $$ Subtract 6 times the first row from the second row: $$ \begin{pmatrix} 1 & 2 & 2 & 1\\ 0 & -11 & -7 & -5\\ 3 & 4 & 3 & 1 \end{pmatrix}. $$ Subtract three times the first row from the third row: $$ \begin{pmatrix} 1 & 2 & 2 & 1\\ 0 & -11 & -7 & -5\\ 0 & -2 & -3 & -2 \end{pmatrix}. $$ Multiply the second and third rows by -1 and swap them: $$ \begin{pmatrix} 1 & 2 & 2 & 1\\ 0 & 2 & 3 & 2 \\ 0 & 11 & 7 & 5 \end{pmatrix}. $$ Divide the second row by 2: $$ \begin{pmatrix} 1 & 2 & 2 & 1\\ 0 & 1 & 1.5 & 1 \\ 0 & 11 & 7 & 5 \end{pmatrix}. $$ Subtract 11 times the second row from the third row: $$ \begin{pmatrix} 1 & 2 & 2 & 1\\ 0 & 1 & 1.5 & 1 \\ 0 & 0 & -9.5 & -6 \end{pmatrix}. $$ Multiply the last row by -1/9.5: $$ \begin{pmatrix} 1 & 2 & 2 & 1\\ 0 & 1 & 1.5 & 1 \\ 0 & 0 & 1 & 0.63158 \end{pmatrix}. $$ We now use back substitution to find $x$:
This gives $$ x = \begin{pmatrix} -0.36842 \\ 0.05263 \\ 0.63158 \end{pmatrix} . $$ Multiplying by $A$ gives $$ Ax = \begin{pmatrix} 2 & 3 & 4 \\ 6 & 1 & 5 \\ 3 & 4 & 3 \end{pmatrix} \begin{pmatrix} -0.36842 \\ 0.05263 \\ 0.63158 \end{pmatrix} = \begin{pmatrix} 2 \\ 1.00001 \\ 1 \end{pmatrix} . $$
# Let's try this in R
A <- array(c(2,6,3,4,1,4,4,5,3), dim=c(3,3));
A
b <- c(2,1,1);
b
solve(A,b)
2 | 4 | 4 |
6 | 1 | 5 |
3 | 4 | 3 |
What happens if $A$ is rank deficient?
Consider the matrix $$ A = \begin{pmatrix} 2 & 4 & 4 \\ 6 & 1 & 5 \\ 10 & 9 & 13 \end{pmatrix} . $$ Let's perform Gaussian elimination on it. Divide the first row by 2 $$ \begin{pmatrix} 1 & 2 & 2 \\ 6 & 1 & 5 \\ 10 & 9 & 13 \end{pmatrix}. $$ Subtract 6 times the first row from the second row: $$ \begin{pmatrix} 1 & 2 & 2 \\ 0 & -11 & -7 \\ 10 & 9 & 13 \end{pmatrix}. $$ Subtract ten times the first row from the third row: $$ \begin{pmatrix} 1 & 2 & 2 \\ 0 & -11 & -7\\ 0 & -11 & -7 \end{pmatrix}. $$ Subtract the 2nd row from the third row: $$ \begin{pmatrix} 1 & 2 & 2 \\ 0 & -11 & -7\\ 0 & 0 & 0 \end{pmatrix}. $$ The third row is now zero: there are only two nonzero rows once we transform to upper-triangular form. The rank of the matrix $A$ is therefore 2, not 3 (it is rank deficient). This matrix $A$ is not invertible.
# solve a linear system using R
A <- array(runif(25), dim=c(5,5)) # 5 by 5 matrix with uniformly distributed elements
b <- rep(1,5);
x <- solve(A,b);
x
A %*% x # check that x solves the linear system
1 |
1 |
1 |
1 |
1 |
Ainv = solve(A);
Ainv
x <- Ainv %*% b;
x
A %*% x
4.992164 | 2.381509 | 5.862320 | -6.310363 | -3.638844 |
0.282778 | -1.400723 | -1.985402 | 1.730754 | 1.686464 |
-4.7926526 | -1.1627247 | -2.8455445 | 4.9559795 | 0.8199406 |
0.7890497 | -0.3361745 | 0.2298502 | -0.7290643 | 1.4076452 |
-0.008949328 | 1.184794406 | -0.242017138 | 0.032201496 | -0.315967880 |
3.286787 |
0.313871 |
-3.025002 |
1.361306 |
0.6500616 |
1 |
1 |
1 |
1 |
1 |