Chapter 1. Vector Spaces
1.5 Gaussian Elimination
I-Liang Chern
National Taiwan University
Fall, 2021
Outline
1.5 Gaussian elimination
Elimination as a reduction process
Geometric interpretation of the Gaussian elimination
Fundamental theorem of linear algebra
Gaussian elimination
We solve the equation
Ax = b
by Gaussian elimination. The idea is to transform the equations into a set of equivalent yet simpler equations. In terms of matrices, this is a sequence of row operations on the row vectors of $A$ (or of the augmented matrix $[A \mid b]$). A row operation replaces a row $A_i$ by a new row $A_i'$.
There are three kinds of row operations:
(1) scaling: $A_i \leftarrow \alpha A_i$, $\alpha \neq 0$,
(2) swapping: $A_i \leftrightarrow A_j$,
(3) shearing: $A_i' = A_i - \alpha A_j$, $\alpha \neq 0$, $j \neq i$.
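The three row operations can be sketched in a few lines of Python. This is only an illustration: the function names `scale`, `swap` and `shear` and the small 2×2 system are assumptions made for this example, and exact rational arithmetic (`fractions.Fraction`) is used to avoid rounding.

```python
from fractions import Fraction

def scale(M, i, alpha):
    """Scaling: row i <- alpha * row i, with alpha != 0."""
    if alpha == 0:
        raise ValueError("alpha must be nonzero")
    M[i] = [alpha * x for x in M[i]]

def swap(M, i, j):
    """Swapping: row i <-> row j."""
    M[i], M[j] = M[j], M[i]

def shear(M, i, j, alpha):
    """Shearing: row i <- row i - alpha * row j, with i != j."""
    if i == j:
        raise ValueError("shearing needs two different rows")
    M[i] = [x - alpha * y for x, y in zip(M[i], M[j])]

# Hypothetical augmented matrix [A | b] for
#   x1 + 2 x2 = 5,   3 x1 + 4 x2 = 11   (rows are indexed from 0 in the code)
M = [[Fraction(1), Fraction(2), Fraction(5)],
     [Fraction(3), Fraction(4), Fraction(11)]]

shear(M, 1, 0, Fraction(3))      # second row minus 3 * first row  -> [0, -2, -4]
scale(M, 1, Fraction(-1, 2))     # normalize the second pivot      -> [0, 1, 2]
shear(M, 0, 1, Fraction(2))      # first row minus 2 * second row  -> [1, 0, 1]
```

After these three operations `M` represents the equivalent system $x_1 = 1$, $x_2 = 2$.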
The Gaussian elimination process is divided into two parts:
- Forward elimination
- Backward substitution
The resulting matrix after forward elimination is called a matrix in echelon form (see the matrix U below), while the resulting matrix after backward substitution is called a reduced echelon form (see the matrix C below).
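\[
U = \left[\begin{array}{ccccc|c}
\otimes & \times & \times & \times & \times & \times \\
0 & 0 & \otimes & \times & \times & \times \\
0 & 0 & 0 & \otimes & \times & \times \\
0 & 0 & 0 & 0 & \otimes & \times \\
0 & 0 & 0 & 0 & 0 & \times
\end{array}\right],
\qquad
C = \left[\begin{array}{ccccc|c}
1 & \times & 0 & 0 & 0 & \times \\
0 & 0 & 1 & 0 & 0 & \times \\
0 & 0 & 0 & 1 & 0 & \times \\
0 & 0 & 0 & 0 & 1 & \times \\
0 & 0 & 0 & 0 & 0 & \times
\end{array}\right].
\]
Here the last column is the right-hand side of the augmented matrix.
- Echelon form: each row is either zero or has a nonzero leading entry, called the pivot entry (marked by $\otimes$); the entries below a pivot entry are all zeros.
- Reduced echelon form: each pivot entry is normalized to 1; all entries above or below a pivot entry are zeros.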
The advantage of the reduced echelon form is that we can easily construct a basis of $R(A^T)$ and a basis of $N(A)$ from it.
In MATLAB, the command [R,p] = rref(A) returns the reduced row echelon matrix R and the vector p of pivot column indices.
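For readers without MATLAB, the following is a minimal Python sketch of a function with a similar interface. It is not MATLAB's implementation: the name `rref`, the use of exact fractions, and the 0-based pivot indices are choices made for this illustration, and the first nonzero entry is taken as pivot instead of the largest one.

```python
from fractions import Fraction

def rref(A):
    """Return (R, pivots): the reduced row echelon form of A and the
    0-based indices of its pivot columns (MATLAB's p is 1-based)."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    pivots = []
    r = 0                                    # index of the next pivot row
    for j in range(n):
        # look for a row at or below r with a nonzero entry in column j
        p = next((k for k in range(r, m) if R[k][j] != 0), None)
        if p is None:
            continue                         # no pivot here: column j is free
        R[r], R[p] = R[p], R[r]              # swapping
        piv = R[r][j]
        R[r] = [x / piv for x in R[r]]       # scaling: the pivot becomes 1
        for k in range(m):                   # shearing: clear the rest of column j
            if k != r and R[k][j] != 0:
                f = R[k][j]
                R[k] = [x - f * y for x, y in zip(R[k], R[r])]
        pivots.append(j)
        r += 1
        if r == m:
            break
    return R, pivots

A = [[1, 2, 3],
     [2, 4, 7],
     [1, 2, 4]]
R, p = rref(A)   # R has rows [1, 2, 0], [0, 0, 1], [0, 0, 0]; p == [0, 2]
```

The second column has no pivot, so the second variable is free in any system with this coefficient matrix.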
Examples
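(a) Consider the system
\[
\begin{cases}
x_1 - 3x_2 + x_4 = 1, \\
x_3 + 2x_4 = 3.
\end{cases}
\]
The variables $x_1$ and $x_3$ are the pivot variables, while $x_2$ and $x_4$ are the free variables. We can express $x_1$ and $x_3$ in terms of $x_2$ and $x_4$ as
\[
x_3 = 3 - 2x_4, \qquad x_1 = 1 + 3x_2 - x_4.
\]
In vector form:
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 0 \\ 3 \\ 0 \end{bmatrix}
+ x_2 \begin{bmatrix} 3 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} -1 \\ 0 \\ -2 \\ 1 \end{bmatrix}.
\]
The solution $[1\ 0\ 3\ 0]^T$ is called a special solution; it is the solution with $x_2 = x_4 = 0$. The variables $x_2$ and $x_4$ are free parameters.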
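A quick numerical check of example (a), taking the system exactly as stated ($x_1 - 3x_2 + x_4 = 1$, $x_3 + 2x_4 = 3$). The helper `matvec` and the variable names are assumptions of this sketch.

```python
# Coefficient matrix and right-hand side of example (a)
A = [[1, -3, 0, 1],
     [0,  0, 1, 2]]
b = [1, 3]

x_special = [1, 0, 3, 0]     # special solution (x2 = x4 = 0)
v2 = [3, 1, 0, 0]            # direction of the free variable x2 (from x1 = 1 + 3 x2 - x4)
v4 = [-1, 0, -2, 1]          # direction of the free variable x4

def matvec(A, x):
    """Matrix-vector product A x for lists of numbers."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def general(x2, x4):
    """General solution x_special + x2 * v2 + x4 * v4."""
    return [s + x2 * a + x4 * c for s, a, c in zip(x_special, v2, v4)]

# Every choice of the free parameters should solve A x = b
ok = all(matvec(A, general(s, t)) == b for s in (-2, 0, 5) for t in (-1, 0, 7))
```

The special solution solves $Ax = b$, while `v2` and `v4` solve the homogeneous system $Ax = 0$, which is why any combination of them can be added.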
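(b) This is an example of backward substitution and of reading off the solutions from the reduced echelon form. First we perform row scalings to normalize each pivot entry to 1:
\[
\left[\begin{array}{ccccc|c}
2 & 4 & 6 & 8 & -6 & 4 \\
0 & 0 & 3 & 9 & 12 & 3 \\
0 & 0 & 0 & 5 & -20 & 5 \\
0 & 0 & 0 & 0 & 3 & 6
\end{array}\right]
\longrightarrow
\left[\begin{array}{ccccc|c}
1 & 2 & 3 & 4 & -3 & 2 \\
0 & 0 & 1 & 3 & 4 & 1 \\
0 & 0 & 0 & 1 & -4 & 1 \\
0 & 0 & 0 & 0 & 1 & 2
\end{array}\right].
\]
Next we perform shearing row operations so that all entries above each pivot entry become zero, working from the last pivot back to the first:
\[
\longrightarrow
\left[\begin{array}{ccccc|c}
1 & 2 & 3 & 4 & 0 & 8 \\
0 & 0 & 1 & 3 & 0 & -7 \\
0 & 0 & 0 & 1 & 0 & 9 \\
0 & 0 & 0 & 0 & 1 & 2
\end{array}\right]
\longrightarrow
\left[\begin{array}{ccccc|c}
1 & 2 & 3 & 0 & 0 & -28 \\
0 & 0 & 1 & 0 & 0 & -34 \\
0 & 0 & 0 & 1 & 0 & 9 \\
0 & 0 & 0 & 0 & 1 & 2
\end{array}\right]
\longrightarrow
\left[\begin{array}{ccccc|c}
1 & 2 & 0 & 0 & 0 & 74 \\
0 & 0 & 1 & 0 & 0 & -34 \\
0 & 0 & 0 & 1 & 0 & 9 \\
0 & 0 & 0 & 0 & 1 & 2
\end{array}\right].
\]
A linear system in reduced echelon form can be solved easily. In this example, the solution is
\[
x_1 + 2x_2 = 74, \quad x_3 = -34, \quad x_4 = 9, \quad x_5 = 2.
\]
Here, $x_2$ is a free variable. In vector form, the solution reads
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
=
\begin{bmatrix} 74 \\ 0 \\ -34 \\ 9 \\ 2 \end{bmatrix}
+ x_2 \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}.
\]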
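The arithmetic of example (b) can be replayed mechanically. The sketch below starts from the echelon matrix of the example, scales each row, and then clears the entries above the pivots from the last pivot to the first; the variable names are assumptions of this illustration.

```python
from fractions import Fraction

# Augmented echelon matrix of example (b), before scaling
M = [[2, 4, 6, 8, -6, 4],
     [0, 0, 3, 9, 12, 3],
     [0, 0, 0, 5, -20, 5],
     [0, 0, 0, 0, 3, 6]]
M = [[Fraction(x) for x in row] for row in M]
pivot_cols = [0, 2, 3, 4]          # 0-based pivot column of each row

# Step 1: scale every row so that its pivot equals 1
for i, j in enumerate(pivot_cols):
    M[i] = [x / M[i][j] for x in M[i]]

# Step 2: from the last pivot to the first, clear the entries above it
for i in reversed(range(len(pivot_cols))):
    j = pivot_cols[i]
    for k in range(i):
        f = M[k][j]
        M[k] = [x - f * y for x, y in zip(M[k], M[i])]

rhs = [row[-1] for row in M]       # right-hand sides of the reduced system
```

The list `rhs` holds the values of $x_1 + 2x_2$, $x_3$, $x_4$ and $x_5$, in this order.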
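(c) This is an example of forward elimination. Consider the system
\[
\begin{cases}
x_1 + x_2 = b_1, \\
-x_1 = b_2, \\
2x_1 + x_2 = b_3, \\
2x_1 + 3x_2 = b_4.
\end{cases}
\]
The Gaussian elimination for the augmented matrix is shown below:
\[
\left[\begin{array}{cc|c}
1 & 1 & b_1 \\
-1 & 0 & b_2 \\
2 & 1 & b_3 \\
2 & 3 & b_4
\end{array}\right]
\longrightarrow
\left[\begin{array}{cc|c}
1 & 1 & b_1 \\
0 & 1 & b_1 + b_2 \\
0 & -1 & -2b_1 + b_3 \\
0 & 1 & -2b_1 + b_4
\end{array}\right]
\longrightarrow
\left[\begin{array}{cc|c}
1 & 1 & b_1 \\
0 & 1 & b_1 + b_2 \\
0 & 0 & -b_1 + b_2 + b_3 \\
0 & 0 & -3b_1 - b_2 + b_4
\end{array}\right].
\]
The two zero rows give the constraints on $b$ that guarantee the existence of a solution:
\[
0 = -b_1 + b_2 + b_3, \qquad 0 = -3b_1 - b_2 + b_4.
\]
When these hold, the solution is given by
\[
x_1 = -b_2, \qquad x_2 = b_1 + b_2.
\]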
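The solvability constraints of example (c) can be tested on concrete right-hand sides. In this sketch the functions `constraints`, `solve` and `apply` are hypothetical helpers written for the illustration; `solve` reads the solution off the first two rows of the echelon form.

```python
from fractions import Fraction

A = [[1, 1], [-1, 0], [2, 1], [2, 3]]

def constraints(b):
    """Right-hand sides of the two zero rows after forward elimination."""
    b1, b2, b3, b4 = b
    return (-b1 + b2 + b3, -3 * b1 - b2 + b4)

def solve(b):
    """Return (x1, x2) if both constraints vanish, otherwise None."""
    if constraints(b) != (0, 0):
        return None
    b1, b2, _, _ = b
    x2 = b1 + b2          # second row of the echelon form
    x1 = b1 - x2          # first row: x1 + x2 = b1
    return (x1, x2)

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

b_good = apply(A, (Fraction(3), Fraction(-2)))   # lies in R(A) by construction
b_bad = [1, 0, 0, 0]                             # violates both constraints
```

Here `solve(b_good)` recovers the vector used to build `b_good`, while `solve(b_bad)` returns `None` because the system has no solution.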
Forward elimination
1 The forward elimination is performed from row 1 to row m.
2 Let us start from row 1. First, we search for the entry of largest magnitude in the first column $\{a_{k1} \mid k = 1, \dots, m\}$, say $a_{p1}$. That is,
\[
|a_{p1}| = \max\{|a_{k1}| \mid k = 1, \dots, m\}.
\]
We are only interested in the index $p$. Let us introduce the following notation for this index:
\[
p := \operatorname{argmax}\{|a_{k1}| \mid k = 1, \dots, m\}.
\]
Then we swap the 1st equation and the $p$th equation. This swapping does not affect the solution at all. Let us still call the resulting matrix $(a_{ij})$.
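3 If $a_{11} \neq 0$, then we perform shearing row operations to eliminate all $a_{k1}$ for $k = 2, \dots, m$. For the second row:
\[
\begin{array}{rl}
-\dfrac{a_{21}}{a_{11}} \times & (a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1) \\
+ & (a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2) \\ \hline
 & 0 + a_{22}'x_2 + \cdots + a_{2n}'x_n = b_2'
\end{array}
\]
where
\[
a_{22}' = a_{22} - \frac{a_{21}}{a_{11}}a_{12}, \ \cdots, \ a_{2n}' = a_{2n} - \frac{a_{21}}{a_{11}}a_{1n}, \quad b_2' = b_2 - \frac{a_{21}}{a_{11}}b_1.
\]
Let us denote this procedure by
\[
-\frac{a_{21}}{a_{11}} \times \text{(row 1)} + \text{(row 2)} \to \text{(row 2')}.
\]
In terms of the augmented matrix, it looks like
\[
\left[\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{array}\right]
\longrightarrow
\left[\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
0 & a_{22}' & \cdots & a_{2n}' & b_2' \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{array}\right].
\]
We can repeat the above procedure for the third row, and so on, up to the $m$th row:
\[
-\frac{a_{31}}{a_{11}} \times \text{(row 1)} + \text{(row 3)} \to \text{(row 3')}, \ \cdots, \ -\frac{a_{m1}}{a_{11}} \times \text{(row 1)} + \text{(row } m) \to \text{(row } m').
\]
Eventually, we arrive at
\[
\left[\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
0 & a_{22}' & \cdots & a_{2n}' & b_2' \\
0 & a_{32}' & \cdots & a_{3n}' & b_3' \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & a_{m2}' & \cdots & a_{mn}' & b_m'
\end{array}\right].
\]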
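4 If $a_{11} = 0$ after the swap, then, since $a_{11}$ has the largest magnitude in its column, $a_{i1} = 0$ for all $i = 1, \dots, m$. The matrix looks like
\[
\left[\begin{array}{cccc|c}
0 & a_{12} & \cdots & a_{1n} & b_1 \\
0 & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & a_{m2} & \cdots & a_{mn} & b_m
\end{array}\right].
\]
In this case, we go to the next entry of this row, that is $a_{12}$. We repeat the above procedure to eliminate all entries below $a_{12}$, and so on. This finishes the procedure for the first row.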
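5 We continue the above elimination process for row 2, row 3, and so on, until no entries remain to be eliminated. The resulting matrix looks like:
\[
\left[\begin{array}{ccccc|c}
\otimes & \times & \times & \times & \times & \times \\
0 & 0 & \otimes & \times & \times & \times \\
0 & 0 & 0 & \otimes & \times & \times \\
0 & 0 & 0 & 0 & \otimes & \times \\
0 & 0 & 0 & 0 & 0 & \times
\end{array}\right]
\]
Such a matrix is said to be in echelon (staircase) form. Suppose there are $r$ nonzero row vectors. We will see later that $r$ is exactly the dimension of the subspace $\operatorname{Span}(A_1, \dots, A_m)$. We call $r$ the row rank of $A$.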
6 For each nonzero row, there is a nonzero leading entry (circled in the above figure). This leading entry is called the pivot of that row. Let us denote the pivot column index of the $i$th row by $j_p(i)$. It has the following properties:
(i) $j_p(i+1) > j_p(i)$;
(ii) all entries below the pivot entry $a_{i, j_p(i)}$ are zeros;
(iii) rows with all zeros are at the bottom of the matrix.
The variable $x_{j_p(i)}$ is called a pivot variable; every other variable is called a free variable.
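The forward elimination steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a production routine: the function name `forward_eliminate`, the optional argument `ncols` (the number of columns in which pivots are searched, so that the right-hand side column of an augmented matrix can be excluded), and the test matrix are all hypothetical, and exact fractions replace floating point.

```python
from fractions import Fraction

def forward_eliminate(M, ncols=None):
    """Reduce M to echelon form by swapping and shearing.
    Returns (U, pivot_cols), where pivot_cols[i] is the 0-based pivot column of row i."""
    U = [[Fraction(x) for x in row] for row in M]
    m = len(U)
    if ncols is None:
        ncols = len(U[0])
    pivot_cols = []
    i = 0                                     # current pivot row
    for j in range(ncols):
        if i == m:
            break
        # p = argmax |U[k][j]| over the rows k = i, ..., m-1
        p = max(range(i, m), key=lambda k: abs(U[k][j]))
        if U[p][j] == 0:
            continue                          # column is zero from row i down: move right
        U[i], U[p] = U[p], U[i]               # swap the largest entry into the pivot position
        for k in range(i + 1, m):             # shear every row below the pivot row
            f = U[k][j] / U[i][j]
            U[k] = [x - f * y for x, y in zip(U[k], U[i])]
        pivot_cols.append(j)
        i += 1
    return U, pivot_cols

# Hypothetical augmented matrix [A | b] with three unknowns
M = [[0, 2, 4, 2],
     [1, 1, 1, 3],
     [2, 2, 2, 6]]
U, pivot_cols = forward_eliminate(M, ncols=3)
```

For this input the third row is swapped to the top, the second equation turns out to be a multiple of it, and the result has two pivots (columns 0 and 1 in 0-based indexing) and one zero row.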
Backward substitution
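1 We perform backward substitution on the above echelon matrix from row $r$ to row 1. The substitution uses the pivot coefficient $a_{i, j_p(i)}$ to eliminate all entries above it (i.e. $a_{k, j_p(i)}$, $k = i-1, \dots, 1$).
\[
\left[\begin{array}{ccccc|c}
\otimes & \times & \times & \times & \times & \times \\
0 & 0 & \otimes & \times & \times & \times \\
0 & 0 & 0 & \otimes & \times & \times \\
0 & 0 & 0 & 0 & \otimes & \times \\
0 & 0 & 0 & 0 & 0 & \times
\end{array}\right]
\longrightarrow
\left[\begin{array}{ccccc|c}
\otimes & \times & \times & \times & 0 & \times \\
0 & 0 & \otimes & \times & 0 & \times \\
0 & 0 & 0 & \otimes & 0 & \times \\
0 & 0 & 0 & 0 & \otimes & \times \\
0 & 0 & 0 & 0 & 0 & \times
\end{array}\right]
\longrightarrow
\left[\begin{array}{ccccc|c}
\otimes & \times & 0 & 0 & 0 & \times \\
0 & 0 & \otimes & 0 & 0 & \times \\
0 & 0 & 0 & \otimes & 0 & \times \\
0 & 0 & 0 & 0 & \otimes & \times \\
0 & 0 & 0 & 0 & 0 & \times
\end{array}\right]
\]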
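2 For each nonzero row $i$, $i = r, \dots, 1$, we divide it by $a_{i, j_p(i)}$ so that every pivot coefficient becomes $a_{i, j_p(i)} = 1$. The resulting matrix has the form
\[
\left[\begin{array}{ccccc|c}
1 & \times & 0 & 0 & 0 & \times \\
0 & 0 & 1 & 0 & 0 & \times \\
0 & 0 & 0 & 1 & 0 & \times \\
0 & 0 & 0 & 0 & 1 & \times \\
0 & 0 & 0 & 0 & 0 & \times
\end{array}\right]
\]
Such a matrix is said to be in reduced echelon form. Let us denote it by
\[
[\, C \mid d \,] =
\left[\begin{array}{c|c}
- \, C_1^T \, - & d_1 \\
\vdots & \vdots \\
- \, C_r^T \, - & d_r \\
0 & d_0
\end{array}\right]_{m \times (n+1)}
\]
Thus, the system $Ax = b$ is changed to the equivalent system $Cx = d$.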
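The two backward steps can be sketched in the same order as above: first clear the entries above each pivot, from the last nonzero row to the first, then divide each nonzero row by its pivot. The function name `back_substitute` and the input matrix are assumptions of this illustration; the pivot columns are taken as known from the forward phase.

```python
from fractions import Fraction

def back_substitute(U, pivot_cols):
    """Turn an echelon matrix U into reduced echelon form.
    pivot_cols[i] is the 0-based pivot column j_p(i) of nonzero row i."""
    C = [[Fraction(x) for x in row] for row in U]
    r = len(pivot_cols)
    # Step 1: for rows r, ..., 1 use the pivot to clear the entries above it
    for i in reversed(range(r)):
        j = pivot_cols[i]
        for k in range(i):
            f = C[k][j] / C[i][j]
            C[k] = [x - f * y for x, y in zip(C[k], C[i])]
    # Step 2: divide each nonzero row by its pivot coefficient
    for i in range(r):
        piv = C[i][pivot_cols[i]]
        C[i] = [x / piv for x in C[i]]
    return C

# Hypothetical echelon matrix [U | c] with pivots in columns 0 and 1
U = [[2, 2, 2, 6],
     [0, 2, 4, 2],
     [0, 0, 0, 0]]
C = back_substitute(U, [0, 1])
```

The result corresponds to the reduced system $x_1 - x_3 = 2$, $x_2 + 2x_3 = 1$, with $x_3$ free.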
Gaussian elimination as an LU decomposition
1 A matrix $L$ is called a lower triangular matrix if
$\ell_{ij} = 0$ for all $i < j$.
2 A matrix $U$ is called an upper triangular matrix if $u_{ij} = 0$ for all $i > j$.
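3 A shearing row operation corresponds to a transformation $A \mapsto \tilde{L}A$, where $\tilde{L}$ is a lower triangular matrix. For example,
\[
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
\tilde{\ell}_{21} & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & 0 \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & & & \ddots & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} + \tilde{\ell}_{21}a_{11} & a_{22} + \tilde{\ell}_{21}a_{12} & a_{23} + \tilde{\ell}_{21}a_{13} & \cdots & a_{2n} + \tilde{\ell}_{21}a_{1n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{bmatrix}.
\]
In terms of row vectors, it is
\[
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
\tilde{\ell}_{21} & 1 & 0 & \cdots & 0 \\
\vdots & & & \ddots & 0 \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix}
- \, A_1^T \, - \\
- \, A_2^T \, - \\
\vdots \\
- \, A_m^T \, -
\end{bmatrix}
=
\begin{bmatrix}
- \, A_1^T \, - \\
- \, A_2^T + \tilde{\ell}_{21}A_1^T \, - \\
\vdots \\
- \, A_m^T \, -
\end{bmatrix}.
\]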
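4 If we ignore the swapping, then the forward step of the Gaussian elimination transforms $A$ into an upper triangular matrix $U$ by means of a lower triangular matrix $\tilde{L}$:
\[
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
\tilde{\ell}_{21} & 1 & 0 & \cdots & 0 \\
\tilde{\ell}_{31} & \tilde{\ell}_{32} & 1 & \cdots & 0 \\
\vdots & & & \ddots & 0 \\
\tilde{\ell}_{m1} & \tilde{\ell}_{m2} & \tilde{\ell}_{m3} & \cdots & 1
\end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{bmatrix}
=
\begin{bmatrix}
u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\
0 & u_{22} & u_{23} & \cdots & u_{2n} \\
0 & 0 & u_{33} & \cdots & u_{3n} \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & u_{mn}
\end{bmatrix}.
\]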
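This can be rewritten as
\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & & & & \vdots
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
\ell_{21} & 1 & 0 & \cdots & 0 \\
\ell_{31} & \ell_{32} & 1 & \cdots & 0 \\
\vdots & & & \ddots &
\end{bmatrix}
\begin{bmatrix}
u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\
0 & u_{22} & u_{23} & \cdots & u_{2n} \\
0 & 0 & u_{33} & \cdots & u_{3n} \\
\vdots & & & \ddots &
\end{bmatrix}
\]
where $L = (\ell_{ij})$ is the lower triangular matrix with $L\tilde{L} = I$:
\[
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
\ell_{21} & 1 & 0 & \cdots & 0 \\
\ell_{31} & \ell_{32} & 1 & \cdots & 0 \\
\vdots & & & \ddots & 0 \\
\ell_{m1} & \ell_{m2} & \ell_{m3} & \cdots & 1
\end{bmatrix}_{m \times m}
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
\tilde{\ell}_{21} & 1 & 0 & \cdots & 0 \\
\tilde{\ell}_{31} & \tilde{\ell}_{32} & 1 & \cdots & 0 \\
\vdots & & & \ddots & 0 \\
\tilde{\ell}_{m1} & \tilde{\ell}_{m2} & \tilde{\ell}_{m3} & \cdots & 1
\end{bmatrix}_{m \times m}
=
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & 0 \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}_{m \times m}
\]
The decomposition
\[
A = LU \tag{2}
\]
is called the LU decomposition of a matrix. We can obtain $L$ from $\tilde{L}$ by a recursion formula.
5 If we include swapping, then there exists a permutation matrix $P$ such that
\[
PA = LU.
\]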
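A minimal sketch of the factorization $PA = LU$ follows. It is restricted, as an assumption of this example, to square nonsingular matrices, and it stores the permutation as a list `perm` instead of a matrix $P$ (row $i$ of $PA$ is row `perm[i]` of $A$). The multipliers $\ell_{kj}$ are recorded directly during the elimination, which is one common way to obtain $L$ without forming $\tilde{L}$; the names `lu_with_pivoting` and `matmul` are hypothetical.

```python
from fractions import Fraction

def lu_with_pivoting(A):
    """Return (perm, L, U) such that row perm[i] of A equals row i of L*U.
    A is assumed to be square and nonsingular."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    perm = list(range(n))
    for j in range(n):
        p = max(range(j, n), key=lambda k: abs(U[k][j]))
        if U[p][j] == 0:
            raise ValueError("matrix is singular")
        if p != j:
            U[j], U[p] = U[p], U[j]
            perm[j], perm[p] = perm[p], perm[j]
            for c in range(j):                # swap the multipliers stored so far
                L[j][c], L[p][c] = L[p][c], L[j][c]
        for k in range(j + 1, n):
            L[k][j] = U[k][j] / U[j][j]       # multiplier l_kj
            U[k] = [x - L[k][j] * y for x, y in zip(U[k], U[j])]
    return perm, L, U

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[0, 2, 1],
     [1, 1, 0],
     [2, 0, 3]]
perm, L, U = lu_with_pivoting(A)
PA = [A[i] for i in perm]          # rows of A in pivoted order
```

For this matrix two swaps are needed, and the product `matmul(L, U)` reproduces `PA`.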
Solving a linear system in a reduced echelon form
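1 Recall the augmented matrix in reduced echelon form
\[
[\, C \mid d \,] =
\left[\begin{array}{c|c}
- \, C_1^T \, - & d_1 \\
\vdots & \vdots \\
- \, C_r^T \, - & d_r \\
0 & d_0
\end{array}\right]_{m \times (n+1)} \tag{3}
\]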
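2 The column indices $\{1, \dots, n\}$ are classified into pivot indices $P = \{j_p(1), \dots, j_p(r)\}$ and free indices $F = \{1, \dots, n\} \setminus P$. Let us rearrange the order of $\{x_1, \dots, x_n\}$ such that
\[
x_P = \begin{bmatrix} x_{j_p(1)} \\ x_{j_p(2)} \\ \vdots \\ x_{j_p(r)} \end{bmatrix} \in \mathbb{R}^r,
\qquad
x_F = \begin{bmatrix} x_{j_1} \\ x_{j_2} \\ \vdots \\ x_{j_{n-r}} \end{bmatrix},
\quad j_k \in F, \ j_1 < \cdots < j_{n-r}.
\]
In this order, all pivot columns are put at the front and the free-variable columns are moved to the rear. The reduced echelon form looks like
\[
\left[\begin{array}{c|c}
- \, C_1^T \, - & d_1 \\
- \, C_2^T \, - & d_2 \\
\vdots & \vdots \\
- \, C_r^T \, - & d_r
\end{array}\right]
=
\left[\begin{array}{ccccccc|c}
1 & 0 & \cdots & 0 & c_{1,j_1} & \cdots & c_{1,j_{n-r}} & d_1 \\
0 & 1 & \cdots & 0 & c_{2,j_1} & \cdots & c_{2,j_{n-r}} & d_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 1 & c_{r,j_1} & \cdots & c_{r,j_{n-r}} & d_r
\end{array}\right]
\]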
The equations read
\[
x_{j_p(i)} + \sum_{j \in F} c_{i,j}x_j = d_i, \quad i = 1, \dots, r.
\]
Thus,
\[
x_{j_p(i)} = d_i - \sum_{j \in F} c_{i,j}x_j, \quad i = 1, \dots, r.
\]
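The solution has the explicit form
\[
\begin{bmatrix} x_P \\ x_F \end{bmatrix}
=
\begin{bmatrix} d_P \\ 0 \end{bmatrix}
+ \sum_{j \in F} x_j \begin{bmatrix} -c_j \\ \delta_j \end{bmatrix}.
\]
Here,
\[
d_P = \begin{bmatrix} d_1 \\ \vdots \\ d_r \end{bmatrix},
\quad
c_j = \begin{bmatrix} c_{1,j} \\ \vdots \\ c_{r,j} \end{bmatrix},
\quad
\delta_j = \begin{bmatrix} \delta_{j_1,j} \\ \vdots \\ \delta_{j_{n-r},j} \end{bmatrix},
\quad j \in F. \tag{4}
\]
The notation $\delta_{i,j}$ is called the Kronecker delta function. It is defined as
\[
\delta_{i,j} =
\begin{cases}
1 & \text{if } i = j, \\
0 & \text{if } i \neq j.
\end{cases}
\]
We rewrite the solution as
\[
x = x_p + \sum_{j \in F} x_j v_j,
\qquad
x_p := \begin{bmatrix} d_P \\ 0 \end{bmatrix},
\quad
v_j := \begin{bmatrix} -c_j \\ \delta_j \end{bmatrix}. \tag{5}
\]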
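The list $\{v_j\}_{j \in F}$ is independent. Indeed, if there are coefficients $\{a_j \mid j \in F\}$ such that
\[
\sum_{j \in F} a_j v_j = 0,
\]
then, looking at the lower blocks, we get
\[
\sum_{j \in F} a_j \delta_j = 0.
\]
Since $\{\delta_j\}_{j \in F}$ is the standard basis of $\mathbb{R}^{n-r}$, all $a_j = 0$.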
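The construction of $x_p$ and of the vectors $v_j$ can be sketched directly from a reduced echelon system. In this illustration the vectors are kept in the original variable order (no reordering into $x_P$, $x_F$), the zero rows are assumed to have been dropped, and the function name `general_solution` and the small system are hypothetical.

```python
from fractions import Fraction

def general_solution(C, d, pivot_cols):
    """From the reduced echelon system C x = d (nonzero rows only) build x_p
    and the vectors v_j, j in F, so that x = x_p + sum_j x_j v_j.
    pivot_cols[i] is the 0-based pivot column of row i."""
    n = len(C[0])
    free_cols = [j for j in range(n) if j not in pivot_cols]
    x_p = [Fraction(0)] * n
    for i, jp in enumerate(pivot_cols):
        x_p[jp] = Fraction(d[i])              # pivot variables take the values d_i
    vs = {}
    for j in free_cols:
        v = [Fraction(0)] * n
        v[j] = Fraction(1)                    # the Kronecker delta part
        for i, jp in enumerate(pivot_cols):
            v[jp] = -Fraction(C[i][j])        # the -c_j part
        vs[j] = v
    return x_p, vs

# Hypothetical reduced system:  x1 - x3 = 2,  x2 + 2 x3 = 1
C = [[1, 0, -1],
     [0, 1, 2]]
d = [2, 1]
x_p, vs = general_solution(C, d, pivot_cols=[0, 1])
```

Here the only free column is the third one, so the solution set is the line $x_p + x_3 v$ with $x_p = (2, 1, 0)$ and $v = (1, -2, 1)$.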
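Example. Consider
\[
A = \begin{bmatrix}
1 & 1 & 2 & -1 & 0 & 1 \\
0 & 1 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & -1 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}
\]
This is a matrix in echelon form. The pivot and free indices are
\[
P = \{1, 2, 4, 5\}, \quad F = \{3, 6\}.
\]
The reduced echelon matrix is
\[
C = \begin{bmatrix}
1 & 0 & 1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}
\]
The homogeneous system $Cx = 0$ reads
\[
\begin{cases}
x_1 + x_3 + x_6 = 0, \\
x_2 + x_3 + x_6 = 0, \\
x_4 + x_6 = 0, \\
x_5 = 0.
\end{cases}
\]
This gives
\[
x_1 = -x_3 - x_6, \quad x_2 = -x_3 - x_6, \quad x_3 = x_3, \quad x_4 = -x_6, \quad x_5 = 0, \quad x_6 = x_6.
\]
Or
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= x_3 \begin{bmatrix} -1 \\ -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_6 \begin{bmatrix} -1 \\ -1 \\ 0 \\ -1 \\ 0 \\ 1 \end{bmatrix}
= x_3 v_1 + x_6 v_2.
\]
You can check that
\[
Av_i = 0, \quad Cv_i = 0, \quad i = 1, 2.
\]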
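The check suggested at the end of the example takes only a few lines; `matvec` is a helper defined for this illustration.

```python
# Echelon matrix A and reduced echelon matrix C from the example
A = [[1, 1, 2, -1, 0, 1],
     [0, 1, 1, 0, 1, 1],
     [0, 0, 0, 1, -1, 1],
     [0, 0, 0, 0, 1, 0]]
C = [[1, 0, 1, 0, 0, 1],
     [0, 1, 1, 0, 0, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 0, 1, 0]]

v1 = [-1, -1, 1, 0, 0, 0]    # free variable x3 = 1, x6 = 0
v2 = [-1, -1, 0, -1, 0, 1]   # free variable x3 = 0, x6 = 1

def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

# Each product should be the zero vector of length 4
products = [matvec(M, v) for M in (A, C) for v in (v1, v2)]
```

Each entry of `products` is the dot product of a row of $A$ or $C$ with $v_1$ or $v_2$, so the check also confirms that these rows are orthogonal to $v_1$ and $v_2$.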
Geometric interpretation of the Gaussian elimination
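1 The list of vectors $\{C_1, \dots, C_r\}$ constitutes a basis for $R(A^T)$.
Proof. The row operations (scaling, swapping, and shearing) transform
\[
A = \begin{bmatrix}
- \, A_1^T \, - \\
- \, A_2^T \, - \\
\vdots \\
- \, A_m^T \, -
\end{bmatrix}
\longrightarrow
C := \begin{bmatrix}
- \, C_1^T \, - \\
\vdots \\
- \, C_r^T \, - \\
0
\end{bmatrix}.
\]
Each row operation maps the row space $R(A^T) = \operatorname{Span}(A_1, \dots, A_m)$ into itself and, by the lemma below, leaves the span unchanged. Hence
\[
\operatorname{Span}(C_1, \dots, C_r) = \operatorname{Span}(A_1, \dots, A_m) = R(A^T).
\]
Lemma
Let $A_1, A_2 \in V$. Suppose $A_2' = a_2A_2 + a_1A_1$ with $a_2 \neq 0$. Then $\operatorname{Span}(A_1, A_2') = \operatorname{Span}(A_1, A_2)$.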
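With the pivot columns moved to the front, the row vectors of $C$ have the form
\[
\begin{bmatrix}
- \, C_1^T \, - \\
- \, C_2^T \, - \\
\vdots \\
- \, C_r^T \, - \\
0 \\
\vdots \\
0
\end{bmatrix}_{m \times n}
=
\begin{bmatrix}
1 & 0 & \cdots & 0 & c_{1,j_1} & \cdots & c_{1,j_{n-r}} \\
0 & 1 & \cdots & 0 & c_{2,j_1} & \cdots & c_{2,j_{n-r}} \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & c_{r,j_1} & \cdots & c_{r,j_{n-r}} \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{bmatrix}_{m \times n} \tag{6}
\]
From this expression, it is easy to read off that $\{C_1, \dots, C_r\}$ is independent. We conclude that the Gaussian elimination provides an algorithm to construct a special basis $\{C_1, \dots, C_r\}$ for the subspace $\operatorname{Span}(A_1, \dots, A_m)$.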
Row rank of A. The dimension $r := \dim R(A^T)$ is independent of the Gaussian elimination process. Any row operation process gives the same number $r$. This number is called the row rank of $A$.
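2 The list of vectors $\{v_j\}_{j \in F}$ constitutes a basis for $N(A)$, where
\[
v_j := \begin{bmatrix} -c_j \\ \delta_j \end{bmatrix},
\quad
c_j = \begin{bmatrix} c_{1,j} \\ \vdots \\ c_{r,j} \end{bmatrix},
\quad
\delta_j = \begin{bmatrix} \delta_{j_1,j} \\ \vdots \\ \delta_{j_{n-r},j} \end{bmatrix},
\quad j \in F.
\]
Proof. The kernel is
\[
N(A) = \{x \in V \mid x \perp \operatorname{Span}(A_1, \dots, A_m)\} = \{x \in V \mid x \perp \operatorname{Span}(C_1, \dots, C_r)\}.
\]
We have seen that the general solution of $Ax = b$ has the expression (5):
\[
x = x_p + \sum_{j \in F} x_j v_j.
\]
When $b = 0$, we have $d = 0$ and hence $x_p = 0$, so we obtain
\[
N(A) = \operatorname{Span}\{v_j \mid j \in F\}.
\]
We have seen that $\{v_j\}_{j \in F}$ is independent. Thus, $\{v_j\}_{j \in F}$ is a basis for $N(A)$.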
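3 The vectors satisfy $C_i \perp v_j$ for $i \in P$ and $j \in F$.
We have seen this from the above expression for $v \in N(A)$. Alternatively, we can check this orthogonality directly. Suppose $j = j_1 \in F$. Then
\[
\begin{bmatrix}
- \, C_1^T \, - \\
- \, C_2^T \, - \\
\vdots \\
- \, C_r^T \, - \\
0 \\
\vdots \\
0
\end{bmatrix}
\begin{bmatrix}
-c_{1,j_1} \\
-c_{2,j_1} \\
\vdots \\
-c_{r,j_1} \\
1 \\
\vdots \\
0
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & \cdots & 0 & c_{1,j_1} & \cdots & c_{1,j_{n-r}} \\
0 & 1 & \cdots & 0 & c_{2,j_1} & \cdots & c_{2,j_{n-r}} \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & c_{r,j_1} & \cdots & c_{r,j_{n-r}} \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{bmatrix}
\begin{bmatrix}
-c_{1,j_1} \\
-c_{2,j_1} \\
\vdots \\
-c_{r,j_1} \\
1 \\
\vdots \\
0
\end{bmatrix}
= 0.
\]
This shows
\[
C_i^T v_{j_1} = 0 \quad \text{for all } i \in P.
\]
The proof of $C_i^T v_j = 0$ for the other $j \in F$ is similar.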
4 The set $\{C_i \mid i \in P\} \cup \{v_j \mid j \in F\}$ constitutes a basis of $V$ and
\[
V = N(A) \oplus R(A^T).
\]
Proof.
(a) We show $N(A) \cap R(A^T) = \{0\}$. Suppose $v \in N(A) \cap R(A^T)$. From $N(A) = R(A^T)^\perp$, we get $v \perp v$. This implies $v = 0$.
(b) We show $N(A) + R(A^T) = V$. We have $N(A) + R(A^T) \subset V$ and, since the intersection is trivial by (a),
\[
\dim\bigl(N(A) + R(A^T)\bigr) = \dim N(A) + \dim R(A^T) = |F| + |P| = \dim V.
\]
By Proposition 1.1, we get
\[
V = N(A) + R(A^T).
\]
Proposition
Let $U \subset V$ be a subspace. If $\dim U = \dim V$, then $U = V$.
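5 $N(A)^\perp = R(A^T)$.
Proof.
(a) First we show $R(A^T) \subset N(A)^\perp$. Suppose $v \in R(A^T)$. This means that there is a $w \in W$ such that $v = A^Tw$. For any $u \in N(A)$, we have
\[
v \cdot u = (A^Tw) \cdot u = (A^Tw)^Tu = w^TAu = w \cdot (Au) = 0.
\]
Thus, $v \in N(A)^\perp$.
(b) Next, we show $N(A)^\perp \subset R(A^T)$. Suppose $v \in N(A)^\perp \subset V$. From $V = R(A^T) \oplus N(A)$, we can expand $v$ as
\[
v = \sum_{i \in P} \alpha_i C_i + \sum_{j \in F} \beta_j v_j.
\]
Since $v \in N(A)^\perp$, we have
\[
v \cdot v_k = 0 \quad \text{for all } k \in F.
\]
Because $C_i \perp v_k$, this gives $w \cdot v_k = 0$ for all $k \in F$, where $w := \sum_{j \in F} \beta_j v_j$. Hence $w \cdot w = 0$, so $w = 0$, and the independence of $\{v_j\}_{j \in F}$ leads to $\beta_k = 0$ for all $k \in F$. Thus,
\[
v = \sum_{i \in P} \alpha_i C_i \in R(A^T).
\]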
Fundamental theorem of linear algebra
Theorem (Fundamental theorem of linear algebra)
Let $A$ be an $m \times n$ matrix. Then the four fundamental subspaces $R(A)$, $R(A^T)$, $N(A)$ and $N(A^T)$ have the properties:
(1) The domain $V$ has the orthogonal decomposition
\[
V = R(A^T) \oplus N(A), \quad R(A^T) = N(A)^\perp, \quad N(A) = R(A^T)^\perp. \tag{7}
\]
(2) The range $W$ has the orthogonal decomposition:
\[
W = R(A) \oplus N(A^T), \quad R(A) = N(A^T)^\perp, \quad N(A^T) = R(A)^\perp. \tag{8}
\]
(3) Row rank of $A$ = Column rank of $A$:
\[
\dim R(A^T) = \dim R(A). \tag{9}
\]
(4) The linear map $x \mapsto Ax$ is 1-1 and onto from $R(A^T)$ to $R(A)$.
Proof.
1 We have proven (1).
2 The proof of (2) is a duality argument. We simply replace $A$ by $A^T$ and use $(A^T)^T = A$ to get the result.
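3 We prove (3). First, we claim that $\{AC_i\}_{i \in P}$ constitutes a basis for $R(A)$. Any $v \in V$ can be represented as
\[
v = \sum_{j \in F} a_j v_j + \sum_{i \in P} b_i C_i.
\]
Since $Av_j = 0$ for $j \in F$, we get
\[
Av = A\Bigl(\sum_{i \in P} b_i C_i\Bigr) = \sum_{i \in P} b_i AC_i.
\]
This shows $R(A) = \operatorname{Span}(\{AC_i\}_{i \in P})$. Next, we show that $\{AC_i\}_{i \in P}$ is independent. Suppose we have
\[
\sum_{i \in P} b_i AC_i = 0.
\]
Then
\[
A\Bigl(\sum_{i \in P} b_i C_i\Bigr) = 0 \ \Rightarrow \ \sum_{i \in P} b_i C_i \in N(A).
\]
But $C_i \in N(A)^\perp$ for $i \in P$, so $\sum_{i \in P} b_i C_i \in N(A) \cap N(A)^\perp = \{0\}$, and the independence of $\{C_i\}_{i \in P}$ gives $b_i = 0$ for all $i \in P$. This shows that $\{AC_i\}_{i \in P}$ is a basis for $R(A)$.
The consequence of this result is
\[
\dim R(A) = |P| = r.
\]
Recall that $\{C_i\}_{i \in P}$ is a basis for $R(A^T)$. Thus we obtain
\[
\dim R(A) = \dim R(A^T) = |P|.
\]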
4 The restricted linear map
\[
A : R(A^T) \to R(A)
\]
is 1-1 and onto (check this yourself). For any $v = \sum_{i \in P} b_i C_i \in R(A^T)$, its image under $A$ is
\[
Av = \sum_{i \in P} b_i AC_i \in R(A).
\]
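Property (3) can be observed numerically by counting pivots for $A$ and for $A^T$. The sketch below is an illustration on one hypothetical matrix, not a proof; the names `rank` and `transpose` are assumptions of this example.

```python
from fractions import Fraction

def rank(A):
    """Number of pivots found by forward elimination (the row rank of A)."""
    M = [[Fraction(x) for x in row] for row in A]
    m, n = len(M), len(M[0])
    r = 0
    for j in range(n):
        # first row at or below r with a nonzero entry in column j
        p = next((k for k in range(r, m) if M[k][j] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for k in range(r + 1, m):
            f = M[k][j] / M[r][j]
            M[k] = [x - f * y for x, y in zip(M[k], M[r])]
        r += 1
        if r == m:
            break
    return r

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical 3 x 4 matrix: the second row is twice the first
A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]
r_rows = rank(A)                 # dim R(A^T)
r_cols = rank(transpose(A))      # dim R(A)
dim_null_A = len(A[0]) - r_rows  # dim N(A) = n - r
dim_null_At = len(A) - r_cols    # dim N(A^T) = m - r
```

Both ranks equal 2 here, so $\dim N(A) = 4 - 2 = 2$ and $\dim N(A^T) = 3 - 2 = 1$, in agreement with the decompositions (1) and (2).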
Corollary
The following statements hold and are equivalent:
(a) For any subspace $U \subset V$, it holds
\[
V = U \oplus U^\perp. \tag{10}
\]
(b) For any subspace $U \subset V$, it holds
\[
(U^\perp)^\perp = U. \tag{11}
\]
(c) If $U \subsetneq V$, then there exists a nonzero subspace $Z \subset V$ such that $U = Z^\perp$.
Proof.
(a) We show (a) by the fundamental theorem of linear algebra. Let us choose a basis $\{A_1, \dots, A_r\}$ of $U$, and define an $r \times n$ matrix
\[
A = \begin{bmatrix}
- \, A_1^T \, - \\
\vdots \\
- \, A_r^T \, -
\end{bmatrix}
\]
with $\{A_i^T\}_{i=1}^r$ being its row vectors. Then $U = R(A^T)$. From the fundamental theorem of linear algebra, we have
\[
V = R(A^T) \oplus N(A), \quad R(A^T) = U, \quad N(A) = R(A^T)^\perp = U^\perp.
\]
Thus, we get
\[
V = U \oplus U^\perp.
\]
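(a) ⇒ (b). First, (10) implies
\[
\dim U = \dim V - \dim U^\perp.
\]
Next, we apply (10) again with $U$ replaced by $U^\perp$ to get
\[
U^\perp \oplus (U^\perp)^\perp = V.
\]
This implies
\[
\dim (U^\perp)^\perp = \dim V - \dim U^\perp.
\]
These two identities give
\[
\dim U = \dim (U^\perp)^\perp.
\]
On the other hand, we recall that $U \subset (U^\perp)^\perp$. This together with $\dim U = \dim (U^\perp)^\perp$ implies $U = (U^\perp)^\perp$.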
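(b) ⇒ (c). We choose $Z = U^\perp$. Then $Z \neq \{0\}$; otherwise, $U = (U^\perp)^\perp = \{0\}^\perp = V$, contradicting $U \subsetneq V$. From $(U^\perp)^\perp = U$, we get $Z^\perp = U$.
(c) ⇒ (a). Suppose $U + U^\perp \subsetneq V$. Then we can find a nonzero subspace $Z$ such that $Z^\perp = U + U^\perp$. Then, for any $u \in Z$, we have
\[
u \perp U \quad \text{and} \quad u \perp U^\perp.
\]
The first relation says $u \in U^\perp$, so the second implies $u \perp u$. Thus, $u = 0$. This contradicts $Z \neq \{0\}$. Hence, $U + U^\perp = V$.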
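Statements (a) and (b) can be illustrated on a concrete subspace by following the construction in the proof: put a basis of $U$ into the rows of a matrix, read $U^\perp$ off as its null space, and repeat. The sketch below is only an illustration; the helper names `rref`, `null_space` and `row_space`, and the choice of $U \subset \mathbb{R}^4$, are assumptions of this example.

```python
from fractions import Fraction

def rref(A):
    """Reduced row echelon form and 0-based pivot columns of A."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for j in range(n):
        p = next((k for k in range(r, m) if R[k][j] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][j]
        R[r] = [x / piv for x in R[r]]
        for k in range(m):
            if k != r and R[k][j] != 0:
                f = R[k][j]
                R[k] = [x - f * y for x, y in zip(R[k], R[r])]
        pivots.append(j)
        r += 1
        if r == m:
            break
    return R, pivots

def null_space(A):
    """Basis {v_j : j in F} of N(A), read off from the reduced echelon form."""
    R, piv = rref(A)
    n = len(A[0])
    basis = []
    for j in (c for c in range(n) if c not in piv):
        v = [Fraction(0)] * n
        v[j] = Fraction(1)
        for i, jp in enumerate(piv):
            v[jp] = -R[i][j]
        basis.append(v)
    return basis

def row_space(A):
    """Canonical basis of the span of the rows of A: nonzero rows of rref(A)."""
    R, piv = rref(A)
    return R[:len(piv)]

U_basis = [[1, 1, 0, 0],
           [0, 1, 1, 0]]             # a basis of a 2-dimensional U in R^4
U_perp = null_space(U_basis)         # basis of U-perp = N(A)
U_perp_perp = null_space(U_perp)     # basis of (U-perp)-perp
```

Comparing `row_space(U_perp_perp)` with `row_space(U_basis)` tests $(U^\perp)^\perp = U$, because two spanning sets generate the same subspace exactly when their reduced echelon forms have the same nonzero rows; the dimension count `len(U_basis) + len(U_perp)` equals 4, as $V = U \oplus U^\perp$ requires.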
Summary
Gaussian elimination performs row operations to transform $[A \mid b]$ into an equivalent but simpler system (a reduced echelon form).
The Gaussian elimination process is divided into two parts:
- forward elimination
- backward substitution
There are three kinds of row operations:
(1) scaling: $A_i \leftarrow \alpha A_i$, $\alpha \neq 0$,
(2) swapping: $A_i \leftrightarrow A_j$,
(3) shearing: $A_i' = A_i - \alpha A_j$, $\alpha \neq 0$, $j \neq i$.
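The reduced equations read
\[
x_{j_p(i)} + \sum_{j \in F} c_{i,j}x_j = d_i, \quad i = 1, \dots, r,
\]
which give solutions of the form
\[
\begin{bmatrix} x_P \\ x_F \end{bmatrix}
=
\begin{bmatrix} d_P \\ 0 \end{bmatrix}
+ \sum_{j \in F} x_j \begin{bmatrix} -c_j \\ \delta_j \end{bmatrix}.
\]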
Theorem (Fundamental theorem of linear algebra)
Let $A$ be an $m \times n$ matrix. Then the four fundamental subspaces $R(A)$, $R(A^T)$, $N(A)$ and $N(A^T)$ have the properties:
(1) The domain $V$ has the orthogonal decomposition
\[
V = R(A^T) \oplus N(A), \quad R(A^T) = N(A)^\perp, \quad N(A) = R(A^T)^\perp. \tag{12}
\]
(2) The range $W$ has the orthogonal decomposition:
\[
W = R(A) \oplus N(A^T), \quad R(A) = N(A^T)^\perp, \quad N(A^T) = R(A)^\perp. \tag{13}
\]
(3) Row rank of $A$ = Column rank of $A$:
\[
\dim R(A^T) = \dim R(A). \tag{14}
\]
(4) The linear map $x \mapsto Ax$ is 1-1 and onto from $R(A^T)$ to $R(A)$.