Classical Gram-Schmidt and Modified Gram-Schmidt, a Comparison
Su Hyeong Lee
Imperial College London
Question (a):
During lectures, we proved that R̂ (or R) satisfies r_{lk} = ⟨a_k, q_l⟩ for k = 2, …, n and l = 1, …, k − 1. The diagonal entries were shown to be r_{jj} = ||v_j||. We can amend CGS and MGS as follows to include R:
Classical Gram-Schmidt (CGS)

    % (Initialize R = zero matrix)
    v_1 = a_1
    q_1 = v_1 / ||v_1||
    r_{11} = ||v_1||
    for j = 2 to n
        v_j = a_j − Σ_{i=1}^{j−1} ⟨a_j, q_i⟩ q_i
        q_j = v_j / ||v_j||
        r_{jj} = ||v_j||
    end
    for k = 1 to n − 1
        for k1 = k + 1 to n
            r_{k,k1} = ⟨q_k, a_{k1}⟩
        end
    end

Modified Gram-Schmidt (MGS)

    % (Initialize R = zero matrix)
    for j = 1 to n
        v_j^{(1)} = a_j
    end
    for i = 1 to n − 1
        q_i = v_i^{(i)} / ||v_i^{(i)}||
        for j = i + 1 to n
            v_j^{(i+1)} = v_j^{(i)} − ⟨v_j^{(i)}, q_i⟩ q_i
        end
        r_{i,i} = ||v_i^{(i)}||
    end
    r_{n,n} = ||v_n^{(n)}||
    q_n = v_n^{(n)} / ||v_n^{(n)}||
    for j2 = 1 to n − 1
        for k = j2 + 1 to n
            r_{j2,k} = ⟨q_{j2}, a_k⟩
        end
    end
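As a cross-check of the two listings above, the following is a minimal NumPy transcription of the pseudocode (my own illustrative sketch, not part of the submitted coursework; the names cgs_np and mgs_np are invented):

```python
import numpy as np

def cgs_np(A):
    """Classical Gram-Schmidt: orthogonalize each a_j against all earlier q_i at once."""
    m, n = A.shape
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for j in range(n):
        # v_j = a_j - sum_i <a_j, q_i> q_i, computed from the original column a_j
        v = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    for k in range(n):
        R[k, k+1:] = Q[:, k] @ A[:, k+1:]  # off-diagonal r_{k,k1} = <q_k, a_{k1}>
    return Q, R

def mgs_np(A):
    """Modified Gram-Schmidt: subtract one projection at a time from a running v_j."""
    m, n = A.shape
    V = A.astype(float).copy()
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for i in range(n):
        R[i, i] = np.linalg.norm(V[:, i])
        Q[:, i] = V[:, i] / R[i, i]
        for j in range(i + 1, n):
            # v_j^{(i+1)} = v_j^{(i)} - <v_j^{(i)}, q_i> q_i
            V[:, j] -= (V[:, j] @ Q[:, i]) * Q[:, i]
    for i in range(n):
        R[i, i+1:] = Q[:, i] @ A[:, i+1:]
    return Q, R
```

On a well-conditioned matrix both versions return the same factorization up to rounding; the difference only surfaces for nearly dependent columns.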
It is stated on Page 1 of the coursework that we may use q̂_i ≡ q_i for i = 1, …, n. Given this information, it is clear intuitively (but not formally) what MGS is up to: it takes each vector a_j and subtracts from it the components along q_k, for k = 1 to j − 1, one step at a time. Thus a new q_j is created that is orthonormal to {q_k}_{k=1}^{j−1}. Because q̂_i ≡ q_i, I will denote q̂_i as q_i from now on.
To prove this, I claim that for all α, β ∈ ℕ, MGS generates

    v_α^{(β)} = v_α^{(1)} − Σ_{j=1}^{β−1} ⟨v_α^{(j)}, q_j⟩ q_j,

where the empty sum Σ_{j=1}^{0} is defined to be zero. We will use induction on α + β.
We check the base case: α + β = 2 ⇒ α = 1, β = 1, and the formula yields v_1^{(1)} = v_1^{(1)}. Thus the base case is verified. Assume the induction hypothesis holds for α + β = k, for k ∈ ℕ, k ≥ 2. We now prove that it holds for α + β = k + 1. By definition, we have equation (A0):
    v_α^{(β)} = v_α^{(β−1)} − ⟨v_α^{(β−1)}, q_{β−1}⟩ q_{β−1}.
Note that α + β = k + 1 ⇒ α + (β − 1) = k. Thus, the induction hypothesis applies to v_α^{(β−1)}:

    v_α^{(β−1)} = v_α^{(1)} − Σ_{j=1}^{β−2} ⟨v_α^{(j)}, q_j⟩ q_j.

Substituting this into (A0), we get equation (A1):
    v_α^{(β)} = v_α^{(1)} − Σ_{j=1}^{β−2} ⟨v_α^{(j)}, q_j⟩ q_j − ⟨v_α^{(β−1)}, q_{β−1}⟩ q_{β−1} = v_α^{(1)} − Σ_{j=1}^{β−1} ⟨v_α^{(j)}, q_j⟩ q_j.
Thus the induction is complete. Now we use (A1), v_α^{(β)} = v_α^{(1)} − Σ_{j=1}^{β−1} ⟨v_α^{(j)}, q_j⟩ q_j, to prove equation (A2):
    v_α^{(β)} = v_α^{(1)} − Σ_{j=1}^{β−1} ⟨v_α^{(1)}, q_j⟩ q_j.

We have v_α^{(j)} = v_α^{(1)} − Σ_{k=1}^{j−1} ⟨v_α^{(k)}, q_k⟩ q_k. Substituting this into (A1), we get (A3):
    v_α^{(β)} = v_α^{(1)} − Σ_{j=1}^{β−1} ⟨ v_α^{(1)} − Σ_{k=1}^{j−1} ⟨v_α^{(k)}, q_k⟩ q_k, q_j ⟩ q_j.

This is where we use that {q_i}_{i=1}^{n} is orthonormal. Taking the inner product of any element of {q_i}_{i=1}^{j−1} with q_j produces zero, i.e., ⟨q_i, q_j⟩ = 0 for i = 1, …, j − 1. We use the linearity of the inner product to finally deduce:
    ⟨ v_α^{(1)} − Σ_{k=1}^{j−1} ⟨v_α^{(k)}, q_k⟩ q_k, q_j ⟩ = ⟨v_α^{(1)}, q_j⟩.

Substituting this result into (A3) yields:

    v_α^{(β)} = v_α^{(1)} − Σ_{j=1}^{β−1} ⟨v_α^{(1)}, q_j⟩ q_j.

Again, note that v_α^{(1)} = a_α. Hence for β = α,

    v_α^{(α)} = a_α − Σ_{j=1}^{α−1} ⟨a_α, q_j⟩ q_j.

This is exactly the algorithm for CGS. Hence we arrive at the conclusion that v_k^{(k)} = v_k. Furthermore,
    ⟨v_α^{(β)}, q_β⟩ = ⟨ v_α^{(1)} − Σ_{j=1}^{β−1} ⟨v_α^{(1)}, q_j⟩ q_j, q_β ⟩ = ⟨a_α, q_β⟩,

using similar reasoning as above, i.e., the orthonormality of the {q_i} and v_α^{(1)} = a_α.
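The identity just derived, ⟨v_α^{(β)}, q_β⟩ = ⟨a_α, q_β⟩, can be spot-checked numerically: at each MGS step, the coefficient computed from the running vector agrees (up to rounding) with the CGS coefficient computed from the original column. This is my own Python sketch, not part of the coursework:

```python
import numpy as np

# Spot-check <v_alpha^{(beta)}, q_beta> = <a_alpha, q_beta> on a random
# well-conditioned matrix: run MGS and compare each coefficient it uses
# against the corresponding CGS coefficient <a_j, q_i>.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
V = A.copy()
n = A.shape[1]
Q = np.zeros_like(A)
for i in range(n):
    Q[:, i] = V[:, i] / np.linalg.norm(V[:, i])
    for j in range(i + 1, n):
        mgs_coeff = V[:, j] @ Q[:, i]  # coefficient MGS actually uses
        cgs_coeff = A[:, j] @ Q[:, i]  # coefficient CGS would use
        assert abs(mgs_coeff - cgs_coeff) < 1e-10
        V[:, j] -= mgs_coeff * Q[:, i]
print("all MGS coefficients match the CGS coefficients")
```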
Question (b): (Note: figure pushed to page 11)
For k = 10^{−1}, the results are nearly identical; MGS appears to be slightly more accurate. For k = 10^{−5}, the same observation applies, and MGS still holds the upper hand: CGS loses accuracy much faster than MGS as k shrinks. For k = 10^{−10}, the accuracy of CGS has decreased drastically. Please note that the output of the CGS and MGS code below has been concatenated into a single top-aligned document and pushed to page 11 for readability. Please read the caption below Figure 1 as well.
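The loss of orthogonality described above can also be reproduced directly by measuring ||QᵀQ − I|| for both algorithms on the coursework matrix as k shrinks (a Python sketch of my own, mirroring the Matlab code below; the helper name build_A is invented):

```python
import numpy as np

def build_A(k):
    # The 4x4 coursework matrix, parameterized by k.
    return np.array([[-2.0, -1.0, 1.0, 2.0],
                     [   k,  0.0, 0.0, 0.0],
                     [ 0.0,    k, 0.0, 0.0],
                     [ 0.0,  0.0,   k, 0.0]])

def cgs_q(A):
    """Classical Gram-Schmidt, returning only Q."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        v = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])
        Q[:, j] = v / np.linalg.norm(v)
    return Q

def mgs_q(A):
    """Modified Gram-Schmidt, returning only Q."""
    m, n = A.shape
    V = A.copy()
    Q = np.zeros((m, n))
    for i in range(n):
        Q[:, i] = V[:, i] / np.linalg.norm(V[:, i])
        for j in range(i + 1, n):
            V[:, j] -= (V[:, j] @ Q[:, i]) * Q[:, i]
    return Q

for k in (1e-1, 1e-5, 1e-10):
    Ak = build_A(k)
    err_cgs = np.linalg.norm(cgs_q(Ak).T @ cgs_q(Ak) - np.eye(4))
    err_mgs = np.linalg.norm(mgs_q(Ak).T @ mgs_q(Ak) - np.eye(4))
    print(f"k = {k:g}:  CGS ||Q'Q - I|| = {err_cgs:.2e},  MGS ||Q'Q - I|| = {err_mgs:.2e}")
```

For k = 10^{−10} the CGS error grows to O(1) while the MGS error remains tiny (comparable to k), matching the observations above.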
function [Q, R] = cgs(A)
%Using matrices because Matlab screamed at me for dynamically allocating.
n = length(A(1,:));
m = length(A(:,1));
R = zeros(n);
V = zeros(m,n);
Q = zeros(m,n);

%Initial Step!
V(:,1) = A(:,1);
Q(:,1) = V(:,1) / norm(V(:,1));
R(1,1) = norm(V(:,1));

%All column vectors, as in lectures, are concatenated into a matrix.
for j = 2:n
    j0 = zeros(m,1);

    %j0 is sum(i = 1 to j-1) of <a_j, q_i> q_i.
    for i = 1:j-1
        j0 = j0 + (dot(A(:,j),Q(:,i)) * Q(:,i));
    end

    %I'll be using the j0 here.
    V(:,j) = A(:,j) - j0;
    Q(:,j) = V(:,j) / norm(V(:,j));
    R(j,j) = norm(V(:,j));
end

%This defines the non-diagonal entries of R.
for k = 1:n-1
    for k1 = k+1:n
        R(k,k1) = dot(Q(:,k),A(:,k1));
    end
end

%final step
disp(Q)
disp(R)
disp(Q'*Q - eye(n))
end

function [Q, R] = mgs(A)
%Using matrices because Matlab tried to hurt me for dynamically allocating.
n = length(A(1,:));
m = length(A(:,1));
R = zeros(n);
V = A;
Q = zeros(m,n);

%Initializing step skipped; has already been initialized by defining V = A

for i = 1:n-1
    Q(:,i) = V(:,i) / norm(V(:,i));
    for j1 = i+1:n
        V(:,j1) = V(:,j1) - dot(V(:,j1),Q(:,i)) * Q(:,i);
    end
    %n-1 is the maximum i value in this loop, so R(n,n) is not defined here.
    R(i,i) = norm(V(:,i));
end

R(n,n) = norm(V(:,n));
Q(:,n) = V(:,n) / norm(V(:,n));

%Defining the non-diagonal entries of R
for j2 = 1:n-1
    for k = j2+1:n
        R(j2,k) = dot(Q(:,j2),A(:,k));
    end
end

disp(Q)
disp(R)
disp(Q'*Q - eye(n))
end

Question (c):

    A = [ -2  -1   1   2
           k   0   0   0
           0   k   0   0
           0   0   k   0 ]
With the omnipresent matrix A hovering over us, let us first proceed to use CGS:
v_1 = (−2, k, 0, 0)^t ⇒ ||v_1|| = √((−2)^2 + k^2) ≈ √(2^2) = 2 ⇒ q_1 = (−1, k/2, 0, 0)^t
v_2 = (−1, 0, k, 0)^t − ((−1, 0, k, 0)^t · (−1, k/2, 0, 0)^t)(−1, k/2, 0, 0)^t = (0, −k/2, k, 0)^t

Hence, we again divide by the norm of the vector: ||v_2|| = √((k/2)^2 + k^2) = √((5/4)k^2) = k√5/2, so q_2 = (0, −1/√5, 2/√5, 0)^t.
Now things start getting complicated:
v_3 = (1, 0, 0, k)^t − ((1, 0, 0, k)^t · (−1, k/2, 0, 0)^t)(−1, k/2, 0, 0)^t − ((1, 0, 0, k)^t · (0, −1/√5, 2/√5, 0)^t)(0, −1/√5, 2/√5, 0)^t = ⋯ = (1, 0, 0, k)^t + (−1, k/2, 0, 0)^t = (0, k/2, 0, k)^t ⇒ q_3 = (0, 1/√5, 0, 2/√5)^t
Writing this out will without a doubt get me negative points for presentation issues. Neatly abridging, we get:
v_4 = ⋯ = (2, 0, 0, 0)^t + 2(−1, k/2, 0, 0)^t = (0, k, 0, 0)^t ⇒ q_4 = (0, 1, 0, 0)^t.
Let us now proceed to use MGS:
First we initialize:
v_1^{(1)} = (−2, k, 0, 0)^t
v_2^{(1)} = (−1, 0, k, 0)^t
v_3^{(1)} = (1, 0, 0, k)^t
v_4^{(1)} = (2, 0, 0, 0)^t
And now we go into a loop.
v_1^{(1)} = (−2, k, 0, 0)^t ⇒ ||v_1^{(1)}|| = √((−2)^2 + k^2) ≈ √(2^2) = 2 ⇒ q_1 = (−1, k/2, 0, 0)^t
v_2^{(2)} = (−1, 0, k, 0)^t − ((−1, 0, k, 0)^t · (−1, k/2, 0, 0)^t)(−1, k/2, 0, 0)^t = (0, −k/2, k, 0)^t
v_3^{(2)} = (1, 0, 0, k)^t − ((1, 0, 0, k)^t · (−1, k/2, 0, 0)^t)(−1, k/2, 0, 0)^t = ⋯ = (1, 0, 0, k)^t + (−1, k/2, 0, 0)^t = (0, k/2, 0, k)^t
v_4^{(2)} = (2, 0, 0, 0)^t − ((2, 0, 0, 0)^t · (−1, k/2, 0, 0)^t)(−1, k/2, 0, 0)^t = (2, 0, 0, 0)^t + 2(−1, k/2, 0, 0)^t = (0, k, 0, 0)^t
Like above, we derive q_2 and use it to update the v's: ||v_2^{(2)}|| = √((k/2)^2 + k^2) = √((5/4)k^2) = k√5/2, so q_2 = (0, −1/√5, 2/√5, 0)^t.

v_3^{(3)} = (0, k/2, 0, k)^t − ((0, k/2, 0, k)^t · (0, −1/√5, 2/√5, 0)^t)(0, −1/√5, 2/√5, 0)^t = ⋯ = (0, 2k/5, k/5, k)^t
v_4^{(3)} = (0, k, 0, 0)^t − ((0, k, 0, 0)^t · (0, −1/√5, 2/√5, 0)^t)(0, −1/√5, 2/√5, 0)^t = (0, 4k/5, 2k/5, 0)^t
Now we calculate:

v_3^{(3)} = (0, 2k/5, k/5, k)^t ⇒ q_3 = (0, 2/√30, 1/√30, 5/√30)^t
v_4^{(4)} = (0, 4k/5, 2k/5, 0)^t − ((0, 4k/5, 2k/5, 0)^t · (0, 2/√30, 1/√30, 5/√30)^t)(0, 2/√30, 1/√30, 5/√30)^t = ⋯ = (0, 2k/3, k/3, −k/3)^t ⇒ q_4 = (0, √(2/3), 1/√6, −1/√6)^t
And hence we are done. To sum up, the matrix Q produced from CGS and from MGS is as follows:

    Q_CGS = [ -1     0       0      0
              k/2   -1/√5    1/√5   1
               0     2/√5    0      0
               0     0       2/√5   0 ]

    Q_MGS = [ -1     0       0       0
              k/2   -1/√5    2/√30   √(2/3)
               0     2/√5    1/√30   1/√6
               0     0       5/√30  -1/√6 ]
Comparing this to Figure 1 on page 11, case k = 10^{−10}, it is almost an exact match! Although the two algorithms are equivalent on paper, computing in finite-precision arithmetic as k gets very small yields a different Q for each algorithm. Hence the moral of this coursework: theoretically equivalent algorithms can behave differently in practice.
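As a sanity check on the hand computation, one can run both algorithms in floating point with k = 10^{−10} and compare against the limiting Q matrices above (my own Python sketch, not part of the coursework; for such a small k, √(4 + k²) rounds to exactly 2 in double precision, so the approximation made on paper is exact in the machine):

```python
import numpy as np

def cgs_q(A):
    """Classical Gram-Schmidt, returning only Q."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        v = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])
        Q[:, j] = v / np.linalg.norm(v)
    return Q

def mgs_q(A):
    """Modified Gram-Schmidt, returning only Q."""
    m, n = A.shape
    V = A.copy()
    Q = np.zeros((m, n))
    for i in range(n):
        Q[:, i] = V[:, i] / np.linalg.norm(V[:, i])
        for j in range(i + 1, n):
            V[:, j] -= (V[:, j] @ Q[:, i]) * Q[:, i]
    return Q

k = 1e-10
A = np.array([[-2.0, -1.0, 1.0, 2.0],
              [   k,  0.0, 0.0, 0.0],
              [ 0.0,    k, 0.0, 0.0],
              [ 0.0,  0.0,   k, 0.0]])

s5, s6, s30 = np.sqrt(5.0), np.sqrt(6.0), np.sqrt(30.0)
# Hand-derived limits; the k/2 entry (~5e-11) is written as 0 here.
Q_cgs_hand = np.array([[-1,     0,     0,    0],
                       [ 0, -1/s5,  1/s5,    1],
                       [ 0,  2/s5,     0,    0],
                       [ 0,     0,  2/s5,    0]], dtype=float)
Q_mgs_hand = np.array([[-1,     0,      0,      0],
                       [ 0, -1/s5, 2/s30,  2/s6],
                       [ 0,  2/s5, 1/s30,  1/s6],
                       [ 0,     0, 5/s30, -1/s6]], dtype=float)

print(np.max(np.abs(cgs_q(A) - Q_cgs_hand)))
print(np.max(np.abs(mgs_q(A) - Q_mgs_hand)))
```

Both differences come out on the order of k or smaller, confirming the hand computation.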
Question (d): This is my algorithm:

function [x,r] = QRfac(Q,R,A,b)
b1 = Q' * b;
num = length(b);
x = zeros(num,1);
x(num) = b1(num) / R(num,num);

for i = 1:num-1
    sigma = 0;
    %It helps greatly to look at x(num-i) = ... to see what sigma is used for
    for j = 1:i
        sigma = sigma + R(num-i,num-(j-1)) * x(num-(j-1));
    end
    x(num-i) = ( b1(num-i) - sigma ) / R(num-i,num-i);
end

%Displaying the norm of the residual
r = b - A * x;
disp(norm(r))
end
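The same back-substitution idea can be written compactly in Python (my own sketch mirroring QRfac above; qr_solve is an invented name): since QR = A and Q has orthonormal columns, Ax = b reduces to the upper-triangular system Rx = Qᵀb.

```python
import numpy as np

def qr_solve(Q, R, A, b):
    """Solve A x = b via back substitution on R x = Q' b, as in QRfac."""
    b1 = Q.T @ b
    n = len(b1)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract the contributions of the already-computed x entries
        # (the role of 'sigma' in the Matlab code), then divide by R(i,i).
        x[i] = (b1[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    r = b - A @ x  # residual vector
    return x, r
```

With a factorization from numpy.linalg.qr, the residual norm is near machine precision for a well-conditioned A.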
This is the result of my code. The function a is just a simple 3-line function that substitutes the value x of a(x) for the variable k in the matrix A given in the coursework. I wrote it because my hand hurt and I didn't want to type any more. I also edited a few lines out of the computer output to make things more readable, e.g. the vector values of r and x, so please don't think that I am using an illegally downloaded version of Matlab. (I am not.)
>> A = a(1/10);
>> [Q,R] = cgs(A);
>> [x,r] = QRfac(Q,R,A,b)
2.2523e-14

>> [Q,R] = mgs(A);
>> [x,r] = QRfac(Q,R,A,b)
2.0861e-14

>> A = a(10^-5);
>> [Q,R] = cgs(A);
>> [x,r] = QRfac(Q,R,A,b)
7.0551e-08

>> [Q,R] = mgs(A);
>> [x,r] = QRfac(Q,R,A,b)
6.3213e-08

>> A = a(10^-10);
>> [Q,R] = cgs(A);
>> [x,r] = QRfac(Q,R,A,b)
0.7483

>> [Q,R] = mgs(A);
>> [x,r] = QRfac(Q,R,A,b)
0.8300
To briefly comment on these results: CGS and MGS were more evenly matched than in Question (b), producing similar values for the norm of the residual r. The norm of the residual increased as k got smaller, and at k = 10^{−10} they both failed spectacularly. This result shows that even MGS is unreliable for certain matrices (or situations).
Figure 1. From top to bottom, this figure displays the results for CGS & MGS for the cases k = 10^{−1}, 10^{−5}, and 10^{−10}.