Classical Gram-Schmidt and Modified Gram-Schmidt, a Comparison
Su Hyeong Lee
Imperial College London
Question (a):
During lectures, we proved that R̂ (or R) satisfies r_{lk} = ⟨a_k, q_l⟩ for k = 2, …, n and l = 1, …, k − 1. The diagonal entries were shown to be r_{jj} = ||v_j||. We can amend CGS and MGS as follows to include R:
Classical Gram-Schmidt (CGS)

    % (Initialize R = zero matrix)
    v_1 = a_1
    q_1 = v_1 / ||v_1||
    r_{11} = ||v_1||
    for j = 2 to n
        v_j = a_j − Σ_{i=1}^{j−1} ⟨a_j, q_i⟩ q_i
        q_j = v_j / ||v_j||
        r_{jj} = ||v_j||
    end
    for k = 1 to n − 1
        for k1 = k + 1 to n
            r_{k,k1} = ⟨q_k, a_{k1}⟩
        end
    end

Modified Gram-Schmidt (MGS)

    % (Initialize R = zero matrix)
    for j = 1 to n
        v_j^{(1)} = a_j
    end
    for i = 1 to n − 1
        q_i = v_i^{(i)} / ||v_i^{(i)}||
        for j = i + 1 to n
            v_j^{(i+1)} = v_j^{(i)} − ⟨v_j^{(i)}, q_i⟩ q_i
        end
        r_{i,i} = ||v_i^{(i)}||
    end
    r_{n,n} = ||v_n^{(n)}||
    q_n = v_n^{(n)} / ||v_n^{(n)}||
    for j2 = 1 to n − 1
        for k = j2 + 1 to n
            r_{j2,k} = ⟨q_{j2}, a_k⟩
        end
    end
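As a cross-check of the two listings above, the following is a minimal NumPy transcription of the pseudocode (my own illustrative sketch, not part of the submitted coursework; the names cgs_np and mgs_np are invented):

```python
import numpy as np

def cgs_np(A):
    """Classical Gram-Schmidt: orthogonalize each a_j against all earlier q_i at once."""
    m, n = A.shape
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for j in range(n):
        # v_j = a_j - sum_i <a_j, q_i> q_i, computed from the original column a_j
        v = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    for k in range(n):
        R[k, k+1:] = Q[:, k] @ A[:, k+1:]  # off-diagonal r_{k,k1} = <q_k, a_{k1}>
    return Q, R

def mgs_np(A):
    """Modified Gram-Schmidt: subtract one projection at a time from a running v_j."""
    m, n = A.shape
    V = A.astype(float).copy()
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for i in range(n):
        R[i, i] = np.linalg.norm(V[:, i])
        Q[:, i] = V[:, i] / R[i, i]
        for j in range(i + 1, n):
            # v_j^{(i+1)} = v_j^{(i)} - <v_j^{(i)}, q_i> q_i
            V[:, j] -= (V[:, j] @ Q[:, i]) * Q[:, i]
    for i in range(n):
        R[i, i+1:] = Q[:, i] @ A[:, i+1:]
    return Q, R
```

On a well-conditioned matrix both versions return the same factorization up to rounding; the difference only surfaces for nearly dependent columns.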
It is stated on Page 1 of the coursework that we may use q̂_i ≡ q_i for i = 1, …, n. Given this information, it is clear intuitively (but not formally) what MGS is up to: it takes each vector a_j and subtracts from it the components along q_k, for k = 1 to j − 1, one step at a time. Thus a new q_j is created that is orthonormal to {q_k}_{k=1}^{j−1}. Because q̂_i ≡ q_i, I will denote q̂_i as q_i from now on.
To prove this, I claim that for all α, β ∈ ℕ, MGS generates

    v_α^{(β)} = v_α^{(1)} − Σ_{j=1}^{β−1} ⟨v_α^{(j)}, q_j⟩ q_j,

where the empty sum Σ_{j=1}^{0} is defined to be zero. We will use induction on α + β.
We check the base case: α + β = 2 ⇒ α = 1, β = 1, and the formula yields v_1^{(1)} = v_1^{(1)}. Thus the base case is verified. Assume the induction hypothesis holds for α + β = k, for k ∈ ℕ, k ≥ 2. We now prove that it holds for α + β = k + 1. By definition, we have equation (A0):
    v_α^{(β)} = v_α^{(β−1)} − ⟨v_α^{(β−1)}, q_{β−1}⟩ q_{β−1}.
Note that α + β = k + 1 ⇒ α + (β − 1) = k. Thus, the induction hypothesis applies to v_α^{(β−1)}:

    v_α^{(β−1)} = v_α^{(1)} − Σ_{j=1}^{β−2} ⟨v_α^{(j)}, q_j⟩ q_j.

Substituting this into (A0), we get equation (A1):
    v_α^{(β)} = v_α^{(1)} − Σ_{j=1}^{β−2} ⟨v_α^{(j)}, q_j⟩ q_j − ⟨v_α^{(β−1)}, q_{β−1}⟩ q_{β−1} = v_α^{(1)} − Σ_{j=1}^{β−1} ⟨v_α^{(j)}, q_j⟩ q_j.
Thus the induction is complete. Now we use (A1), v_α^{(β)} = v_α^{(1)} − Σ_{j=1}^{β−1} ⟨v_α^{(j)}, q_j⟩ q_j, to prove equation (A2):
    v_α^{(β)} = v_α^{(1)} − Σ_{j=1}^{β−1} ⟨v_α^{(1)}, q_j⟩ q_j.

We have v_α^{(j)} = v_α^{(1)} − Σ_{k=1}^{j−1} ⟨v_α^{(k)}, q_k⟩ q_k. Substituting this into (A1), we get (A3):
    v_α^{(β)} = v_α^{(1)} − Σ_{j=1}^{β−1} ⟨ v_α^{(1)} − Σ_{k=1}^{j−1} ⟨v_α^{(k)}, q_k⟩ q_k, q_j ⟩ q_j.

This is where we use that {q_i}_{i=1}^{n} is orthonormal. Taking the inner product of any element of {q_i}_{i=1}^{j−1} with q_j produces zero, i.e., ⟨q_i, q_j⟩ = 0 for i = 1, …, j − 1. We use the linearity of the inner product to finally deduce:
    ⟨ v_α^{(1)} − Σ_{k=1}^{j−1} ⟨v_α^{(k)}, q_k⟩ q_k, q_j ⟩ = ⟨v_α^{(1)}, q_j⟩.

Substituting this result into (A3) yields:

    v_α^{(β)} = v_α^{(1)} − Σ_{j=1}^{β−1} ⟨v_α^{(1)}, q_j⟩ q_j.

Again, note that v_α^{(1)} = a_α. Hence for β = α,

    v_α^{(α)} = a_α − Σ_{j=1}^{α−1} ⟨a_α, q_j⟩ q_j.

This is exactly the algorithm for CGS. Hence we arrive at the conclusion that v_k^{(k)} = v_k. Furthermore,
    ⟨v_α^{(β)}, q_β⟩ = ⟨ v_α^{(1)} − Σ_{j=1}^{β−1} ⟨v_α^{(1)}, q_j⟩ q_j, q_β ⟩ = ⟨a_α, q_β⟩,

using similar reasoning as above, i.e., the orthonormality of the {q_i} and v_α^{(1)} = a_α.
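The identity just derived, ⟨v_α^{(β)}, q_β⟩ = ⟨a_α, q_β⟩, can be spot-checked numerically: at each MGS step, the coefficient computed from the running vector agrees (up to rounding) with the CGS coefficient computed from the original column. This is my own Python sketch, not part of the coursework:

```python
import numpy as np

# Spot-check <v_alpha^{(beta)}, q_beta> = <a_alpha, q_beta> on a random
# well-conditioned matrix: run MGS and compare each coefficient it uses
# against the corresponding CGS coefficient <a_j, q_i>.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
V = A.copy()
n = A.shape[1]
Q = np.zeros_like(A)
for i in range(n):
    Q[:, i] = V[:, i] / np.linalg.norm(V[:, i])
    for j in range(i + 1, n):
        mgs_coeff = V[:, j] @ Q[:, i]  # coefficient MGS actually uses
        cgs_coeff = A[:, j] @ Q[:, i]  # coefficient CGS would use
        assert abs(mgs_coeff - cgs_coeff) < 1e-10
        V[:, j] -= mgs_coeff * Q[:, i]
print("all MGS coefficients match the CGS coefficients")
```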
Question (b): (Note: figure pushed to page 11)
For k = 10^{−1}, the results are nearly identical; MGS appears to be slightly more accurate. For k = 10^{−5}, the same observation applies, and MGS still holds the upper hand: CGS loses accuracy much faster than MGS as k shrinks. For k = 10^{−10}, the accuracy of CGS has decreased drastically. Please note that the output of the CGS and MGS code below has been concatenated into a single top-aligned document and pushed to page 11 for readability. Please read the caption below Figure 1 as well.
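The loss of orthogonality described above can also be reproduced directly by measuring ||QᵀQ − I|| for both algorithms on the coursework matrix as k shrinks (a Python sketch of my own, mirroring the Matlab code below; the helper name build_A is invented):

```python
import numpy as np

def build_A(k):
    # The 4x4 coursework matrix, parameterized by k.
    return np.array([[-2.0, -1.0, 1.0, 2.0],
                     [   k,  0.0, 0.0, 0.0],
                     [ 0.0,    k, 0.0, 0.0],
                     [ 0.0,  0.0,   k, 0.0]])

def cgs_q(A):
    """Classical Gram-Schmidt, returning only Q."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        v = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])
        Q[:, j] = v / np.linalg.norm(v)
    return Q

def mgs_q(A):
    """Modified Gram-Schmidt, returning only Q."""
    m, n = A.shape
    V = A.copy()
    Q = np.zeros((m, n))
    for i in range(n):
        Q[:, i] = V[:, i] / np.linalg.norm(V[:, i])
        for j in range(i + 1, n):
            V[:, j] -= (V[:, j] @ Q[:, i]) * Q[:, i]
    return Q

for k in (1e-1, 1e-5, 1e-10):
    Ak = build_A(k)
    err_cgs = np.linalg.norm(cgs_q(Ak).T @ cgs_q(Ak) - np.eye(4))
    err_mgs = np.linalg.norm(mgs_q(Ak).T @ mgs_q(Ak) - np.eye(4))
    print(f"k = {k:g}:  CGS ||Q'Q - I|| = {err_cgs:.2e},  MGS ||Q'Q - I|| = {err_mgs:.2e}")
```

For k = 10^{−10} the CGS error grows to O(1) while the MGS error remains tiny (comparable to k), matching the observations above.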
function [Q, R] = cgs(A)
%Using matrices because Matlab screamed at me for dynamically allocating.
n = length(A(1,:));
m = length(A(:,1));
R = zeros(n);
V = zeros(m,n);
Q = zeros(m,n);

%Initial Step!
V(:,1) = A(:,1);
Q(:,1) = V(:,1) / norm(V(:,1));
R(1,1) = norm(V(:,1));

%All column vectors, as in lectures, are concatenated into a matrix.
for j = 2:n
    j0 = zeros(m,1);

    %j0 is sum(i = 1 to j-1) of <a_j, q_i> q_i.
    for i = 1:j-1
        j0 = j0 + (dot(A(:,j),Q(:,i)) * Q(:,i));
    end

    %I'll be using the j0 here.
    V(:,j) = A(:,j) - j0;
    Q(:,j) = V(:,j) / norm(V(:,j));
    R(j,j) = norm(V(:,j));
end

%This defines the non-diagonal entries of R.
for k = 1:n-1
    for k1 = k+1:n
        R(k,k1) = dot(Q(:,k),A(:,k1));
    end
end

%final step
disp(Q)
disp(R)
disp(Q'*Q - eye(n))
end

function [Q, R] = mgs(A)
%Using matrices because Matlab tried to hurt me for dynamically allocating.
n = length(A(1,:));
m = length(A(:,1));
R = zeros(n);
V = A;
Q = zeros(m,n);

%Initializing step skipped; has already been initialized by defining V = A

for i = 1:n-1
    Q(:,i) = V(:,i) / norm(V(:,i));
    for j1 = i+1:n
        V(:,j1) = V(:,j1) - dot(V(:,j1),Q(:,i)) * Q(:,i);
    end
    %n-1 is the maximum i value in this loop, so R(n,n) is not defined here.
    R(i,i) = norm(V(:,i));
end

R(n,n) = norm(V(:,n));
Q(:,n) = V(:,n) / norm(V(:,n));

%Defining the non-diagonal entries of R
for j2 = 1:n-1
    for k = j2+1:n
        R(j2,k) = dot(Q(:,j2),A(:,k));
    end
end

disp(Q)
disp(R)
disp(Q'*Q - eye(n))
end

Question (c):

    A = [ -2  -1   1   2
           k   0   0   0
           0   k   0   0
           0   0   k   0 ]
With the omnipresent matrix A hovering over us, let us first proceed to use CGS:
v_1 = (−2, k, 0, 0)^t ⇒ ||v_1|| = √((−2)^2 + k^2) ≈ √(2^2) = 2 ⇒ q_1 = (−1, k/2, 0, 0)^t
v_2 = (−1, 0, k, 0)^t − ((−1, 0, k, 0)^t · (−1, k/2, 0, 0)^t)(−1, k/2, 0, 0)^t = (0, −k/2, k, 0)^t

Hence, we again divide by the norm of the vector: ||v_2|| = √((k/2)^2 + k^2) = √((5/4)k^2) = k√5/2, so q_2 = (0, −1/√5, 2/√5, 0)^t.
Now things start getting complicated:
v_3 = (1, 0, 0, k)^t − ((1, 0, 0, k)^t · (−1, k/2, 0, 0)^t)(−1, k/2, 0, 0)^t − ((1, 0, 0, k)^t · (0, −1/√5, 2/√5, 0)^t)(0, −1/√5, 2/√5, 0)^t = ⋯ = (1, 0, 0, k)^t + (−1, k/2, 0, 0)^t = (0, k/2, 0, k)^t ⇒ q_3 = (0, 1/√5, 0, 2/√5)^t
Writing this out will without a doubt get me negative points for presentation issues. Neatly abridging, we get:
v_4 = ⋯ = (2, 0, 0, 0)^t + 2(−1, k/2, 0, 0)^t = (0, k, 0, 0)^t ⇒ q_4 = (0, 1, 0, 0)^t.
Let us now proceed to use MGS:
First we initialize:
v_1^{(1)} = (−2, k, 0, 0)^t
v_2^{(1)} = (−1, 0, k, 0)^t
v_3^{(1)} = (1, 0, 0, k)^t
v_4^{(1)} = (2, 0, 0, 0)^t
And now we go into a loop.
v_1^{(1)} = (−2, k, 0, 0)^t ⇒ ||v_1^{(1)}|| = √((−2)^2 + k^2) ≈ √(2^2) = 2 ⇒ q_1 = (−1, k/2, 0, 0)^t
v_2^{(2)} = (−1, 0, k, 0)^t − ((−1, 0, k, 0)^t · (−1, k/2, 0, 0)^t)(−1, k/2, 0, 0)^t = (0, −k/2, k, 0)^t
v_3^{(2)} = (1, 0, 0, k)^t − ((1, 0, 0, k)^t · (−1, k/2, 0, 0)^t)(−1, k/2, 0, 0)^t = ⋯ = (1, 0, 0, k)^t + (−1, k/2, 0, 0)^t = (0, k/2, 0, k)^t
v_4^{(2)} = (2, 0, 0, 0)^t − ((2, 0, 0, 0)^t · (−1, k/2, 0, 0)^t)(−1, k/2, 0, 0)^t = (2, 0, 0, 0)^t + 2(−1, k/2, 0, 0)^t = (0, k, 0, 0)^t
Like above, we derive q_2 and use it to update the v's: ||v_2^{(2)}|| = √((k/2)^2 + k^2) = √((5/4)k^2) = k√5/2, so q_2 = (0, −1/√5, 2/√5, 0)^t.

v_3^{(3)} = (0, k/2, 0, k)^t − ((0, k/2, 0, k)^t · (0, −1/√5, 2/√5, 0)^t)(0, −1/√5, 2/√5, 0)^t = ⋯ = (0, 2k/5, k/5, k)^t
v_4^{(3)} = (0, k, 0, 0)^t − ((0, k, 0, 0)^t · (0, −1/√5, 2/√5, 0)^t)(0, −1/√5, 2/√5, 0)^t = (0, 4k/5, 2k/5, 0)^t
Now we calculate:

v_3^{(3)} = (0, 2k/5, k/5, k)^t ⇒ q_3 = (0, 2/√30, 1/√30, 5/√30)^t
v_4^{(4)} = (0, 4k/5, 2k/5, 0)^t − ((0, 4k/5, 2k/5, 0)^t · (0, 2/√30, 1/√30, 5/√30)^t)(0, 2/√30, 1/√30, 5/√30)^t = ⋯ = (0, 2k/3, k/3, −k/3)^t ⇒ q_4 = (0, √(2/3), 1/√6, −1/√6)^t
And hence we are done. To sum up, the matrix Q produced from CGS and from MGS is as follows:

    Q_CGS = [ -1     0       0      0
              k/2   -1/√5    1/√5   1
               0     2/√5    0      0
               0     0       2/√5   0 ]

    Q_MGS = [ -1     0       0       0
              k/2   -1/√5    2/√30   √(2/3)
               0     2/√5    1/√30   1/√6
               0     0       5/√30  -1/√6 ]
Comparing this to Figure 1 on page 11, case k = 10^{−10}, it is almost an exact match! Although the two algorithms are equivalent on paper, computing in finite-precision arithmetic as k gets very small yields a different Q for each algorithm. Hence the moral of this coursework: theoretically equivalent algorithms can behave differently in practice.
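As a sanity check on the hand computation, one can run both algorithms in floating point with k = 10^{−10} and compare against the limiting Q matrices above (my own Python sketch, not part of the coursework; for such a small k, √(4 + k²) rounds to exactly 2 in double precision, so the approximation made on paper is exact in the machine):

```python
import numpy as np

def cgs_q(A):
    """Classical Gram-Schmidt, returning only Q."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        v = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])
        Q[:, j] = v / np.linalg.norm(v)
    return Q

def mgs_q(A):
    """Modified Gram-Schmidt, returning only Q."""
    m, n = A.shape
    V = A.copy()
    Q = np.zeros((m, n))
    for i in range(n):
        Q[:, i] = V[:, i] / np.linalg.norm(V[:, i])
        for j in range(i + 1, n):
            V[:, j] -= (V[:, j] @ Q[:, i]) * Q[:, i]
    return Q

k = 1e-10
A = np.array([[-2.0, -1.0, 1.0, 2.0],
              [   k,  0.0, 0.0, 0.0],
              [ 0.0,    k, 0.0, 0.0],
              [ 0.0,  0.0,   k, 0.0]])

s5, s6, s30 = np.sqrt(5.0), np.sqrt(6.0), np.sqrt(30.0)
# Hand-derived limits; the k/2 entry (~5e-11) is written as 0 here.
Q_cgs_hand = np.array([[-1,     0,     0,    0],
                       [ 0, -1/s5,  1/s5,    1],
                       [ 0,  2/s5,     0,    0],
                       [ 0,     0,  2/s5,    0]], dtype=float)
Q_mgs_hand = np.array([[-1,     0,      0,      0],
                       [ 0, -1/s5, 2/s30,  2/s6],
                       [ 0,  2/s5, 1/s30,  1/s6],
                       [ 0,     0, 5/s30, -1/s6]], dtype=float)

print(np.max(np.abs(cgs_q(A) - Q_cgs_hand)))
print(np.max(np.abs(mgs_q(A) - Q_mgs_hand)))
```

Both differences come out on the order of k or smaller, confirming the hand computation.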
Question (d): This is my algorithm:

function [x,r] = QRfac(Q,R,A,b)
b1 = Q' * b;
num = length(b);
x = zeros(num,1);
x(num) = b1(num) / R(num,num);

for i = 1:num-1
    sigma = 0;
    %It helps greatly to look at x(num-i) = ... to see what sigma is used for
    for j = 1:i
        sigma = sigma + R(num-i,num-(j-1)) * x(num-(j-1));
    end
    x(num-i) = ( b1(num-i) - sigma ) / R(num-i,num-i);
end

%Displaying the norm of the residual
r = b - A * x;
disp(norm(r))
end
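The same back-substitution idea can be written compactly in Python (my own sketch mirroring QRfac above; qr_solve is an invented name): since QR = A and Q has orthonormal columns, Ax = b reduces to the upper-triangular system Rx = Qᵀb.

```python
import numpy as np

def qr_solve(Q, R, A, b):
    """Solve A x = b via back substitution on R x = Q' b, as in QRfac."""
    b1 = Q.T @ b
    n = len(b1)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract the contributions of the already-computed x entries
        # (the role of 'sigma' in the Matlab code), then divide by R(i,i).
        x[i] = (b1[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    r = b - A @ x  # residual vector
    return x, r
```

With a factorization from numpy.linalg.qr, the residual norm is near machine precision for a well-conditioned A.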
This is the result of my code. The function a is just a simple 3-line function that substitutes the value x of a(x) for the variable k in the matrix A given in the coursework. I wrote it because my hand hurt and I didn't want to type any more. I also edited a few lines out of the computer output to make things more readable, e.g. the vector values of r and x, so please don't think that I am using an illegally downloaded version of Matlab. (I am not.)
>> A = a(1/10);
>> [Q,R] = cgs(A);
>> [x,r] = QRfac(Q,R,A,b)
2.2523e-14

>> [Q,R] = mgs(A);
>> [x,r] = QRfac(Q,R,A,b)
2.0861e-14

>> A = a(10^-5);
>> [Q,R] = cgs(A);
>> [x,r] = QRfac(Q,R,A,b)
7.0551e-08

>> [Q,R] = mgs(A);
>> [x,r] = QRfac(Q,R,A,b)
6.3213e-08

>> A = a(10^-10);
>> [Q,R] = cgs(A);
>> [x,r] = QRfac(Q,R,A,b)
0.7483

>> [Q,R] = mgs(A);
>> [x,r] = QRfac(Q,R,A,b)
0.8300
To briefly comment on these results: CGS and MGS were more evenly matched than in Question (b), producing similar values for the norm of the residual r. The norm of the residual increased as k got smaller, and at k = 10^{−10} they both failed spectacularly. This result shows that even MGS is unreliable for certain matrices (or situations).
Figure 1. From top to bottom, this figure displays the results for CGS & MGS for the cases k = 10^{−1}, 10^{−5}, and 10^{−10}.