Lecture 1: Schur’s Unitary Triangularization Theorem


This lecture introduces the notion of unitary equivalence and presents Schur’s theorem and some of its consequences. It roughly corresponds to Sections 2.1, 2.2, and 2.4 of the textbook.

1 Unitary matrices

Definition 1. A matrix $U \in M_n$ is called unitary if $UU^* = I$ ($= U^*U$).

If $U$ is a real matrix (in which case $U^*$ is just $U^\top$), then $U$ is called an orthogonal matrix.

Observation: If $U, V \in M_n$ are unitary, then so are $\bar{U}$, $U^\top$, $U^*$ ($= U^{-1}$), and $UV$.

Observation: If $U$ is a unitary matrix, then

$$|\det U| = 1.$$

Examples: Matrices of reflection and of rotations are unitary (in fact, orthogonal) matrices.

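For instance, in 3D-space,

reflection along the z-axis:

$$U = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}, \qquad \det U = -1,$$

rotation along the z-axis:

$$U = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \det U = 1.$$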

That these matrices are unitary is best seen using one of the alternate characterizations listed below.

Theorem 2. Given U ∈ M_n, the following statements are equivalent:

(i) U is unitary,

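(ii) $U$ preserves the Hermitian norm, i.e.,

$$\|Ux\| = \|x\| \quad \text{for all } x \in \mathbb{C}^n,$$

(iii) the columns $U_1, \ldots, U_n$ of $U$ form an orthonormal system, i.e.,

$$\langle U_i, U_j \rangle = \delta_{i,j}, \quad \text{in other words} \quad U_j^* U_i = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$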


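Proof. (i) ⇒ (ii). If $U^*U = I$, then, for any $x \in \mathbb{C}^n$,

$$\|Ux\|_2^2 = \langle Ux, Ux \rangle = \langle U^*Ux, x \rangle = \langle x, x \rangle = \|x\|_2^2.$$

(ii) ⇒ (iii). Let $(e_1, e_2, \ldots, e_n)$ denote the canonical basis of $\mathbb{C}^n$. Assume that $U$ preserves the Hermitian norm of every vector. We obtain, for $j \in \{1, \ldots, n\}$,

$$\langle U_j, U_j \rangle = \|U_j\|_2^2 = \|Ue_j\|_2^2 = \|e_j\|_2^2 = 1.$$

Moreover, for $i, j \in \{1, \ldots, n\}$ with $i \neq j$, we have, for any complex number $\lambda$ of modulus 1,

$$
\begin{aligned}
\Re \langle \lambda U_i, U_j \rangle
&= \frac{1}{2}\left( \|\lambda U_i + U_j\|_2^2 - \|U_i\|_2^2 - \|U_j\|_2^2 \right) \\
&= \frac{1}{2}\left( \|U(\lambda e_i + e_j)\|_2^2 - \|U(e_i)\|_2^2 - \|U(e_j)\|_2^2 \right) \\
&= \frac{1}{2}\left( \|\lambda e_i + e_j\|_2^2 - \|e_i\|_2^2 - \|e_j\|_2^2 \right) = 0.
\end{aligned}
$$

This does imply that $\langle U_i, U_j \rangle = 0$ (argue for instance that $|\langle \lambda U_i, U_j \rangle| = \Re \langle \lambda U_i, U_j \rangle$ for a properly chosen $\lambda \in \mathbb{C}$ with $|\lambda| = 1$).

(iii) ⇒ (i). Observe that the $(i, j)$th entry of $U^*U$ is $U_i^* U_j$ to realize that (iii) directly translates into $U^*U = I$.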
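As a small numerical illustration of Theorem 2, the sketch below checks characterizations (ii) and (iii) for the rotation matrix from the examples. It uses only the Python standard library; the helper names (`rotation_z`, `inner`, `matvec`, `columns`), the angle 0.7, and the test vector are arbitrary choices made for this illustration.

```python
import math

def rotation_z(theta):
    """3x3 rotation matrix from the examples, stored as a list of rows."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def inner(x, y):
    """Hermitian inner product <x, y> = y^* x."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def matvec(U, x):
    """Matrix-vector product U x."""
    return [sum(U[i][j] * x[j] for j in range(len(x))) for i in range(len(U))]

def columns(U):
    """Columns U_1, ..., U_n of U."""
    return [[U[i][j] for i in range(len(U))] for j in range(len(U[0]))]

U = rotation_z(0.7)
cols = columns(U)

# (iii): the Gram matrix of the columns should be the identity.
gram = [[inner(ci, cj) for cj in cols] for ci in cols]
gram_error = max(abs(gram[i][j] - (1 if i == j else 0))
                 for i in range(3) for j in range(3))

# (ii): the Hermitian norm of a (complex) test vector is preserved.
x = [1 + 2j, -0.5j, 3.0]
norm_x = math.sqrt(inner(x, x).real)
Ux = matvec(U, x)
norm_Ux = math.sqrt(inner(Ux, Ux).real)

print(gram_error, abs(norm_x - norm_Ux))  # both should be at rounding level
```

A check on one test vector does not prove (ii), of course; it only illustrates the statement that the theorem establishes for every vector.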

According to (iii), a unitary matrix can be interpreted as the matrix of an orthonormal basis in another orthonormal basis. In terms of linear maps represented by matrices $A$, the change of orthonormal bases therefore corresponds to the transformation $A \mapsto UAU^*$ for some unitary matrix $U$. This transformation defines the unitary equivalence.

Definition 3. Two matrices $A, B \in M_n$ are called unitarily equivalent if there exists a unitary matrix $U \in M_n$ such that

$$B = UAU^*.$$

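Observation: If $A, B \in M_n$ are unitarily equivalent, then

$$\sum_{1 \le i,j \le n} |a_{i,j}|^2 = \sum_{1 \le i,j \le n} |b_{i,j}|^2.$$

Indeed, recall from Lect.1-Ex.3 that $\sum_{1 \le i,j \le n} |a_{i,j}|^2 = \operatorname{tr}(A^*A)$ and $\sum_{1 \le i,j \le n} |b_{i,j}|^2 = \operatorname{tr}(B^*B)$, and then write

$$
\begin{aligned}
\operatorname{tr}(B^*B)
&= \operatorname{tr}\big((UAU^*)^*(UAU^*)\big) = \operatorname{tr}(UA^*U^*UAU^*) = \operatorname{tr}(UA^*AU^*) = \operatorname{tr}(U^*UA^*A) \\
&= \operatorname{tr}(A^*A).
\end{aligned}
$$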
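The invariance of the sum of squared moduli can be checked on a small example. The sketch below is only an illustration under arbitrary choices: the 2×2 unitary matrix `U`, the matrix `A`, and the helper names (`matmul`, `adjoint`, `frob_sq`) are not taken from the lecture.

```python
import math

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def adjoint(P):
    """Conjugate transpose P^*."""
    return [[complex(P[i][j]).conjugate() for i in range(len(P))]
            for j in range(len(P[0]))]

def frob_sq(P):
    """Sum of the squared moduli of all entries of P."""
    return sum(abs(entry) ** 2 for row in P for entry in row)

s = 1 / math.sqrt(2)
U = [[s, s * 1j], [s * 1j, s]]   # a unitary matrix (its columns are orthonormal)
A = [[1 + 2j, 3], [-1j, 4]]      # an arbitrary test matrix
B = matmul(matmul(U, A), adjoint(U))   # B = U A U^*

print(frob_sq(A), frob_sq(B))  # the two sums should agree up to rounding
```

Here the sum for `A` is 5 + 9 + 1 + 16 = 31, and the sum for `B` agrees with it up to rounding, as the trace computation predicts.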


2 Statement of Schur’s theorem and some of its consequences

Schur’s unitary triangularization theorem says that every matrix is unitarily equivalent to a triangular matrix. Precisely, it reads as follows.

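Theorem 4. Given $A \in M_n$ with eigenvalues $\lambda_1, \ldots, \lambda_n$, counting multiplicities, there exists a unitary matrix $U \in M_n$ such that

$$A = U \begin{bmatrix} \lambda_1 & x & \cdots & x \\ 0 & \lambda_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & x \\ 0 & \cdots & 0 & \lambda_n \end{bmatrix} U^*.$$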

Note that such a decomposition is far from unique (see Example 2.3.2 p.80-81).

Let us now state a few consequences of Schur’s theorem. First, the Cayley–Hamilton theorem says that every square matrix is annihilated by its own characteristic polynomial.

Theorem 5. Given A ∈ M_n, one has

p_A(A) = 0.

The second consequence of Schur’s theorem says that every matrix is similar to a block-diagonal matrix where each block is upper triangular and has a constant diagonal. This is an important step in a possible proof of the Jordan canonical form.

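Theorem 6. Given $A \in M_n$ with distinct eigenvalues $\lambda_1, \ldots, \lambda_k$, there is an invertible matrix $S \in M_n$ such that

$$A = S \begin{bmatrix} T_1 & 0 & \cdots & 0 \\ 0 & T_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & T_k \end{bmatrix} S^{-1}, \quad \text{where } T_i \text{ has the form } T_i = \begin{bmatrix} \lambda_i & x & \cdots & x \\ 0 & \lambda_i & \ddots & \vdots \\ \vdots & \ddots & \ddots & x \\ 0 & \cdots & 0 & \lambda_i \end{bmatrix}.$$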

The arguments for Theorems 5 and 6 (see next section) do not use the unitary equivalence stated in Schur’s theorem, but merely the equivalence (i.e., similarity). The unitary equivalence is nonetheless crucial for the final consequence of Schur’s theorem, which says that there are diagonalizable matrices arbitrarily close to any matrix (in other words, the set of diagonalizable matrices is dense in $M_n$).

Theorem 7. Given $A \in M_n$ and $\varepsilon > 0$, there exists a diagonalizable matrix $\widetilde{A} \in M_n$ such that

$$\sum_{1 \le i,j \le n} |a_{i,j} - \widetilde{a}_{i,j}|^2 < \varepsilon.$$


3 Proofs

Proof of Theorem 4. We proceed by induction on $n \ge 1$. For $n = 1$, there is nothing to do.

Suppose now that the result holds up to an integer $n - 1$, $n \ge 2$. Let $A \in M_n$ with eigenvalues $\lambda_1, \ldots, \lambda_n$, counting multiplicities. Consider an eigenvector $v_1$ associated to the eigenvalue $\lambda_1$. We may assume that $\|v_1\|_2 = 1$. We use it to form an orthonormal basis $(v_1, v_2, \ldots, v_n)$.

The matrix $A$ is equivalent to the matrix of the linear map $x \mapsto Ax$ relative to the basis $(v_1, v_2, \ldots, v_n)$, i.e.,

$$A = V \left[\begin{array}{c|ccc} \lambda_1 & x & \cdots & x \\ \hline 0 & & & \\ \vdots & & \widetilde{A} & \\ 0 & & & \end{array}\right] V^{-1}, \tag{1}$$

where $V = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix}$ is the matrix of the system $(v_1, v_2, \ldots, v_n)$ relative to the canonical basis.

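Since this is a unitary matrix, the equivalence of (1) is in fact a unitary equivalence. Note that $p_A(x) = (\lambda_1 - x)\, p_{\widetilde{A}}(x)$, so that the eigenvalues of $\widetilde{A} \in M_{n-1}$, counting multiplicities, are $\lambda_2, \ldots, \lambda_n$. We use the induction hypothesis to find a unitary matrix $\widetilde{W} \in M_{n-1}$ such that

$$\widetilde{A} = \widetilde{W} \begin{bmatrix} \lambda_2 & x & \cdots & x \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & x \\ 0 & \cdots & 0 & \lambda_n \end{bmatrix} \widetilde{W}^*, \quad \text{i.e.,} \quad \widetilde{W}^* \widetilde{A} \widetilde{W} = \begin{bmatrix} \lambda_2 & x & \cdots & x \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & x \\ 0 & \cdots & 0 & \lambda_n \end{bmatrix}.$$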

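Now observe that

$$
\begin{aligned}
\left[\begin{array}{c|ccc} 1 & 0 & \cdots & 0 \\ \hline 0 & & & \\ \vdots & & \widetilde{W} & \\ 0 & & & \end{array}\right]^*
\left[\begin{array}{c|ccc} \lambda_1 & x & \cdots & x \\ \hline 0 & & & \\ \vdots & & \widetilde{A} & \\ 0 & & & \end{array}\right]
\left[\begin{array}{c|ccc} 1 & 0 & \cdots & 0 \\ \hline 0 & & & \\ \vdots & & \widetilde{W} & \\ 0 & & & \end{array}\right]
&= \left[\begin{array}{c|ccc} 1 & 0 & \cdots & 0 \\ \hline 0 & & & \\ \vdots & & \widetilde{W}^* & \\ 0 & & & \end{array}\right]
\left[\begin{array}{c|ccc} \lambda_1 & x & \cdots & x \\ \hline 0 & & & \\ \vdots & & \widetilde{A}\widetilde{W} & \\ 0 & & & \end{array}\right] \\
&= \left[\begin{array}{c|ccc} \lambda_1 & x & \cdots & x \\ \hline 0 & & & \\ \vdots & & \widetilde{W}^*\widetilde{A}\widetilde{W} & \\ 0 & & & \end{array}\right]
= \begin{bmatrix} \lambda_1 & x & \cdots & x \\ 0 & \lambda_2 & \cdots & x \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}.
\end{aligned}
$$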

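Since $W := \begin{bmatrix} 1 & 0 \\ 0 & \widetilde{W} \end{bmatrix}$ is a unitary matrix, this reads

$$\left[\begin{array}{c|ccc} \lambda_1 & x & \cdots & x \\ \hline 0 & & & \\ \vdots & & \widetilde{A} & \\ 0 & & & \end{array}\right] = W \begin{bmatrix} \lambda_1 & x & \cdots & x \\ 0 & \lambda_2 & \cdots & x \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} W^*. \tag{2}$$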

Putting the unitary equivalences (1) and (2) together shows the result of Theorem 4 (with U = V W ) for the integer n. This concludes the inductive proof.
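For $n = 2$ the construction of the proof can be carried out explicitly. The following sketch is only an illustration: it picks an eigenvalue with the quadratic formula, normalizes an eigenvector $v_1$, completes it into an orthonormal basis, and forms $T = V^*AV$. The function name `schur_2x2`, the helpers, the threshold `1e-14`, and the test matrix are assumptions made for this example, not part of the lecture.

```python
import cmath

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def adjoint(P):
    """Conjugate transpose P^*."""
    return [[complex(P[i][j]).conjugate() for i in range(len(P))]
            for j in range(len(P[0]))]

def schur_2x2(A):
    """Return (V, T) with V unitary, T upper triangular and A = V T V^*."""
    (a, b), (c, d) = A
    # One eigenvalue: a root of x^2 - (a + d) x + (ad - bc).
    trace, det = a + d, a * d - b * c
    lam1 = (trace + cmath.sqrt(trace * trace - 4 * det)) / 2
    # Two candidate eigenvectors for lam1; keep the one with the larger norm.
    candidates = [(b, lam1 - a), (lam1 - d, c)]
    v = max(candidates, key=lambda w: abs(w[0]) ** 2 + abs(w[1]) ** 2)
    norm = (abs(v[0]) ** 2 + abs(v[1]) ** 2) ** 0.5
    if norm < 1e-14:
        # A is (numerically) lam1 * I, so every vector is an eigenvector.
        v, norm = (1.0, 0.0), 1.0
    v1 = (v[0] / norm, v[1] / norm)
    # Complete v1 into an orthonormal basis (v1, v2) of C^2.
    v2 = (-complex(v1[1]).conjugate(), complex(v1[0]).conjugate())
    V = [[v1[0], v2[0]], [v1[1], v2[1]]]
    T = matmul(matmul(adjoint(V), A), V)
    return V, T

A = [[1 + 1j, 2], [3, 4 - 1j]]
V, T = schur_2x2(A)
A_rebuilt = matmul(matmul(V, T), adjoint(V))
print(abs(T[1][0]))  # lower-left entry of T, zero up to rounding
```

In dimension 2 the induction stops after one step, since the trailing block is 1×1. For larger $n$ the same step would be applied recursively to the trailing block, as in the proof.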


Now that Schur’s theorem is established, we may prove the consequences stated in Section 2.

Proof of Theorem 5. First attempt, valid when $A$ is diagonalizable. In this case, there is a basis $(v_1, \ldots, v_n)$ of eigenvectors associated to (not necessarily distinct) eigenvalues $\lambda_1, \ldots, \lambda_n$. It is enough to show that the matrix $p_A(A)$ vanishes on each basis vector $v_i$, $1 \le i \le n$. Note that

$$p_A(A) = (\lambda_1 I - A) \cdots (\lambda_n I - A) = \big[(\lambda_1 I - A) \cdots (\lambda_{i-1} I - A)(\lambda_{i+1} I - A) \cdots (\lambda_n I - A)\big](\lambda_i I - A),$$

because $(\lambda_i I - A)$ commutes with all $(\lambda_j I - A)$. Then the expected result follows from

$$p_A(A)(v_i) = [\cdots](\lambda_i I - A)(v_i) = [\cdots](0) = 0.$$

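Final proof. Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $A \in M_n$, counting multiplicities. According to Schur’s theorem, we can write

$$A = STS^{-1}, \quad \text{where } T = \begin{bmatrix} \lambda_1 & x & \cdots & x \\ 0 & \lambda_2 & \cdots & x \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}.$$

Since $P(A) = SP(T)S^{-1}$ for any polynomial $P$, we have in particular $p_A(A) = S\,p_A(T)\,S^{-1}$, so it is enough to show that $p_A(T) = 0$. We compute

$$
\begin{aligned}
p_A(T) &= (\lambda_1 I - T)(\lambda_2 I - T) \cdots (\lambda_n I - T) \\
&= \begin{bmatrix} 0 & x & x & \cdots & x \\ 0 & x & x & \cdots & x \\ 0 & 0 & & & \\ \vdots & \vdots & & X & \\ 0 & 0 & & & \end{bmatrix}
\begin{bmatrix} x & x & x & \cdots & x \\ 0 & 0 & x & \cdots & x \\ 0 & 0 & & & \\ \vdots & \vdots & & X & \\ 0 & 0 & & & \end{bmatrix}
\cdots
\begin{bmatrix} & & & x & x \\ & X & & \vdots & \vdots \\ & & & x & x \\ 0 & \cdots & 0 & x & x \\ 0 & \cdots & 0 & 0 & 0 \end{bmatrix}.
\end{aligned}
$$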

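Using repeatedly the observation about multiplication of block-triangular matrices that

$$\begin{bmatrix} 0 & x & x \\ 0 & x & x \\ 0 & 0 & x \end{bmatrix} \begin{bmatrix} x & x & x \\ 0 & 0 & x \\ 0 & 0 & x \end{bmatrix} = \begin{bmatrix} 0 & 0 & x \\ 0 & 0 & x \\ 0 & 0 & x \end{bmatrix}$$

(where the middle block has size $1 \times 1$ and the first block column of the left factor is zero), the zero block on the left grows by one column at each multiplication until we obtain $p_A(T) = 0$. A rigorous proof of this propagation statement should proceed by induction.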
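The propagation of the zero block can be watched on a concrete upper triangular matrix. The sketch below is an illustration only: the 3×3 integer matrix `T` and the helper names (`shifted`, `leading_zero_columns`) are chosen for this example. It multiplies the factors $\lambda_k I - T$ one at a time and records how many leading columns of the partial product vanish.

```python
def matmul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shifted(T, lam):
    """Return lam * I - T."""
    n = len(T)
    return [[(lam if i == j else 0) - T[i][j] for j in range(n)]
            for i in range(n)]

def leading_zero_columns(P):
    """Number of consecutive zero columns of P, starting from the left."""
    count = 0
    for j in range(len(P)):
        if any(P[i][j] != 0 for i in range(len(P))):
            break
        count += 1
    return count

# Upper triangular test matrix with eigenvalues 2, 3, 2 on the diagonal.
T = [[2, 1, 5],
     [0, 3, -1],
     [0, 0, 2]]
eigenvalues = [T[i][i] for i in range(3)]

product = shifted(T, eigenvalues[0])
zero_cols = [leading_zero_columns(product)]
for lam in eigenvalues[1:]:
    product = matmul(product, shifted(T, lam))
    zero_cols.append(leading_zero_columns(product))

print(zero_cols)  # [1, 2, 3]
print(product)    # the zero matrix, i.e. p_A(T) = 0
```

After the $k$th factor the first $k$ columns are zero, which is exactly the statement one would prove by induction on $k$.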

To establish the next consequence of Schur’s theorem, we will use the following result.

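Lemma 8. If $A \in M_m$ and $B \in M_n$ are two matrices with no eigenvalue in common, then the matrices

$$\begin{bmatrix} A & M \\ 0 & B \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$$

are equivalent for any choice of $M \in M_{m \times n}$.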


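Proof. For $X \in M_{m \times n}$, consider the matrices $S$ and $S^{-1}$ given by

$$S = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \quad \text{and} \quad S^{-1} = \begin{bmatrix} I & -X \\ 0 & I \end{bmatrix}.$$

We compute

$$S^{-1} \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} S = \begin{bmatrix} A & AX - XB \\ 0 & B \end{bmatrix}.$$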

The result will follow as soon as we can find $X \in M_{m \times n}$ such that $AX - XB = M$ for our given $M \in M_{m \times n}$. If we denote by $F$ the linear map

$$F : X \in M_{m \times n} \mapsto AX - XB \in M_{m \times n},$$

it is enough to show that $F$ is surjective. But since $F$ is a linear map from $M_{m \times n}$ into itself, it is therefore enough to show that $F$ is injective, i.e., that

$$AX - XB = 0 \implies X = 0. \tag{3}$$

To see why this is true, let us consider $X \in M_{m \times n}$ such that $AX = XB$, and observe that

$$A^2X = A(AX) = A(XB) = (AX)B = (XB)B = XB^2,$$

$$A^3X = A(A^2X) = A(XB^2) = (AX)B^2 = (XB)B^2 = XB^3,$$

etc., so that $P(A)X = XP(B)$ for any polynomial $P$. If we choose $P = p_A$ as the characteristic polynomial of $A$, the Cayley–Hamilton theorem implies

$$0 = X\,p_A(B). \tag{4}$$

Denoting by $\lambda_1, \ldots, \lambda_m$ the eigenvalues of $A \in M_m$, we have $p_A(B) = (\lambda_1 I - B) \cdots (\lambda_m I - B)$. Note that each factor $(\lambda_i I - B)$ is invertible, since none of the $\lambda_i$ is an eigenvalue of $B$, so that $p_A(B)$ is itself invertible. We can now conclude from (4) that $X = 0$. This establishes (3), and finishes the proof.
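When $A$ and $B$ are diagonal, the equation $AX - XB = M$ can be solved entry by entry, since $(AX - XB)_{i,j} = (a_i - b_j)\,x_{i,j}$; this makes the role of the assumption "no eigenvalue in common" visible. The sketch below treats only this special diagonal case, with exact rational arithmetic. The function `solve_diagonal_sylvester`, the other helper names, and the test data are assumptions made for the illustration.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def diag(d):
    """Diagonal matrix with diagonal entries d."""
    return [[d[i] if i == j else 0 for j in range(len(d))] for i in range(len(d))]

def zeros(r, c):
    return [[0] * c for _ in range(r)]

def block(TL, TR, BL, BR):
    """Assemble the 2x2 block matrix [[TL, TR], [BL, BR]]."""
    return [l + r for l, r in zip(TL, TR)] + [l + r for l, r in zip(BL, BR)]

def solve_diagonal_sylvester(a, b, M):
    """Solve AX - XB = M for A = diag(a), B = diag(b) without common eigenvalue."""
    if set(a) & set(b):
        raise ValueError("A and B share an eigenvalue; F is not injective")
    return [[Fraction(M[i][j]) / (a[i] - b[j]) for j in range(len(b))]
            for i in range(len(a))]

a, b = [1, 2], [3, 5]          # eigenvalues of A and of B
M = [[4, 1], [5, -2]]
X = solve_diagonal_sylvester(a, b, M)

m, n = len(a), len(b)
neg_X = [[-x for x in row] for row in X]
S = block(diag([1] * m), X, zeros(n, m), diag([1] * n))
S_inv = block(diag([1] * m), neg_X, zeros(n, m), diag([1] * n))

block_diagonal = block(diag(a), zeros(m, n), zeros(n, m), diag(b))
lhs = matmul(matmul(S_inv, block_diagonal), S)   # S^{-1} [A 0; 0 B] S
rhs = block(diag(a), M, zeros(n, m), diag(b))    # [A M; 0 B]
print(lhs == rhs)  # True
```

If some $a_i$ equals some $b_j$, the division fails, which matches the fact that $F$ is then no longer injective.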

We could have given a less conceptual proof of Lemma 8 in case both A and B are upper triangular (see exercises), which is actually what the proof presented below requires.

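Proof of Theorem 6. For $A \in M_n$, we sort its eigenvalues as $\lambda_1, \ldots, \lambda_1; \lambda_2, \ldots, \lambda_2; \ldots; \lambda_k, \ldots, \lambda_k$, counting multiplicities. Schur’s theorem guarantees that $A$ is similar to the matrix

$$\begin{bmatrix} T_1 & X & \cdots & X \\ 0 & T_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & X \\ 0 & \cdots & 0 & T_k \end{bmatrix}, \quad \text{where } T_i = \begin{bmatrix} \lambda_i & x & \cdots & x \\ 0 & \lambda_i & \ddots & \vdots \\ \vdots & \ddots & \ddots & x \\ 0 & \cdots & 0 & \lambda_i \end{bmatrix}.$$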


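We now use Lemma 8 repeatedly in the chain of equivalences

$$
\begin{aligned}
A &\sim \begin{bmatrix} T_1 & X & \cdots & \cdots & X \\ 0 & T_2 & X & \cdots & X \\ \vdots & 0 & T_3 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & X \\ 0 & 0 & \cdots & 0 & T_k \end{bmatrix}
\sim \begin{bmatrix} T_1 & 0 & \cdots & \cdots & 0 \\ 0 & T_2 & X & \cdots & X \\ \vdots & 0 & T_3 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & X \\ 0 & 0 & \cdots & 0 & T_k \end{bmatrix} \\
&= \left[\begin{array}{c|c} \begin{matrix} T_1 & 0 \\ 0 & T_2 \end{matrix} & X \\ \hline 0 & \begin{matrix} T_3 & \cdots & X \\ \vdots & \ddots & \vdots \\ 0 & \cdots & T_k \end{matrix} \end{array}\right]
\sim \left[\begin{array}{c|c} \begin{matrix} T_1 & 0 \\ 0 & T_2 \end{matrix} & 0 \\ \hline 0 & \begin{matrix} T_3 & \cdots & X \\ \vdots & \ddots & \vdots \\ 0 & \cdots & T_k \end{matrix} \end{array}\right] \\
&= \left[\begin{array}{c|c} \begin{matrix} T_1 & 0 & 0 \\ 0 & T_2 & 0 \\ 0 & 0 & T_3 \end{matrix} & X \\ \hline 0 & \begin{matrix} \ddots & X \\ 0 & T_k \end{matrix} \end{array}\right]
\sim \left[\begin{array}{c|c} \begin{matrix} T_1 & 0 & 0 \\ 0 & T_2 & 0 \\ 0 & 0 & T_3 \end{matrix} & 0 \\ \hline 0 & \begin{matrix} \ddots & X \\ 0 & T_k \end{matrix} \end{array}\right] \\
&\sim \cdots \sim \begin{bmatrix} T_1 & 0 & \cdots & 0 \\ 0 & T_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & T_k \end{bmatrix}.
\end{aligned}
$$

This is the announced result.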

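Proof of Theorem 7. Let us list the eigenvalues of $A$ as $\lambda_1, \ldots, \lambda_n$, counting multiplicities (sorted as $\lambda_1 \ge \cdots \ge \lambda_n$ when they are real). According to Schur’s theorem, there exists a unitary matrix $U \in M_n$ such that

$$A = U \begin{bmatrix} \lambda_1 & x & \cdots & x \\ 0 & \lambda_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & x \\ 0 & \cdots & 0 & \lambda_n \end{bmatrix} U^*.$$

If $\widetilde{\lambda}_i := \lambda_i + i\eta$ (here $i$ is the index, not the imaginary unit) and $\eta > 0$ is small enough to guarantee that $\widetilde{\lambda}_1, \ldots, \widetilde{\lambda}_n$ are all distinct, we set

$$\widetilde{A} = U \begin{bmatrix} \widetilde{\lambda}_1 & x & \cdots & x \\ 0 & \widetilde{\lambda}_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & x \\ 0 & \cdots & 0 & \widetilde{\lambda}_n \end{bmatrix} U^*,$$

with the same entries $x$ above the diagonal as in the decomposition of $A$. In this case, the eigenvalues of $\widetilde{A}$ (i.e., $\widetilde{\lambda}_1, \ldots, \widetilde{\lambda}_n$) are all distinct, hence $\widetilde{A}$ is diagonalizable.

We now notice that

$$\sum_{1 \le i,j \le n} |a_{i,j} - \widetilde{a}_{i,j}|^2 = \operatorname{tr}\big((A - \widetilde{A})^*(A - \widetilde{A})\big).$$

But since $A - \widetilde{A}$ is unitarily equivalent to the diagonal matrix $\operatorname{diag}[\lambda_1 - \widetilde{\lambda}_1, \ldots, \lambda_n - \widetilde{\lambda}_n]$, this quantity equals $\sum_{1 \le i \le n} |\lambda_i - \widetilde{\lambda}_i|^2$. It follows that

$$\sum_{1 \le i,j \le n} |a_{i,j} - \widetilde{a}_{i,j}|^2 = \sum_{1 \le i \le n} i^2 \eta^2 < \varepsilon,$$

provided $\eta$ is chosen small enough to have $\eta^2 < \varepsilon / \sum_{1 \le i \le n} i^2$.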
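The choice of $\eta$ in this proof can be made concrete. The sketch below is one possible way to pick it, under assumptions made only for this illustration: the eigenvalues are real and given exactly (as integers or fractions), and $\eta$ is found by halving from 1 until both requirements hold. The function name `choose_eta` and the sample data are hypothetical.

```python
from fractions import Fraction

def choose_eta(eigenvalues, eps):
    """Halve eta until the perturbed eigenvalues lam_i + i*eta are distinct
    and sum_i (i*eta)^2 < eps. Returns (eta, perturbed eigenvalues)."""
    n = len(eigenvalues)
    sum_of_squares = sum(i * i for i in range(1, n + 1))
    eta = Fraction(1)
    while True:
        # i runs over the indices 1, ..., n, as in the proof.
        perturbed = [lam + i * eta for i, lam in enumerate(eigenvalues, start=1)]
        all_distinct = len(set(perturbed)) == n
        if all_distinct and sum_of_squares * eta * eta < eps:
            return eta, perturbed
        eta /= 2

# Eigenvalues 2, 2, 1, 1 (counting multiplicities) and tolerance 1/100.
eta, perturbed = choose_eta([2, 2, 1, 1], Fraction(1, 100))
distance_sq = sum((i * eta) ** 2 for i in range(1, 5))
print(eta, distance_sq)  # 1/64 and 15/2048
```

The loop terminates because only finitely many values of $\eta$ make two perturbed eigenvalues coincide, while the bound on $\sum_i i^2\eta^2$ holds for every sufficiently small $\eta$.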


4 Exercises

Ex.1: Is the sum of unitary matrices also unitary?

Ex.2: Exercise 2 p. 71.

Ex.3: When is a diagonal matrix unitary?

Ex.4: Exercise 12 p. 72.

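Ex.5: Given $A \in M_n$ with eigenvalues $\lambda_1, \ldots, \lambda_n$, counting multiplicities, prove that there exists a unitary matrix $U \in M_n$ such that

$$A = U \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ x & \lambda_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ x & \cdots & x & \lambda_n \end{bmatrix} U^*.$$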

Ex.6: Prove that a matrix $U \in M_n$ is unitary iff it preserves the Hermitian inner product, i.e., iff $\langle Ux, Uy \rangle = \langle x, y \rangle$ for all $x, y \in \mathbb{C}^n$.

Ex.7: Given a matrix $A \in M_n$ with eigenvalues $\lambda_1, \ldots, \lambda_n$, counting multiplicities, and given a polynomial $P(x) = c_d x^d + \cdots + c_1 x + c_0$, prove that the eigenvalues of $P(A)$ are $P(\lambda_1), \ldots, P(\lambda_n)$, counting multiplicities. (Note: this is not quite the same as Exercise 7 from Lecture 1.)

Ex.8: Exercise 5 p. 97.

Ex.9: For any matrix $A \in M_n$, prove that

$$\det(\exp(A)) = \exp(\operatorname{tr}(A)).$$

(The exponential of a matrix is defined as the convergent series $\exp(A) = \sum_{k \ge 0} \frac{1}{k!} A^k$.)

Ex.10: Given an invertible matrix $A \in M_n$, show that its inverse $A^{-1}$ can be expressed as a polynomial of degree $\le n - 1$ in $A$.

Ex.11: Without using the Cayley–Hamilton theorem, prove that if $T \in M_m$ and $\widetilde{T} \in M_n$ are two upper triangular matrices with no eigenvalue in common, then the matrices

$$\begin{bmatrix} T & M \\ 0 & \widetilde{T} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} T & 0 \\ 0 & \widetilde{T} \end{bmatrix}$$

are equivalent for any choice of $M \in M_{m \times n}$.

[Hint: Observe that you need to show $TX = X\widetilde{T} \implies X = 0$. Start by considering the element in the lower left corner of the matrix $TX = X\widetilde{T}$ to show that $x_{m,1} = 0$, then consider the diagonal $i - j = m - 2$ (the one just above the lower left corner) of the matrix $TX = X\widetilde{T}$ to show that $x_{m-1,1} = 0$ and $x_{m,2} = 0$, etc.]