Theory of Matrices. Chapter 5

(1)

Chapter 5

Theory of Matrices

As before, F is a field. We use F[x] to represent the set of all polynomials of x with coefficients in F.

We use M m,n (F) and M m,n (F[x]) to denoted the set of m by n matrices with entries in F and F[x]

respectively. When m = n, we write M m,n as M n .

In this chapter we shall study seven RST (Reflective, Symmetric, Transitive) relations among ma- trices over F:

1. Equivalence over F: B = P AQ, where P and Q are invertible matrices over F;

2. Equivalence over F[x]: B = P AQ, where P and Q are invertible matrices over F[x];

3. Congruence: B = P ^T AP , where P is invertible;

4. Hermitian congruence: B = P ^∗ AP , where F = C and P ^∗ := ¯ P ^T and is invertible;

5. Similarity: B = P ⁻ ¹ AP ;

6. Orthogonal similarity: B = P ⁻ ¹ AP , where F = R and P ⁻ ¹ = P ^T ; 7. Unitary similarity: B = P ⁻ ¹ AP , where F = C and P ⁻ ¹ = P ^∗ .

All of the relations except the first two require the related matrices A and B to be square. Otherwise the products defining the relations are impossible.

The seven relations are in three categories: equivalent, congruent, and similar. For each of the seven relations there are two major problems:

(1) Obtaining a set of canonical matrices;

(2) Finding a necessary and sufficient condition for two matrices to obey the relation.

Equivalence over F originates from solving system of linear equations, by making change of variables and performing operations on equations. Equivalence over F[x] is primarily a tool for the study of similarity and the system of linear constant coefficient ordinary differential equations.

Similarity occurs in the determination of all matrices representing a common linear transformation, or alternatively, in finding basis such that a linear transformation has a simple form.

Congruence originates from non-singular linear substitution in quadratic forms and Hermitian forms.

It can be used to characterize quadratic surfaces.

Finally, if the linear substitution is required to preserve the length, congruence then becomes or- thogonal or unitary similarity.

143

(2)

5.1 RST Relation

Definition 5.1. Let S be a set. A relation on S is a collection R of ordered pairs from S × S. We say x relates y and write x ∼ y if (x, y) ∈ R. We say x does not relate y and write x 6∼ y if (x, y) 6∈ R.

A relation is called reflexive if x ∼ x for all x ∈ S;

A relation is called symmetric if x ∼ y implies y ∼ x;

A relation is called transitive if x ∼ y and y ∼ z implies x ∼ z;

An RST relation ¹ is a reflexive, symmetric, and transitive relation.

Given an RST relation ∼ on S, for each x ∈ S, the set [[x]] := {y ∈ S | y ∼ x} is called the equivalent class of x.

Gives an RST relation, there are two fundamental problems:

1. finding a set of canonical forms, i.e., a set containing one and exactly one representative from each equivalent class.

2. finding convenient criteria to determine whether any given two elements are related or to find the canonical form which an arbitrarily given element relates.

Example 5.1. Consider Z, the set of all integers. We define a “mod(3)” relation by m ∼ n if m − n is an integer muptiple of 3.

This is an RST relation which divides Z is into three equivalent classes:

Z 0 := {3k | k ∈ Z}, Z 1 := {1 + 3k | k ∈ Z}, Z 2 := {2 + 3k | k ∈ Z}.

From each class, we pick-up one and only one representative, say 0 from Z 0 , 1 from Z 1 , and −1 from Z 2 . Then we obtain a set C = {0, 1, −1} of canonical forms. Each integer relates exactly one canonical form.

Criteria: An integer of the form a 1 · · · a ^k (in dicimal) relates the remainder of (a 1 + · · · + a ^k )/3. For example: n = 8230609. Then the remainder of (8 + 2 + 3 + 6 + 9)/3 is 2, so n ∈ Z ² .

Exercise 1. Let S = M n (F), the set of all square matrices of rank n and entries in F. Which kind of relations (reflexive, symmetric, transitive, and/or RST) are the following?

1. A ∼ B if and only if B = 0I; e.g. R = {(A, 0I) | A ∈ S};

2. A ∼ B if and only if A = 0I; e.g. R = {(0I, B) | B ∈ S};

3. A ∼ B if and only if A = B; e.g. R = {(A, A) | A ∈ S};

4. A ∼ B for all A and B; e.g. R = {(A, B) | A, B ∈ S};

5. A ∼ B if and only if both A and B are invertible;

6. A ∼ B if and only if A − B has at least one row of zeros;

7. A ∼ B if and only if AB = 0I.

1

This is commonly called “equivalence relation”, but in this chapter such terminology would be confusing.

(3)

5.1. RST RELATION 145 Exercise 2. Demonstrate by examples that the three relations, reflexive, symmetric, and transitive, are independent to each other; namely, find examples of relations that satisfy (i) non of them, (ii) only one of them, (iii) only two of them, and (iii) all three of them.

Exercise 3. Suppose ∼ is an RST relation on a set S. Show that for every x, y ∈ S, either [[x]] = [[y]] or

[[x]] ∩ [[y]] = ∅.

(4)

5.2 Equivalence over F

Consider a linear system

n

X

j=1

a ⁱ _j x ^j = b ⁱ , i = 1, · · · , m (5.1)

of m equations and n unknowns x 1 , · · · , x ⁿ . Here a ⁱ _j , b ⁱ are given numbers (scalars) in F. This system can be put in a matrix form

Ax = b, A = (a ⁱ _j ) m×n , x = (x ^j ) n×1 , b = (b ⁱ ) m×1 . In this form, the system may be regarded as having only one unknown, the vector x.

Two systems of m-linear equations in n unknowns x 1 , · · · , x ⁿ are called equivalent if every solution of one system is also a solution of the other, and vice versa. Certain simple operations are commonly used to convert a system to an equivalent one that may be easier to solve. For instance, it surely makes no difference when an equation is written first or last or in some intermediate position.

An elementary operation for a system of equations is one of the following three types:

1. Interchange two equations;

2. multiplication of an equation by an invertible scalar;

3. Addition to an equation by a scalar multiple of a different equation.

A system of equations Ax = b can be recorded by an augmented matrix G = (A b).

The elementary operations of equations correspond to the elementary row operations (defined below) on the augmented matrix.

Definition 5.2. 1. An elementary row operation is one of the followings:

(a) O ^ij : Interchange the ith and jth rows;

(b) O ⁱ (d) : Multiplication of the ith row by an invertible scalar d;

(c) O ^ij (c): Addition to the ith row by a scalar c multiple of the jth row, i 6= j.

2. An elementary matrix is a matrix obtained by performing an elementary row operation on an identity matrix I. There are three types of elementary matrices: E ij , E i (d) (d invertible), and E ij (c), obtained by performing O ^ij , O ⁱ (d), and O ^ij (c) on the identity matrix I, respectively.

3. A matrix is called in a Row-Reduced-Echelon-Form ² or RREF, if

(a) every nontrivial row has 1 as its first non-zero entry, called a leading one;

(b) any leading one in a lower row occurs only to the right of its counterpart above it;

(c) In any column containing a leading one, all entries above and below the leading one are zero;

(d) all rows consisting entirely zeros, if any, occurs at the bottom of the matrix.

4. The row space of a matrix is the space spanned by all the row vectors of the matrix.

2

If F is not a field, say F = Z, the definition of RREF should be revised.

(5)

5.2. EQUIVALENCE OVER F 147

5. Two matrices are called row equivalent if they have the same row spaces.

6. The row rank of a matrix is the dimension of the row space of the matrix.

Lemma 5.1. 1. Every elementary matrix has an inverse, which is also elementary.

2. To perform an elementary row operation O on an m×n matrix A, calculate the product EA, where E is the matrix obtained by performing O on I m , the identity matrix of rank m.

3. The row space of a matrix B ∈ M ^r,n (F) is a subspace of that of A ∈ M ^m,n (F) if and only if B = P A for some P ∈ M ^r,m (F).

Proof of 3. Write

A =





 a ¹

.. . a ^m





 , B =





 b ¹

.. . b ^r





 . Then

row space of A = span {a ¹ , · · · , a ^m };

row space of B = span {b ¹ , · · · , b ^r }.

Hence, the row space of B is a subspace of the row space of A if and only if there exist scalars p ⁱ _j , i = 1, · · · , r, j = 1, · · · , m such that

b ⁱ =

m

X

j=1

p ⁱ _j a ^j ∀ i = 1, · · · , r;

namely, B = P A where P = (p ⁱ _j ) r×m .

By using the Gauss elimination process, one can prove the following:

Lemma 5.2. 1. Any matrix over a field can be transformed by elementary row operations to an RREF, which is unique. In particular, if the matrix is invertible, its RREF is the identity matrix.

2. Two matrices A, B ∈ M ^m,n (F) are row equivalent if and only if B = P A for some invertible P ∈ M ^m (F); that is, A and B have the same row space.

3. A square matrix is invertible if and only if it is row equivalent to an identity matrix, if and only if it is a product of elementary matrices, and also if and only if its row vectors form a basis of F ⁿ . 4. If a matrix A is reduced to an identity matrix by a succession of elementary row operations, the

same succession of row operation performed on the identity matrix produces A ⁻ ¹ : (A I) −→ (I A ⁻ ¹ ).

Proof of 4. Suppose {O ¹ , · · · , O ^k } is a succession of elementary row operations that transform A to I, then I = E k . . . E 1 A where each E i is the matrix obtained by performing O ⁱ on the identity matrix I.

Hence, A ⁻ ¹ = E k · · · E ¹ = E k · · · E ¹ I; that is A ⁻ ¹ can be obtained by performing successively the row operations O ¹ , · · · , O ^k on I.

Corollary 5.1. A system of simultaneous linear equations Ax = b has a solution x if and only if the

rank of the augmented matrix (A b) equals the rank of the coefficient matrix A.

(6)

Analogous to the elementary operation on equations, we can perform elementary operation on variables x 1 , · · · , x n ; In particular, if we make a change of variable y = Qx where Q ∈ M n (F) is invertible, then the system Ax = b is equivalent to AQy = b.

Note that in recording the coefficient matrix of systems of linear equation, elementary operations on variables corresponding to column operations.

The column space of a matrix over a field is the space generated by all the column vectors of the matrix. The column rank of a matrix is the dimension of the column space of the matrix. Two matrices of same size are called column equivalent if they have the same column space.

The elementary column operations are the interchange of two columns, the multiplication of a column by an invertible scalar, and the addition to one column by a scalar multiple of a different column.

It is easy to show the following:

Lemma 5.3. (i) Each elementary column operation on an m × n matrix can be achieved by multiplying the matrix from the right by an elementary matrix obtained by performing the same operation on I n .

(ii) Elementary column operations do not change the column space of matrices.

(iii) The column space of a matrix B ∈ M ^m,r (F) is a subspace of that of a matrix A ∈ M ^m,n (F) if and only if B = AQ for some Q ∈ M n,r (F).

(iv) Two matrices A and B are column equivalent if and only if B = AQ for some invertible Q.

(v) If a succession of elementary column operations transforms A to I, then the same succession of column operations transforms I to A ⁻ ¹ :

A I

−→

I A ⁻ ¹

Definition 5.3. Two matrices A, B of same size are called equivalent over F if B = P AQ where both P and Q are invertible matrices over F.

Note that if A, B ∈ M ^m,n (F), then P ∈ M ^m (F) and Q ∈ M ⁿ (F).

Equivalence is but one of many RST relations to be studied in this chapter. The following theorem provides answers for the basic questions regarding the equivalence relation:

1) What is the necessary and sufficient condition for two matrices to be equivalent?

2) What are the canonical forms?

Theorem 5.1. 1. Every matrix A ∈ M ^m,n (F) is equivalent to a matrix in which for some integer r ≥ 0, the first r elements on the diagonal are 1 and all other elements of the matrix are 0, namely,

A ∼ I r 0 0 0

, where I r is the identity matrix of rank r.

2. For any matrix over a field, its row rank is the same as its column rank, and is hence called the rank of the matrix.

3. Two matrices of same size over a field are equivalent if and only if they have the same rank.

4. Let A ∈ M ^m,n (F). Suppose a succession of elementary operations on the first m rows and n columns gives

A I m

I n

−→ D P

Q

.

Then D = P AQ.

(7)

5.2. EQUIVALENCE OVER F 149 Two applications of the above theorem can be found in the exercise. For convenience, we use the following definition:

Definition 5.4. Let A : U → V be a linear map from vector space U to vector space V , {u ¹ , · · · , u ⁿ } a basis of U , and {v ¹ , · · · , v ^m } a basis of V . Suppose A ∈ M ^m,n (F) is a matrix such that

A(u 1 · · · u ⁿ ) := (Au 1 · · · Au ^m ) = (v 1 · · · v ^m )A.

Then, A is called the matrix of A under the basis {u ¹ , · · · , u ^m } (of the definition domain) and the basis {v ¹ , · · · , u ⁿ } (of the image domain).

Exercise 4. Show that the RREF of a matrix is unique.

Exercise 5. Find matrices P 1 , P and Q such that P 1 A is an RREF and P AQ is in its canonical form, where A is one of the followings:

1 1 0 2 3 0

,





0 1 1 0 0 1 1 1 0 1 0 0





Exercise 6. If x 1 , · · · , x ^m and y ¹ , · · · , y ⁿ are two sets of variables, a quadratic polynomial of the type q =

m

X

i=1 n

X

j=1

x i a ⁱ _j y ^j

is called a bilinear form in these two sets of variables. In matrix notation, it can be written as q = xAy, x = (x i ) 1×m , A = (a ⁱ _j ) m×n , y = (y ^j ) n×1 .

Show that there exist invertible changes of variables x i =

m

X

k=1

t k p ^k _i , i = 1, · · · , m, y ^j =

n

X

l=1

q _l ^j s ^l j = 1, · · · , n,

i.e., x = tP and y = Qs, where P and Q are invertible matrices, such that q =

r

X

k=1

t k s ^k

where r is the rank of the matrix A.

Exercise 7. Let A : U → V be a linear map and A be the matrix of the map under bases {u ¹ , · · · , u ⁿ } of U and {v ¹ . · · · , v ^m } of V . Suppose Ax = y. Denote by x = (x ¹ · · · x ⁿ ) ^T ∈ F ⁿ and by y = (y ¹ · · · y ^m ) ^T ∈ F ^m the coordinates of x and y under the bases; i.e. x = P n

j=1 u j x ^j and y = P m

i=1 v i y ⁱ . Show that y = Ax

Exercise 8. Consider a linear transformation A : x ∈ F ⁿ → Ax ∈ F ^m , where A ∈ M ^m,n (F). Show that there are basis {u ¹ , · · · , u ⁿ } of F ⁿ and basis {v ¹ , · · · , v ^m } of F ^m such that

A(u i ) = v i ∀i = 1, · · · , r, Au i = 0 ∀i = r + 1, · · · , n

where r is the rank of the matrix A. That is, the matrix of the linear map A under the basis {u ¹ , · · · , u ⁿ } and {v ¹ , · · · , v m } is I r 0

0 0

.

Consequently, the range of A is span {v ¹ , · · · , v r } and the null space (kernel) of A is span{u ^r+1 , · · · , u n }.

(8)

Exercise 9. Show that if A is reduced to an identity matrix by a succession of of column operations, the same succession of column operations performed on the identity matrix produces A ⁻ ¹ .

Also prove Theorem 5.1 (4).

Exercise 10. Consider the linear system Ax = b where A ∈ M ^m,n (F) and b ∈ F ^m are given and x ∈ F ⁿ is the unknown. Show that there are invertible matrices P ∈ M ^m (F) and Q ∈ M ⁿ (F) such that after the invertible change of variables z := (z ¹ · · · z ⁿ ) ^T = Qx, the system is equivalent to

z ⁱ = ¯b ⁱ ∀ i = 1, · · · , r, 0 = ¯b ^j ∀ j = r + 1, · · · , m, where ¯ b = (b ¹ · · · ¯b ^m ) := P b and r is the rank of A.

Consequently, the following holds:

(i) If r = m or r < m and 0 = ¯b ^j for all j = r + 1, · · · , m, then the system Ax = b is solvable, and all solutions are given by

x =

r

X

i=1

q i b ⁱ +

n

X

j=r+1

q j c ^j

where q 1 , · · · q ⁿ are the column vectors of Q ⁻ ¹ , and c ^r+1 , · · · , c ⁿ are arbitrary constants.

In particular, if r = n, then the solution is unique.

(ii) If r < m and ¯b ^j 6= 0 for some j ∈ {r + 1, · · · , m}, then the system Ax = b is not solvable.

Exercise 11. Prove that the equivalence relation divides M m,n (F) into 1 + min {m, n} equivalent classes.

Exercise 12. Let x 1 · · · , x ^k be column vectors in F ^m and y ¹ , · · · , y ^k be row vectors in F n . Define

A =

k

X

i=1

x i y ⁱ = (x 1 · · · x ^k )





 y ¹

.. . y k





 ∈ M m,n (F).

Prove the following:

(i)the row space of A is a subspace of span {y ¹ , · · · , · · · y ^k }; in addition, if {x ¹ , · · · , x ^k } is linearly independent, then the row space of A is equal to span {y ¹ , · · · , y ^k }.

(ii) the column space of A is a subspace of span {x ¹ , · · · , x ^k }; in addition, if {y ¹ , · · · , y ^k } is linearly independent, then the column space of A is equal to span {x ¹ , · · · , x ^k }.

Exercise 13. ^∗ Find invertible matrices P ∈ M ² (Z) and Q ∈ M ³ (Z) such that P AQ = d 1 0 0 0 d 2 0

and d 1 divides d 2 , for

A = 3 8 10 4 9 20

, A = 14 18 12 22 6 30

.

(9)

5.3. CONGRUENCE 151

5.3 Congruence

A quadratic form of n variables x ¹ , · · · , x ⁿ is a degree 2 polynomial

q =

n

X

i,j=1

x ⁱ a ij x ^j = x ^T Ax = ¹ ₂ x ^T (A ^T + A)x, x = (x ⁱ ) n×1 , A = (a ij ) n×n .

Under a linear invertible change of variables

x ⁱ =

n

X

k=1

p ⁱ _k y ^k , i = 1, · · · , n, i.e. x = P y

the quadratic form becomes

q = x ^T Ax = y ^T By, B = P ^T AP.

Definition 5.5. Two matrices A and B are called congruent if B = P ^T AP for some invertible P . Notice that for two matrices to be congruent, they have to be square and of same dimension.

Congruence is an RST relation on square matrices. In addition, congruence is a sub-relation of the equivalence relation, so the ranks of congruent matrices are equal.

Since the congruence relation originates from quadratic forms, most of the known results about congruence are confined to symmetric matrices.

Theorem 5.2. 1. Two matrices are congruent if and only if one is obtained from the other by a succession of pair of elementary operations, each pair consisting of a column operation and the corresponding row operation. In each pair either operation can be performed first.

The succession of elementary operations can be recorded as, for P = E 1 · · · E ^k ,

A I I

−→ E _k ^T · · · E 1 ^T AE 1 · · · E ^k E _k ^T · · · E 1 ^T I IE 1 · · · E ^k

= P ^T AP P ^T P

.

2. Over any field in which 1 + 1 6= 0, each symmetric matrix is congruent to a diagonal matrix in which the number of non-zero diagonal entries is the rank of A.

3. A complex symmetric matrix of rank r is congruent to a matrix I r 0 0 0

. 4. A real symmetric matrix is congruent over R to a matrix





I p 0 0

0 −I ^q 0

0 0 0





where (p, q), called the index pair of A, is uniquely determined by A.

Proof. 1. Each invertible matrix P is a product of elementary matrices: P = E 1 E 2 · · · E k . Thus, B = P ^T AP if and only if

B = E _k ^T [ · · · (E ^T 1 AE 1 ) · · · ]E ^k .

For each elementary column operation effected by E i as a right factor the corresponding row operation

is effected by E _i ^T as a left factor. Hence, B = P ^T BP if and only if B is obtained from A by performing

a succession of pairs of elementary operations. The final statement of the first assertion of the theorem

merely amounts to the associative law E _i ^T (CE i ) = (E _i ^T C)E i .

(10)

Note that P = IE 1 · · · E k is obtained by applying successively the corresponding column operations on I, and P ^T = E k · · · E 1 ^T I is obtained by applying successively the corresponding row operations on I.

2. If A = 0I, then the desired matrix is A itself. Hence we assume that A = A ^T 6= 0I.

Next we show that A is congruent to a matrix B with a non-zero diagonal entry. This is true if A has a non-zero diagonal element, since we can take B = A = I ^T AI. Hence, we assume that a jj = 0 for all j. Since A 6= 0I, there is a ij = a ji 6= 0 for some i 6= j. Addition of column j to column i followed by addition of row j to row i replace A by a congruent matrix B = (b kl ) n×n in which b ii = a ii + a jj + a ji + a ij = 2a ij 6= 0.

An interchange of column 1 and column i in B followed by the corresponding row operation replace B by C = (c kl ) n×n with c 11 = b ii 6= 0.

Now we add to column j the product of the first column by −c ^1j /c 11 , then perform the corresponding row operation. Since c 1j = c j1 , this makes the new elements in the (1, j) and (j, 1) places both 0. Doing this for j = 2, · · · , n, we replace C by a matrix c 11 0

0 A 1

where A 1 is a symmetric (n − 1) × (n − 1) matrix. (Is computation necessary to verify the symmetry of A 1 ?)

The same procedure may be applied to A 1 with operations not affecting the first row or column.

After finite number of steps, there appears a diagonal matrix congruent to A. Preservation of rank is a consequence of the fact that congruence is a special case of equivalence.

3. The third assertion follows from the second assertion, since every complex number has a square root.

4. From 2, A = P ^T DP where D = diag(d 1 , · · · , d ^r , 0, · · · , 0) where each d ⁱ is non-zero. Let p be the number of d h which are positive. By suitable interchange of rows followed by the corresponding column interchanges, these p elements may be brought into the first p position. Let c i = p|d ⁱ | for i = 1, · · · , r.

For each i = 1, · · · , r, multiplication the ith column by 1/c i followed by the corresponding row operation, D then is then congruent to the matrix in the theorem.

It remains to show that (p, q) is unique. Suppose A is also congruent to another diagonal matrix with index (p ⁰ , q ⁰ ). If p 6= p ⁰ , it is a matter of notation to assume that p ⁰ < p.

Consider the quadratic form f = x ^T Ax. Under invertible linear substitution x = P y, x = Qz, we obtain

f = (y ¹ ) ² + · · · + (y ^p ) ² − (y ^p+1 ) ² − · · · − (y ^r ) ² = (z ¹ ) ² + · · · + (z ^p

⁰

) ² − (z ^p

⁰

⁺¹ ) ² − · · · − (z ^r ) ² where r = p + q = p ⁰ + q ⁰ is the rank of A. As y = P ⁻ ¹ x and z = Q ⁻ ¹ x, the system of equations

z ⁱ = 0, y ^j = 0, h = 1, · · · , p ⁰ , j = p + 1, · · · , n

can be regarded as simultaneous linear, homogeneous equations in x ¹ , · · · , x ⁿ . There are n unknowns and p ⁰ + (n − p) < n equations, hence there exists a non-trivial solution x ⁰ . Set y 0 = P x 0 = (y ^k ₀ ) and z 0 = Qx 0 = (z ^k ₀ ). Then the value of f at x 0 can write as

f = x ^T ₀ Ax 0 =

p

X

k=1

(y ^k ₀ ) ² = −

r

X

k=p

⁰

+1

(z ₀ ^k ) ²

from which we conclude that y ⁱ ₀ = 0 for all i = 1, · · · , p, which implies that y ⁰ = 0, and consequently, x 0 = P ⁻ ¹ y 0 = 0, a contradiction. Thus p = p ⁰ .

A matrix A is called skew or skew-symmetric if A = −A ^T . There are many reasons for interest

in skew matrices. One is that any square matrices A can be written uniquely as A = S + K where S is

symmetric and K is skew. Indeed, S = ¹ ₂ (A + A ^T ) and K = ¹ ₂ (A − A ^T ).

(11)

5.3. CONGRUENCE 153 Although there is no good result on congruence for general square matrices, it is interesting and surprising to find that we can with very little effort to get a canonical set for skew symmetric matrices.

We first note that a matrix congruent to a skew matrix is also skew:

(P ^T AP ) ^T = P ^T A ^T P = −P ^T AP if A ^T = −A.

Also we notice that if 2 6= 0, then all diagonal entries of a skew matrix is zero.

Theorem 5.3. Assume that F is a field in which 1 + 1 6= 0.

1. Each skew matrix over F is congruent over F to a matrix B = diag(E, · · · , E, 0) where 0 is a zero matrix of suitable size (possibly absent) and E = 0 1

−1 0

.

2. Two n × n skew matrix over F is congruent over F if and only if they have the same rank.

Proof. If A = 0I, we take B = A. Otherwise, there is a ij = −a ^ji = c 6= 0 where i 6= j.

Multiplication of row j and column j by 1/c we can assume that a ij = 1.

Interchange of row i with row 1, of column i with column 1, of row j with row 2, of column j with column 2, A is then congruent to a skew matrix C =

E C 1

−C 1 ^T C 2

.

It is clear that the element −1 in E may be exploited by row operations to make the first column of −C ¹ zero; the corresponding column operations makes the first row of C 1 zero. Likewise, the second column of −C ¹ can be made zero by use of the +1 in E, and the second row of C 1 can be made zero.

Thus, A is congruent to E 0 0 A 1

.

An induction then completes the proof.

Example 5.2. Find an invertible matrix P such that P ^T AP is triangular and equivalent to A = 0 2 1 0

. Solution. For the matrix A I

I

we perform successively row operations and corresponding column operations on the first n rows and columns. The following records the change of the first n rows: (A I) =

0 2 1 0

1 0 0 1

−→ 1 2 1 1

1 0 0 1

−→ 3 2 1 1

1 0 0 1

−→ 3 2 1 1

0 −2/3 −1/3 2/3

−→ 3 1 1 1

0 −2/3 −1/3 2/3

−→

√ 3 1/ √

3 1/ √

3 1/ √ 3

0 −2/3 −1/3 2/3

−→ 1 1/ √

3 1/ √

3 1/ √ 3

0 −2/3 −1/3 2/3

Hence, set P ^T = 1/ √ 3 1/ √

3 −1/3 2/3

, i.e. P = 1/ √

3 −1/3 1/ √

3 2/3

we have P ^T AP = 1 1/ √ 3 0 −2/3

.

Exercise 14. For E = 0 1

−1 0

, show that E ² = −I.

Exercise 15. Use the scheme in the proof transform congruently the following matrices into triangular forms:

1 2 2 1

,





0 1 2

10 0 3

2 3 0





(12)

Exercise 16. For the following quadratic forms, find a change of variable y = P x such that they become diagonal forms.

(a) f = xy + yz, where (x, y, z) ^T ∈ R ³ ; (b) f = x ² − y ² + 2ixy, where (x, y) ^T ∈ C ² ;

(b) g = ax ² + 2bxy + cy ² , where (x, y) ∈ R ² , a, b, c ∈ R, (c) h =

P n

i=1 a i x ⁱ

P n

j=1 b j x ^j

where (x ⁱ ) ∈ R ⁿ . Exercise 17. Prove that if A is skew, then A ² is symmetric.

Exercise 18. Find a invertible matrix P such that P ^T AP is in the canonical form, where A is one of the following





0 1 2

−1 0 3

−2 −3 0



 ,







0 1 1 1

−1 0 1 0

−1 −1 0 −2

−1 0 2 0







, 0 i

−i 0

Exercise 19. Show that congruence (over R) relation divides all n × n real symmetric matrices into

1 2 (n + 1)(n + 2) equivalent classes.

(13)

5.4. HERMITIAN CONGRUENCE 155

5.4 Hermitian Congruence

In defining inner product on C ⁿ , we need to work on Hermitian forms, q =

n

X

i=1 n

X

j=1

¯

z ⁱ a ij z ^j = z ^∗ Az z = (z ⁱ ) ∈ C ⁿ , z ^∗ := ¯ z ^T , A = (a ij ) ∈ M n (C).

Note that a change of variable w = P x gives

q = z ^∗ Az = y ^∗ By, B = P ^∗ AP.

Definition 5.6. Two square matrices A and B in M n (C) are called Hermitely congruent if B = P ^∗ AP for some invertible P ∈ M ⁿ (C), where P ^∗ := ¯ P ^T .

A matrix A ∈ M n (C) is called Hermitain if A ^∗ = A.

A matrix A ∈ M n (C) is called Skew-Hermitian if A ^∗ = −A, i.e., iA is Hermitian.

A matrix A ∈ M m (C) is called positive definite if z ^∗ Az > 0 for all non-trivial z ∈ C ⁿ .

A matrix A ∈ M m (R) is called positive definite if A = A ^T and x ^T Ax > 0 for all non-trivial x ∈ R ⁿ .

Note that the value of a Hermitian form q = z ^∗ Az is real for all z ∈ C ⁿ if and only if A is Hermitian.

Hence, the two definitions of positive definite are consistent.

Also note that a matrix that is Hemitely congruent to a Hermitian matrix is also Hermitian.

The reduction theory for Hermitian matrices under Hermitely congruence is an almost perfect analogues of the theory for symmetric matrices relative to congruence, and the proof requires only minor changes.

Theorem 5.4. 1. Two matrices in M n (C) are Hemitely congruent if and only if one is obtained from the other by a succession of pairs of elementary operations, each pair consisting of an ele- mentary column operation and the conjunctively corresponding row operation; namely, A and B are Hermitely congruent if and only if

B = E _k ^∗ [ · · · (E 1 ^∗ AE 1 ) · · · ]E ^k

where E 1 , · · · , E ^k are elementary matrices. In particular, B = P ^∗ AP where P ^∗ = E _k ^∗ · · · E ^∗ 1 I is obtained by performing successively row operations O 1 ^∗ , · · · , O k ^∗ on I, and P = IE 1 · · · E ^k is obtained by performing successively column operations O ¹ , · · · , O ^k on I.

2. Every Hermitian matrix is Hermitely congruent to a matrix of the form





I p 0 0

0 −I ^q 0

0 0 0





where the integer index pair (p, q) is uniquely determined by A.

3. A matrix A is positive definite if and only if it is Hermitely congruent to an identity matrix;

namely, A = P ^∗ P where P is an invertible matrix.

4. Every skew-Hermitian matrix is Hermitely congruent to a matrix of the form





iI p 0 0 0 −iI ^q 0

0 0 0





where the integer index pair (p, q) is uniquely determined by A.

(14)

Exercise 20. Prove Theorem 5.4.

Exercise 21. Let A ∈ M ⁿ (C). In C ⁿ , define a sesqui-bilinear form hx, yi := y ^∗ Ax ∀ x, y ∈ C ⁿ .

(i) Show that h·, ·i is an inner product if and only if A is positive definite.

(ii) Suppose A = P ^∗ P where P ∈ M ⁿ (C) is invertible. Write P ⁻ ¹ = (v 1 · · · v ⁿ ). Show that {v ¹ , · · · , v ⁿ } is an orthonormal basis under the inner product h·, ·i.

Exercise 22. Suppose A = B + iC where A ∈ M ⁿ (C) is Hermitian, and B, C ∈ M ⁿ (R). Show that B is symmetric and C is skew-symmetric.

Exercise 23. Let A ∈ M ⁿ (C). Show that A can be written uniquely as A = B + iC where both B and C are Hermitian.

Exercise 24. Find invertible P ∈ M n (C) such that P ^∗ AP is in its canonical form, for the following A’s:

0 1

−1 0

, 0 i

i 0

, 2 i

−i 2

Exercise 25. Define an inner product on C by hx, yi := y ^∗ Ax where A is as the last matrix in the previous exercise. Find an orthonormal basis.

Exercise 26. Let A ∈ M ⁿ (C). (i) Show that if x ^∗ Ax = 0 for all x ∈ C, then A = 0I.

(ii) Sow that A is Hermitian if and only if x ^∗ Ax ∈ R for all x ∈ C ⁿ . Exercise 27. ^∗ Let A be a Hermitian matrix.

(i) Show that any principal sub-matrix of A (a square matrix obtained from A by deleting some rows and corresponding columns) is also Hermitian.

(ii) Using an induction argument on the dimension of A show that A is positive definite if and only if det(S j ) > 0 for all j = 1, · · · , n where S ^j is the submatrix of A taken from the first j rows and columns.

Hint: Suppose P ^∗ AP = I. Then

P ^∗ 0 0 ^∗ 1

A B B ^∗ c

P 0 0 ^∗ 1

=

I P ^∗ B B ^∗ P c

I 0

−B ^∗ P 1

I P ^∗ B B ^∗ P c

I −P ^∗ B

0 ^∗ 1

= I 0 0 ^∗ d

where d = det(P )

2 det A B B ^∗ c

.

(15)

5.5. POLYNOMIALS OVER A FIELD 157

5.5 Polynomials over a Field

Let F be a field and x be an abstract symbol. A polynomial in the symbol x with coefficient in F is an expression

p(x) = c n x ⁿ + c n−1 x ⁿ⁻¹ + · · · + cx + c ⁰ x ⁰ .

where n is a non-negative integer and each c i is in F. If all c i are zero, we call it the zero polynomial, and is denoted by 0. If c n 6= 0, then c ⁿ is called the leading coefficient of p(x) and n is called the degree of p and write

n = deg(p(x)).

If its lading coefficient is 1, p(x) is called monic. The degree of a non-zero constant (times x ⁰ ) is zero, and the degree of a zero polynomial is defined as −∞.

The set of all polynomials in x with coefficients in F is denoted by F[x] (read “ F bracket x”). If we abandon the notion of x as variable, we may regard each p(x) as a single, fixed entity, a member of the set F[x]. We equip F[x] with the conventional summation and multiplications,

1 · x = x, 0 · x = 0, cx = xc, x ⁱ · x ^j = x ^i+j , P

i c i x ⁱ + P

j d j x ^j = P

k (c k + d k )x ^k , P c i x ⁱ P d j x ^j = P

i,j c i d j x ^i+j .

Then F[x] becomes a commutative ring, which we call it a polynomial domain over F.

Remark: (i) If x is a variable taking values in F, then x ⁰ = 1 ∈ F and 0 = 0.

(ii) If x is a variable taking values in M n (F), than x ⁰ = I and 0 = 0I.

(iii) In default, x is an abstract symbol, and therefore so are x ⁿ . For simplicity, we abuse 0 for 0 and c for cx ⁰ .

Example 5.3. Let F = {0, 1} equipped with the mod (2) integer arithmetic. Consider the polynomial f (x) = x ² − x = x(x − 1).

If we regard x as a variable in F, then f (x) ≡ 0. However, we regard x as an abstract symbol, so that f (x) 6= 0 and deg(f) = 2.

Since x is regarded as a symbol, for any field F, we have the following: For any non-zero polynomial p(x) and q(x),

deg(pq) = deg p + deg(q)

This property implies that if f (x)g(x) = 0 then either f (x) = 0 or g(x) = 0. Note that the definition that the degree of zero polynomial is −∞ is consistent with the above degree formula. Also, we have a cancellation law:

If h(x)g(x) = f (x)g(x), g(x) 6= 0, then h(x) = f(x).

Given polynomials f (x) and g(x) over F, we say g(x) divides f (x), or g(x) is a factor of f (x) if f (x) = g(x)h(x) for some h(x) ∈ F[x]. In such a case, we write g|f. Note that g|0 for any g.

An expression f (x) = g(x)h(x) is called a factorization of f (x). A factorization of two factors, of which one is a constant, will be called a trivial factorization.

Definition 5.7. 1. A non-constant polynomial is called irreducible over F is all of its factorizations

into two factors over F are trivial.

(16)

2. If a polynomial g(x) divides both f 1 (x) and f 2 (x), g(x) is called a common divisor of f 1 (x) and f 2 (x).

3. A monic polynomial is called the greatest common divisor, or simply gcd, of f 1 (x) and f 2 (x) if (a) g(x) is a common divisor of f 1 (x) and f 2 (x) and (b) every common divisor of f 1 (x) and f 2 (x) is a divisor of g(x).

4. Two polynomials are called relatively prime if their gcd is unit.

Note that every degree 1 polynomial is irreducible. In Q[x], x ² − 2 is irreducible. In R[x], x ⁴ + 1 is also irreducible. Also, we take the default that the gcd of a non-zero polynomial f and the zero polynomial is cf , where c is a scalar so that cf is monic. Also, the gcd of two zero polynomials is zero.

The familiar process of long division of polynomials is valid over an arbitrary field. The following lemma, known as divisor algorithm, is a careful formulation of the result of dividing as far as possible.

Lemma 5.4. If f (x) and g(x) are polynomials over F and g(x) 6= 0, then there are unique polynomials q(x) and r(x) over F such that

f (x) = q(x)g(x) + r(x), either r(x) = 0 or deg(r(x)) < deg(g(x)).

One notices that if f (x) divides by x − c (indeed x − cx ⁰ ), the remainder is f (c) (indeed, f (c)x ⁰ ).

Consequently, x − c divides f(x) if and only if f(c) = 0, i.e. c is a root of f(x) = 0.

Theorem 5.5. If both f 1 (x) and f 2 (x) are non-zero polynomials over F, they have a greatest common divisor d(x) in F[x]. Moreover, d(x) is unique and is expressible as

d(x) = p 1 (x)f 1 (x) + p 2 (x)f 2 (x) where p 1 (x), p 2 (x) ∈ F[x].

In particular, f (x) and g(x) is prime to each other if and only there are a(x) and b(x) in F [x] such that

1 = a(x)f (x) + b(x)g(x).

We shall use the notation

d(x) = gcd[f 1 (x), f 2 (x)]

for the greatest common divisor of f 1 (x) and f 2 (x). The gcd can be found by the following algorithm:

f 1 = q 1 f 2 + f 3 , deg(f 3 ) < deg(f 2 ), f 2 = q 2 f 3 + f 4 , deg(f 4 ) < deg(f 3 ),

· · · ·

f k = q k f k+1 + f k+2 , deg(f k+2 ) < deg(f k+1 ), f k+1 = q k+1 f k+2 .

Then d(x) = cf k+2 (x) where c is a constant. One can use these equations to express f k+2 as a linear combination of f 1 and f 2 .

Theorem 5.6. Every non-zero polynomial f (x) over F is expressible in the form f (x) = cp 1 (x) · · · p ^r (x)

where c is a non-zero constant and each p i (x) is a monic, irreducible polynomial of F[x]. The above

factorization is unique apart from the order in which the factors of p i (x) appear.

(17)

5.5. POLYNOMIALS OVER A FIELD 159 The above theorem is usually called the unique factorization theorem.

Although irreducibility of a polynomial depend on the size of a field, the concept of greatest common divisor and therefore also relative prime are independent of the field used.

Theorem 5.7. Let F and K be two fields sauch that F ⊂ K. Suppose f(x), g(x) ∈ F[x]. Then the their gcd within F[x] is also their gcd in K[x].

In particular, if f (x) and g(x) are prime to each othr in F[x], then they are also prime to each other in K[x].

Proof. Let d(x) be the gcd of f and g in K[x] and D(x) be their gcd in K[x]. Since F ⊂ K, d(x)|D(x).

Let a(x) and b(x) in F[x] be such that

d(x) = a(x)f (x) + b(x)g(x).

Also let p(x), q(x) ∈ K[x] be such that f(x) = p(x)D(x) and g(x) = q(x)D(x). Then d(x) = a(x)p(x)D(x) + b(x)q(x)D(x) = h(x)D(x)

where h(x) ∈ K[x]. Since d(x)|D(x), we must have deg(h) = 1, namely, h is a constant in K. As both D(x) and d(x) are monic polynomials, we must have h(x) = 1. Hence, d(x) = D(x).

Exercise 28. Define the greatest common divisor for multiple non-trivial polynomials f 1 (x), · · · , f ^k (x).

Show that d(x) = gcd[f 1 (x), · · · , f ^k (x)] exists, and there exists a 1 (x), · · · , a ^k (x) such that d(x) = a 1 (x)f 1 (x) + · · · + a ^k (x)f k (x).

Exercise 29. (1) Suppose p(x) is prime to each of f 1 (x), · · · , f ^k (x). Show that p(x) is prime to Π ^k _i=1 f i (x).

(2) Suppose p is irreducible and p |f ¹ · · · f k . Show that p |f i for some i ∈ {1, · · · , k}.

Exercise 30. Let R = {m + n √

5i | m, n ∈ Z}, equipped with the arithmetic of complex numbers. Show that R is a ring, i.e. R is closed under summation, negation, and multiplication. Also show that 1 + √

5i, 1 − √

5i, 2, 3 are all prime numbers. Nevertheless, 6 = 2 ∗ 3 = (1 + √

5i)(1 − √

5i), so that R is not a unique factorization ring.

Exercise 31. Find a, b such that gcd[f 1 , f 2 ] = af 1 + bf 2 where f 1 = x ⁴ − 3x ² + 3 and f 2 = x ³ + x ² + x + 1.

Exercise 32. Suppose both f and g are non-zero polynomials. Show that there exist a, b such that deg(b) <

deg(f ), deg(a) < deg(g) and gcd[f, g] = af + bg.

(18)

5.6 Equivalence over F [x]

Many of the fundamental and practical results in the theory of matrices occur in the study of similar matrices, and in related subjects. Though matrices in question may have real number entries, the proofs of theorems, and sometimes the theorem too, involve certain other matrices whose entries are polynomials. It is largely for this reason that we study matrices with polynomial entries. The main theme in this section is the theory of equivalence relations over F[x].

In this section, we consider matrices A = A(x) in M m,n (F[x]), i.e., rectangular matrices with entries in F[x].

Definition 5.8. 1. Elementary operation on a matrix over F[x] are defined as follows:

(a) Interchange of two rows (or two columns).

(b) Multiplication of a row (or column) by a non-zero constant (scalar).

(c) Addition to one row (or column) by a polynomial multiple of another row (or column).

Elementary matrices over F[x] are those that obtained from I after a single elementary operation on matrix over F[x].

2. A square matrix A over F[x] is called invertible over F[x] if there is a matrix B over F[x] such that AB = BA = I.

3. Two matrices A and B over F[x] are called equivalent over F[x] if B = P AQ where P and Q are matrices over F[x] that are invertible over F[x].

First we study invertible matrices; namely, those matrices that are equivalent over F[x] to the identity matrix.

Lemma 5.5. 1. Every elementary matrix over F[x] has an inverse which is also elementary.

2. A matrix over F[x] has an inverse if and only if its determinant is a non-zero constant.

3. Any matrix over F[x] that is invertible over F[x] if and only if it is the product of elementary matrices.

4. If a succession of elementary row operations over F[x] reduced a matrix into an identity matrix, then the same succession of operation on the identity matrix produces the inverse of the matrix.

5. Two matrices over F[x] are equivalent over F[x] if and only if one can be obtained from the other by a succession of elementary operations.

Note that det(A) 6= 0 for square matrix over F[x] does not imply that A is invertible; for example, for A = xI ∈ M ⁿ (F[x]), D(A) = x ⁿ 6= 0, but A is not invertible in M ⁿ (F[x]).

Next, we introduce sub-matrix and sub-determinant.

Definition 5.9. Let A be a m × n matrix. A sub-matrix of A is a matrix obtained from A by deleting some rows and columns. A principal sub-matrix is a sub-matrix whose diagonal entries are also diagonal entries of A.

A t × t sub-determinant of A is the determinant of a t × t sub-matrix of A. A t × t principal

sub-determinant is the determinant of a t × t principal sub-matrix.

(19)

5.6. EQUIVALENCE OVER F[X] 161 Lemma 5.6. Let A, B ∈ M m,n (F[x]) be two matrices equivalent over F[x]. Then

(1) Every r × r sub-determinant of A is a linear combination, with coefficients in F[x], of all r × r sub-determinants of B;

(2) the gcd of all sub-determinants of A equals the gcd of all sub-determinants of B.

Proof. (1) Suppose A = EB where E is an elementary matrix over F[x], then E has three types.

Performing the calculation for each of the three types, one can conclude that each r × r sub-determinant of A is a linear combination, with coefficients in F[x], of all r × r sub-determinants of B.

Similarly, the above assertion holds when A = BE and E is an elementary matrix.

As every invertible matrix can be written as a product of elementary matrices, the first assertion thus follows.

(2) Let d(x) and D(x) be, respectively, the gcd of all r × r sub-determinants of A and B. Then d(x) is a linear combination of all r × r sub-determinants of A, and by (1), is also a linear combination of all r × r sub-determinants of B, and hence, D|d since D divides every r × r sub-determinants of B. As B is also equivalent to A, we also have D |d. Hence, d = D.

Theorem 5.8. Each matrix A over F[x] is equivalent over F[x] to a matrix S = diag(f 1 (x), · · · , f ^r (x), 0)

where f 1 (x), · · · , f ^r (x) are monic polynomials satisfying f 1 (x) |f ² (x) | · · · |f ^r (x), and 0 is a m − r by n − r zero matrix.

In addition, such S = S(x), called the Smith’s canonical matrix of A = A(x), is unique, and for all t = 1, · · · , r,

g t (x) := Π ^t _j=1 f j (x)

is the gcd of all t × t sub-determinants of A; in particular, denoting g ⁰ = 1, f t (x) = g t (x)

g t−1 (x) ∀t = 1, · · · , r.

The polynomials f 1 (x), · · · , f r (x) are called the invariant factors of A.

Two matrices in M m,n (F[x]) are equivalent over F[x] if and only if they have the same invariant factors, and also if and only if for all integer t satisfying 1 ≤ t ≤ min{m, n}, the gcd of all t × t sub-determinants of one matrix equals that of the other matrix.

Proof. Let A ∈ M ^m,n (F[x]) be given. Assume that A 6= 0I.

1. Let k 1 ≥ 0 be the minimum degree of all non-zero elements of all matrices equivalent to A. Then there exists a monic polynomial f 1 (x) of degree k 1 such that f 1 (x) is an entry of a matrix B = (b ij (x)) m×n

equivalent to A. By interchange rows and columns, we can assume that f 1 (x) = b 11 (x).

We claim that f 1 |b ^1j for all j = 2 · · · , n. Suppose this is not true. Then for some j ≥ 2, f does not divide b 1j . Hence, by a long division, we can write b 1j (x) = a(x)f (x) + r(x) where deg(r) < deg(f ).

Subtracting the first column multiplied by a(x) from the jth column gives a new matrix equivalent to A and whose (1, j) entry is r(x). This contradicts the definition of k 1 . Hence, f 1 |b ^1j . Similarly, we can show that f 1 |b ^j1 for all j = 2, · · · , m.

Upon subtracting appropriate multiple of the first row/column from the jth row/column, we then obtain a matrix which is equivalent to A and has the form

B = f 1 (x) 0

0 C

.

(20)

We claim that f 1 divides every entry of C. Suppose not. Then there is a non-zero entry c ij (x) of C such that f 1 does not divides c ij . Adding the (i + 1)th row of B to the first row, we obtain a matrix which is equivalent to A and whose first row contains entries f 1 and c ij (x). As in Step 1, we see that this is impossible. Hence, f 1 divides every entries of C.

Thus, f 1 (x) is the gcd of all 1 × 1 sub-determinants of B, and hence also the gcd of all 1 × 1 sub-determinants of A.

Now by induction, C is equivalent to diag(f 2 (x), · · · , f r (x), 0) where f 2 |f ² · · · |f r . Since f 1 divides every entries of C, and therefore the gcd of all 1 × 1 sub-determinants of C, which is f ² (x). Hence, f 1 |f ² · · · |f ^r , and A is equivalent to

S = diag(f 1 (x), · · · , f r (x), 0).

Since the gcd of all t × t sub-determinants of S is Π ^t i=1 f i (x), we see that f t , · · · , f r are unique. This proves the theorem.

Remark 5.1. A complete proof of Lemmas 5.5, 5.6, and Theorem 5.8 goes as follows.

1. Let S be the set of all those that can be written as the product of elementary matrices. Since every elementary matrix has an inverse which is also elementary (cf exercise ), every matrix in S has an inverse in S. In addition, every element in S has a non-zero scalar determinant.

2. Define equivalence over F[x] by B ∼ A if B = P AQ for some P, Q ∈ S. Following part of the proof of Theorem 5.8, one can show that for every A ∈ M m,n (F[x]), there are P, Q ∈ S such that P AQ = diag(f 1 , · · · , f ^r , 0), where f 1 , · · · , f ^r are monic polynomials.

3. Suppose A ∈ M ⁿ (F[x]) has a non-zero scalar determinant. Let P, Q ∈ S such that P AQ = diag(f 1 , · · · , f ⁿ ). Since f 1 · · · f ⁿ = det(P )det(A)det(Q) is a non-zero scalar, f 1 = · · · = f ⁿ = 1; that is, P AQ = I. Hence, A = P ⁻ ¹ Q ⁻ ¹ ∈ S. Thus, A ∈ S if and only if det(A) is a non-zero scalar.

4. Suppose A ∈ M n (F[x]) is invertible, i.e., there exists B such that AB = BA = I. Then 1 = det(A)det(B), so that det(A) is a non-zero scalar. Thus, A ∈ M ⁿ (F) is invertible if and only if det(A) is a non-zero scalar, and if and only if A is the product of elementary matrices.

Since every entries of a matrix A in M m,n (F[x]) is a polynomial, we can express it as A = A k x ^k + · · · + A ¹ x + A 0 x ⁰ = x ^k A k + · · · + x ⁰ A 0 , A j ∈ M ^m,n (F) ∀j.

If A m 6= 0, we say the degree of A is m, writing as deg(A) = m. Also we call A m the leading coefficient matrix.

One can verify that if B = x ^l B l + · · · + x ⁰ B 0 , then, as long as sizes match,

AB =

k+l

X

t=0

x ^t C t =

k+l

X

t=0

C t x ^t , C t := X

i+j=t

A i B j ,

BA =

k+l

X

t=0

x ^t D t =

k+l

X

t=0

D t x ^t , D t := X

i+j=t

B j A i .

A square matrix A ∈ M ⁿ (F[x]) is called a proper matrix polynomial if its leading coefficient matrix is invertible. The following lemma shows that a proper matrix polynomial can be used as a divisor.

Lemma 5.7 (Division Algorithm). Let B, D ∈ M n (F[x]) and D be proper.

(21)

5.6. EQUIVALENCE OVER F[X] 163 Then the division of B on the right by D has a unique right quotient Q r and a unique right remainder R r ; namely,

B = Q r D + R r , either R r = 0 or deg(R r ) < deg(D).

Similarly, the division of B on the left by D has a unique left quotient Q l and left remainder R l

such that

B = DQ l + R l , either R l = 0 or deg(R l ) < deg(D).

In particular, if D = xI − A and B = P m

i=0 x ⁱ B i where A, B 0 , · · · , B ^m ∈ M ⁿ (F), then R r = B r (A) :=

m

X

i=0

B i A ⁱ , R l = B l (A) :=

m

X

i=0

A ⁱ B i .

The proof follows from a long division. Here the condition that D is proper is important.

An important application of the division algorithm is the following:

Corollary 5.2. Assume that A 1 , A 2 , B 1 , B 2 ∈ M ⁿ (F) and A 2 is invertible. Define matrix polynomials M 1 , M 2 ∈ M ⁿ (F[x]) by

M 1 = A 1 x + B 1 , M 2 = A 2 x + B 2 .

Then M 1 is equivalent to M 2 over F[x] if and only if A 1 is invertible and M 1 and M 2 are equivalent over F, i.e., there exist invertible matrices P 0 , Q 0 ∈ M ⁿ (F) such that

M 2 = P 0 M 1 Q 0

or A 2 = P 0 A 1 Q 0 , B 2 = P 0 M 1 Q 0

.

Proof. We need only show that the only if part. For this, we assume that M 2 is equivalent to M 1

over F[x]. Then there exist invertible P, Q ∈ M n (F[x]) such that M 2 = P M 1 Q.

Since A 2 is invertible, M 2 is proper, so we can divide P on the left by M 2 and divide Q on the right by M 2 to obtain quotients and remainders as follows:

P = M 2 P + P ˆ 0 , Q = ˆ QM 2 + Q 0 .

Since the degree of M 2 is one, P 0 and Q 0 are matrices in M n (F). We can calculate M 2 = P M 1 Q = (M 2 P + P ˆ 0 )M 1 ( ˆ QM 2 + Q 0 )

= P 0 M 1 Q 0 + P 0 M 1 QM ˆ 2 + M 2 P M ˆ 1 Q 0 + M 2 P M ˆ 1 QM ˆ 2

= P 0 M 1 Q 0 + (P − M ² P )M ˆ 1 QM ˆ 2 + M 2 P M ˆ 1 (Q − ˆ QM 2 ) + M 2 P M ˆ 1 QM ˆ 2

= P 0 M 1 Q 0 + P M 1 QM ˆ 2 + M 2 P M ˆ 1 Q − M ² P M ˆ 1 QM ˆ 2

= P 0 M 1 Q 0 + M 2 Q ⁻ ¹ QM ˆ 2 + M 2 P P ˆ ⁻ ¹ M 2 − M ² P M ˆ 1 QM ˆ 2 ,

where we have used P 0 = P − M ² P , Q ˆ 0 = Q − ˆ QM 2 in the third equation and P M 1 = M 2 Q ⁻ ¹ , M 1 Q = P ⁻ ¹ M 2 in the last equation. Hence,

M 2 − P ⁰ M 1 Q 0 = M 2 DM 2 , D := Q ⁻ ¹ Q + ˆ ˆ P P ⁻ ¹ − ˆ P M 1 Q. ˆ

We claim that D = 0I. Indeed, if D 6= 0I, then letting D ^m 6= 0I be the leading coefficients of D with m = deg(D) ≥ 0, the degree of M ² DM 2 is m + 2 since A 2 invertible implies that the leading coefficient matrix A 2 D m A 2 of M 2 DM 2 is non-zero. But this is impossible since the degree of M 2 − P ⁰ M 1 Q 0 is at most one. Hence, we must have D = 0I.

Thus, M 2 = P 0 M 1 Q 0 . This equation also implies that A 2 = P 0 A 1 Q 0 and B 2 = P 0 B 1 Q 0 . Since A 2

is invertible, P 0 , Q 0 and A 1 are all invertible. This completes the proof.

(22)

Finally we provide an application of matrices over C[x] to systems of ordinary differential equations with constant coefficients.

Consider a system of 2 ordinary differential equations of 2 unknowns y 1 = y 1 (t), y 2 = y 2 (t):

y ₁ ⁰⁰ − y ⁰ 1 + y ₂ ⁰ = 0, y ₁ ⁰ − y ¹ + y ⁰ ₂ + y 2 = 0.

Introduce a symbol

D = ⁰ = d dt . Then this system can be written as

Ay = 0, A = A(D) := D ² − D D D − 1 D + 1

, y = y(t) := y 1 (t) y 2 (t)

. (5.2)

By elementary row operations, this system is equivalent to

D ² y 2 = 0, (D + 1)y 1 + (D − 1)y ² = 0.

Hence, the general solution can be written as

y 2 = c 1 + c 2 t, y 1 = c 1 + 2c 2 + c 2 t + c 3 e ^x , where c 1 , c 2 , c 3 are arbitrary constants.

The general homogeneous system of n linear ordinary differential equations of constant coefficients with n unknowns can be written as

Ay = 0, A = A(D) = a ⁱ _j (D)

n×n , a ⁱ _j (D) =

m(i,j)

X

k=1

b ⁱ _jk D ^k , y = (y ⁱ ) n×1

where b ⁱ _jk are constants. The integer max {m(i, j)} is called the order of the system.

By the theorem, A is equivalent to its Smith’s canonical form:

S = P AQ

where P, Q are invertible, and S = diag(f 1 (D), · · · , f ^r (D), 0, · · · , 0) with f ¹ |f ² · · · |f ^r . Hence, setting z = Q ⁻ ¹ (D)y, or y = Q(D)z, the system then is equivalent to a system of decoupled equations

f 1 (D)z 1 = 0, · · · , f r (D)z r = 0. (5.3) Suppose, for example, f 1 (D) = (D − 1) ³ (D ² + 1)(D − 2), then the general solution of z ¹ is

z 1 (t) = c 1 e ^2t + c 2 sin t + c 3 cos t + (c 4 + c 5 t + c 6 t ² )e ^t .

Another important application of matrix polynomial will be presented in the next section.

Exercise 33. Show that every elementary matrix has an inverse which is also elementary.

Also show that each elementary matrix obtained by performing a single elementary row operation on I can be obtained by performing a single elementary column operation on I, and vice versa.

Exercise 34. Prove Lemma 5.7.

(23)

5.6. EQUIVALENCE OVER F[X] 165 Exercise 35. Find the inverse of the matrices

x x + 1 x + 2 x + 3

, 0 1

−1 0

+ x 0 1 1 0

+ x ² 1 0 0 1

Also, write these matrices as the product of elementary matrices.

Exercise 36. Find the Smith’s canonical form for the following matrices:

x x ² x ³ x ⁵

,

x x

x ² + x x

Exercise 37. Show that M 1 and M 2 in Corollary 5.2 are equivalent if and only if xI + A ⁻ ₁ ¹ B 1 and xI + A ⁻ ₂ ¹ B 2 have the same invariant factors.

Exercise 38. If A is a square matrix over F[x], show that A is equivalent to its transpose.

Exercise 39. For the matrix A in (5.2), find invertible P, Q ∈ M ⁿ (R[x]) such that P AQ is in the Smith

canonical form. Also, write the corresponding decoupled system as in (5.3).

(24)

5.7 Similarity

Definition 5.10. Two square matrices A, B ∈ M ⁿ (F) are called similar if B = P ⁻ ¹ AP for some invertible P ∈ M ⁿ (F).

Definition 5.11. Let A ∈ M ⁿ (F) be a square matrix.

1. The matrix xI − A ∈ M ⁿ (F[x]) is called the characteristic matrix of A. The characteristic polynomial of A is

p A (x) = det(xI − A).

2. If f ∈ F[x] satisfies f(A) = 0I, we say f annihilates A. Among all monic polynomials that annihilate A, the one with the minimum degree is unique and, denoted by m A (x), is called the minimum polynomial of A.

3. Let f 1 , · · · , f ⁿ be the unique monic polynomials such that f 1 |f ² | · · · |f ⁿ and xI − A is equivalent over F[x] to diag(f 1 , · · · , f ⁿ ). Then {f ¹ , · · · , f ⁿ } are called the similarity invariants of A. Those which are not constant are called non-trivial.

4. Suppose f 1 , · · · , f ⁿ are similarity invariants of A and

f j (x) = p 1 (x) ^c

^j1

· · · p ^m (x) ^c

^jm

, j = 1, · · · , n.

where c ji ≥ 0 are integers, p ¹ (x), · · · , p ^m (x) are distinct, monic, irreducible polynomials . Then each non-constant p ^c _i

^ji

(x) is called an elementary divisor, and all elements in the list, counting duplications,

{p ⁱ (x) ^c

^ji

; c ji ≥ 1}

are called the elementary divisors of A over F.

We know that similarity is an RST relation. Also,

f 1 (x) · · · f ⁿ (x) = det(xI − A) = d ¹ (x) · · · d ^m (x),

where {f ¹ , · · · , f n } are all similarity invariants, and {d ¹ , · · · , d m } are all elementary divisors. All trivial similarity invariants of A are equal to unity.

Also, elementary divisors and similarity invariants can be found from one to the other.

Example 5.4. Suppose a 12 × 12 matrix A has the following similarity invariants

f 1 (x) = · · · = f ⁹ (x) = 1, f 10 (x) = x ² + 1, f 11 (x) = (x ² + 1)(x ² − 2), f ¹² (x) = (x ² + 1)(x ² − 2) ² . Then all elementary divisors of A over Q are

x ² + 1, x ² + 1, x ² + 1, x ² − 2, (x ² − 2) ² . All the elementary divisors of A over C are

x + i, x + i, x + i, x − i, x − i, x − i, x − √

2, x + √

2, (x − √

2) ² , (x + √ 2) ² .

The following theorem shows that similarity invariants and/or elementary divisors are exactly the

quantities that are not changed under similarity transformations.

(25)

5.7. SIMILARITY 167 Theorem 5.9. Let A, B ∈ M n (F) be two square matrices over F. The following statements are equiva- lent:

1. B is similar to A over F, i.e., there exist invertible P ∈ M n (F) such that B = P ⁻ ¹ AP ; 2. xI − B and xI − A are equivalent over F;

3. xI − B and xI − A are equivalent over F[x]; i.e, there exist invertible P, Q ∈ M ⁿ (F[x]) such that xI − B = P (xI − A) Q ;

4. Both A and B have the same similarity invariants;

5. Both A and B have the same elementary divisors over F;

6. For each i = 1, · · · , n, the gcd of all i × i sub-determinants of xI − B equals that of xI − A.

7. A is similar to B over K where K is a field containing F.

Proof. (1) ⇒(2). If B = P ⁻ ¹ AP , then xI − B = P ⁻ ¹ (xI − A)P so that xI − B is equivalent to xI − A over F.

(2) ⇒ (3) is trivial, since invertible matrices in M n (F) are also invertible matrices in M n (F[x]).

(3) ⇒ (1). Suppose xI − B is equivalent to xI − A over F[x]. Then by Corollary 5.2, xI − B is equivalent to xI − A over F, i.e. there exist invertible P ⁰ , Q 0 ∈ M n (F) such that xI − A = P ⁰ (xI − B)Q ⁰ . This implies that I = P 0 Q 0 and A = P 0 BQ 0 , i.e. B = P ₀ ⁻ ¹ AP 0 . Hence, B is similar to A.

Finally, the equivalence of (3),(4),(5),(6),(7) follows from Theorem 5.8 and the definitions of simi- larity invariants and elementary divisors, and Theorem 5.7.

Note that if a non-trivial polynomial f annihilates A, f (A) = 0I, then so is the gcd[m A , f ] since it is a linear combination of m A and f . As m A has minimum degree, we must have m A = gcd[m A , f ], i.e.

m A |f. Hence, the minimum polynomial of A is the gcd of all non-trivial polynomials that annihilate A.

Theorem 5.10. The minimum polynomial m A (x) of a square matrix A ∈ M ⁿ (F) is that similarity invariant of A that has the highest degree.

That is, if f 1 , · · · , f ⁿ are invariant factors of xI − A over F[x] where f ¹ |f ² | · · · |f ⁿ , then f n (x) is the minimum polynomial; in particular,

m A (x) = p A (x) g n−1 (x)

where p A (x) = det(xI − A) and g ⁿ⁻¹ (x) is the gcd of all (n − 1) × (n − 1) sub-determinants of (xI − A).

Proof. Denote by Q = Q(x) the adjoint of xI − A. Then (xI − A)Q = p ^A (x)I.

Since the entries of Q consist of all (n − 1) × (n − 1) sub-determinants of xI − A, and g ⁿ⁻¹ (x) is the gcd of all (n − 1) × (n − 1) sub-determinants of xI − A, we see that g ⁿ⁻¹ divides every entries of Q, so that we can write Q = g n−1 Q, and the gcd of all entries of ˆ ˆ Q is 1.

Hence,

g n−1 (xI − A) ˆ Q = p A I = g n−1 f n I.

Therefore,

f n I = (xI − A) ˆ Q

Theory of Matrices. Chapter 5

Chapter 5

Theory of Matrices

As before, F is a field. We use F[x] to represent the set of all polynomials of x with coefficients in F.

We use M m,n (F) and M m,n (F[x]) to denoted the set of m by n matrices with entries in F and F[x]

respectively. When m = n, we write M m,n as M n .

In this chapter we shall study seven RST (Reflective, Symmetric, Transitive) relations among ma- trices over F:

1. Equivalence over F: B = P AQ, where P and Q are invertible matrices over F;

2. Equivalence over F[x]: B = P AQ, where P and Q are invertible matrices over F[x];

3. Congruence: B = P T AP , where P is invertible;

4. Hermitian congruence: B = P ∗ AP , where F = C and P ∗ := ¯ P T and is invertible;

5. Similarity: B = P − 1 AP ;

6. Orthogonal similarity: B = P − 1 AP , where F = R and P − 1 = P T ; 7. Unitary similarity: B = P − 1 AP , where F = C and P − 1 = P ∗ .

All of the relations except the first two require the related matrices A and B to be square. Otherwise the products defining the relations are impossible.

The seven relations are in three categories: equivalent, congruent, and similar. For each of the seven relations there are two major problems:

(1) Obtaining a set of canonical matrices;

(2) Finding a necessary and sufficient condition for two matrices to obey the relation.

Equivalence over F originates from solving system of linear equations, by making change of variables and performing operations on equations. Equivalence over F[x] is primarily a tool for the study of similarity and the system of linear constant coefficient ordinary differential equations.

Similarity occurs in the determination of all matrices representing a common linear transformation, or alternatively, in finding basis such that a linear transformation has a simple form.

Congruence originates from non-singular linear substitution in quadratic forms and Hermitian forms.

It can be used to characterize quadratic surfaces.

Finally, if the linear substitution is required to preserve the length, congruence then becomes or- thogonal or unitary similarity.

143

5.1 RST Relation

Definition 5.1. Let S be a set. A relation on S is a collection R of ordered pairs from S × S. We say x relates y and write x ∼ y if (x, y) ∈ R. We say x does not relate y and write x 6∼ y if (x, y) 6∈ R.

A relation is called reflexive if x ∼ x for all x ∈ S;

A relation is called symmetric if x ∼ y implies y ∼ x;

A relation is called transitive if x ∼ y and y ∼ z implies x ∼ z;

An RST relation 1 is a reflexive, symmetric, and transitive relation.

Given an RST relation ∼ on S, for each x ∈ S, the set [[x]] := {y ∈ S | y ∼ x} is called the equivalent class of x.

Gives an RST relation, there are two fundamental problems:

1. finding a set of canonical forms, i.e., a set containing one and exactly one representative from each equivalent class.

2. finding convenient criteria to determine whether any given two elements are related or to find the canonical form which an arbitrarily given element relates.

Example 5.1. Consider Z, the set of all integers. We define a “mod(3)” relation by m ∼ n if m − n is an integer muptiple of 3.

This is an RST relation which divides Z is into three equivalent classes:

Z 0 := {3k | k ∈ Z}, Z 1 := {1 + 3k | k ∈ Z}, Z 2 := {2 + 3k | k ∈ Z}.

From each class, we pick-up one and only one representative, say 0 from Z 0 , 1 from Z 1 , and −1 from Z 2 . Then we obtain a set C = {0, 1, −1} of canonical forms. Each integer relates exactly one canonical form.

Criteria: An integer of the form a 1 · · · a k (in dicimal) relates the remainder of (a 1 + · · · + a k )/3. For example: n = 8230609. Then the remainder of (8 + 2 + 3 + 6 + 9)/3 is 2, so n ∈ Z 2 .

Exercise 1. Let S = M n (F), the set of all square matrices of rank n and entries in F. Which kind of relations (reflexive, symmetric, transitive, and/or RST) are the following?

1. A ∼ B if and only if B = 0I; e.g. R = {(A, 0I) | A ∈ S};

2. A ∼ B if and only if A = 0I; e.g. R = {(0I, B) | B ∈ S};

3. A ∼ B if and only if A = B; e.g. R = {(A, A) | A ∈ S};

4. A ∼ B for all A and B; e.g. R = {(A, B) | A, B ∈ S};

5. A ∼ B if and only if both A and B are invertible;

6. A ∼ B if and only if A − B has at least one row of zeros;

7. A ∼ B if and only if AB = 0I.

This is commonly called “equivalence relation”, but in this chapter such terminology would be confusing.

5.1. RST RELATION 145 Exercise 2. Demonstrate by examples that the three relations, reflexive, symmetric, and transitive, are independent to each other; namely, find examples of relations that satisfy (i) non of them, (ii) only one of them, (iii) only two of them, and (iii) all three of them.

Exercise 3. Suppose ∼ is an RST relation on a set S. Show that for every x, y ∈ S, either [[x]] = [[y]] or

[[x]] ∩ [[y]] = ∅.

5.2 Equivalence over F

Consider a linear system

n

X

j=1

a i j x j = b i , i = 1, · · · , m (5.1)

of m equations and n unknowns x 1 , · · · , x n . Here a i j , b i are given numbers (scalars) in F. This system can be put in a matrix form

Ax = b, A = (a i j ) m×n , x = (x j ) n×1 , b = (b i ) m×1 . In this form, the system may be regarded as having only one unknown, the vector x.

An elementary operation for a system of equations is one of the following three types:

1. Interchange two equations;

2. multiplication of an equation by an invertible scalar;

3. Addition to an equation by a scalar multiple of a different equation.

A system of equations Ax = b can be recorded by an augmented matrix G = (A b).

The elementary operations of equations correspond to the elementary row operations (defined below) on the augmented matrix.

Definition 5.2. 1. An elementary row operation is one of the followings:

(a) O ij : Interchange the ith and jth rows;

(b) O i (d) : Multiplication of the ith row by an invertible scalar d;

(c) O ij (c): Addition to the ith row by a scalar c multiple of the jth row, i 6= j.

2. An elementary matrix is a matrix obtained by performing an elementary row operation on an identity matrix I. There are three types of elementary matrices: E ij , E i (d) (d invertible), and E ij (c), obtained by performing O ij , O i (d), and O ij (c) on the identity matrix I, respectively.

3. A matrix is called in a Row-Reduced-Echelon-Form 2 or RREF, if

(a) every nontrivial row has 1 as its first non-zero entry, called a leading one;

(b) any leading one in a lower row occurs only to the right of its counterpart above it;

(c) In any column containing a leading one, all entries above and below the leading one are zero;

(d) all rows consisting entirely zeros, if any, occurs at the bottom of the matrix.

4. The row space of a matrix is the space spanned by all the row vectors of the matrix.

If F is not a field, say F = Z, the definition of RREF should be revised.

5.2. EQUIVALENCE OVER F 147

5. Two matrices are called row equivalent if they have the same row spaces.

6. The row rank of a matrix is the dimension of the row space of the matrix.

Lemma 5.1. 1. Every elementary matrix has an inverse, which is also elementary.

3. Congruence: B = P ^T AP , where P is invertible;

4. Hermitian congruence: B = P ^∗ AP , where F = C and P ^∗ := ¯ P ^T and is invertible;

5. Similarity: B = P ⁻ ¹ AP ;

6. Orthogonal similarity: B = P ⁻ ¹ AP , where F = R and P ⁻ ¹ = P ^T ; 7. Unitary similarity: B = P ⁻ ¹ AP , where F = C and P ⁻ ¹ = P ^∗ .

An RST relation ¹ is a reflexive, symmetric, and transitive relation.

Criteria: An integer of the form a 1 · · · a ^k (in dicimal) relates the remainder of (a 1 + · · · + a ^k )/3. For example: n = 8230609. Then the remainder of (8 + 2 + 3 + 6 + 9)/3 is 2, so n ∈ Z ² .

a ⁱ _j x ^j = b ⁱ , i = 1, · · · , m (5.1)

of m equations and n unknowns x 1 , · · · , x ⁿ . Here a ⁱ _j , b ⁱ are given numbers (scalars) in F. This system can be put in a matrix form

Ax = b, A = (a ⁱ _j ) m×n , x = (x ^j ) n×1 , b = (b ⁱ ) m×1 . In this form, the system may be regarded as having only one unknown, the vector x.

(a) O ^ij : Interchange the ith and jth rows;

(b) O ⁱ (d) : Multiplication of the ith row by an invertible scalar d;

(c) O ^ij (c): Addition to the ith row by a scalar c multiple of the jth row, i 6= j.

2. An elementary matrix is a matrix obtained by performing an elementary row operation on an identity matrix I. There are three types of elementary matrices: E ij , E i (d) (d invertible), and E ij (c), obtained by performing O ^ij , O ⁱ (d), and O ^ij (c) on the identity matrix I, respectively.

3. A matrix is called in a Row-Reduced-Echelon-Form ² or RREF, if

3. The row space of a matrix B ∈ M ^r,n (F) is a subspace of that of A ∈ M ^m,n (F) if and only if B = P A for some P ∈ M ^r,m (F).

 a ¹

.. . a ^m

 b ¹

.. . b ^r

row space of A = span {a ¹ , · · · , a ^m };

row space of B = span {b ¹ , · · · , b ^r }.

Hence, the row space of B is a subspace of the row space of A if and only if there exist scalars p ⁱ _j , i = 1, · · · , r, j = 1, · · · , m such that

b ⁱ =

p ⁱ _j a ^j ∀ i = 1, · · · , r;

namely, B = P A where P = (p ⁱ _j ) r×m .

2. Two matrices A, B ∈ M ^m,n (F) are row equivalent if and only if B = P A for some invertible P ∈ M ^m (F); that is, A and B have the same row space.

same succession of row operation performed on the identity matrix produces A ⁻ ¹ : (A I) −→ (I A ⁻ ¹ ).

Proof of 4. Suppose {O ¹ , · · · , O ^k } is a succession of elementary row operations that transform A to I, then I = E k . . . E 1 A where each E i is the matrix obtained by performing O ⁱ on the identity matrix I.

Hence, A ⁻ ¹ = E k · · · E ¹ = E k · · · E ¹ I; that is A ⁻ ¹ can be obtained by performing successively the row operations O ¹ , · · · , O ^k on I.

(iii) The column space of a matrix B ∈ M ^m,r (F) is a subspace of that of a matrix A ∈ M ^m,n (F) if and only if B = AQ for some Q ∈ M n,r (F).

(v) If a succession of elementary column operations transforms A to I, then the same succession of column operations transforms I to A ⁻ ¹ :

A I

I A ⁻ ¹

Note that if A, B ∈ M ^m,n (F), then P ∈ M ^m (F) and Q ∈ M ⁿ (F).

Theorem 5.1. 1. Every matrix A ∈ M ^m,n (F) is equivalent to a matrix in which for some integer r ≥ 0, the first r elements on the diagonal are 1 and all other elements of the matrix are 0, namely,

A ∼ I r 0 0 0

, where I r is the identity matrix of rank r.

4. Let A ∈ M ^m,n (F). Suppose a succession of elementary operations on the first m rows and n columns gives

A I m

−→ D P