Linear algebra - Filtering and System Identification

After studying this chapter you will be able to

• apply basic operations to vectors and matrices;

• define a vector space;

• define a subspace of a vector space;

• compute the rank of a matrix;

• list the four fundamental subspaces defined by a linear trans-formation;

• compute the inverse, determinant, eigenvalues, and eigenvec-tors of a square matrix.

• describe what positive-definite matrices are;

• compute some important matrix decompositions, such as the eigenvalue decomposition, the singular-value decomposition and the QR factorization;

• solve linear equations using techniques from linear algebra;

• describe the deterministic least-squares problem; and

• solve the deterministic least-squares problem in numerically sound ways.

2.1 Introduction

In this chapter we review some basic topics from linear algebra. The material presented is frequently used in the subsequent chapters.

Since the 1960s linear algebra has gained a prominent role in engineer-ing as a contributengineer-ing factor to the success of technological breakthroughs.

2.2 Vectors 9 Linear algebra provides tools for numerically solving system-theoretic problems, such as filtering and control problems. The widespread use of linear algebra tools in engineering has in its turn stimulated the devel-opment of the field of linear algebra, especially the numerical analysis of algorithms. A boost to the prominent role of linear algebra in engi-neering has certainly been provided by the introduction and widespread use of computer-aided-design packages such as Matlab (MathWorks, 2000b) and SciLab (Gomez, 1999). The user-friendliness of these pack-ages allow us to program solutions for complex system-theoretic prob-lems in just a few lines of code. Thus the prototyping of new algorithms is greatly speeded-up. However, on the other hand, there is also need for a word of caution: The coding in Matlab may give the user the impression that one successful Matlab run is equivalent to a full proof of a new theory. In order to avoid the cultivation of such a “proven-by-Matlab” attitude, the refreshment survey in this chapter and the use of linear algebra in later chapters concern primarily the derivation of the algorithms rather than their use. The use of Matlab routines for the class of filtering and identification problems analyzed in this book is described in detail in the comprehensive software guide (Verhaegen et al., 2003).

We start this chapter with a review of two basic elements of linear algebra: vectors and matrices. Vectors are described in Section 2.2, matrices in Section 2.3. For a special class of matrices, square matrices, several important concepts exist, and these are described in Section 2.4. Section 2.5 describes some matrix decompositions that have proven to be useful in the context of filtering and estimation. Finally, in Sections 2.6 and 2.7 we focus on least-squares problems in which an overdetermined set of linear equations needs to be solved. These prob-lems are of particular interest, since a lot of filtering, estimation, and even control problems can be written as linear (weighted) least-squares problems.

2.2 Vectors

A vector is an array of real or complex numbers. Throughout this book we use R to denote the set of real numbers and C to denote the set of complex numbers. Vectors come in two flavors, column vectors and row vectors. The column vector that consists of the elements x1, x2, . . ., xn

with xi∈ C will be denoted by x, that is,

In this book a vector denoted by a lower-case character will always be a column vector. Row vectors are denoted by x^T, that is,

x^T=

x1 x2 · · · xn

The row vector x^Tis also called the transpose of the column vector x.

The number of elements in a vector is called the dimension of the vector.

A vector having n elements is referred to as an n-dimensional vector.

We use the notation x ∈ Cⁿ to denote an n-dimensional vector that has complex-valued elements. Obviously, an n-dimensional vector with real-valued elements is denoted by x ∈ Rⁿ. In this book, most vectors will be real-valued; therefore, in the remaining part of this chapter we will restrict ourselves to real-valued vectors. However, most results can readily be extended to complex-valued vectors.

The multiplication of a vector x ∈ Rⁿ by a scalar α ∈ R is defined as

The standard inner product of two vectors x, y ∈ Rⁿ is equal to x^Ty = x1y1+ x2y2+ · · · + xnyn.

The 2 -norm of a vector x, denoted by ||x||2, is the square root of the inner product of this vector with itself, that is,

||x||2=√ x^Tx.

2.2 Vectors 11 Two vectors are called orthogonal if their inner product equals zero; they are called orthonormal if they are orthogonal and have unit 2-norms.

Any two vectors x, y ∈ Rⁿ satisfy the Cauchy–Schwartz inequality

|x^Ty| ≤ ||x||2||y||2,

where equality holds if and only if x = αy for some scalar α ∈ R, or y = 0.

The m vectors xi∈ Rⁿ, i = 1, 2, . . ., m are linearly independent if any linear combination of these vectors,

α1x1+ α2x2+ · · · + α^mxm, (2.1) with αi∈ R, i = 1, 2, . . ., m is zero only if αⁱ= 0, for all i = 1, 2, . . ., m. If (2.1) equals zero for some of the coefficients αidifferent from zero, then the vectors xi, for i = 1, 2, . . ., m, are linearly dependent or collinear.

In the latter case, at least one of them, say xm, can be expressed as a linear combination of the others, that is,

xm= β1x1+ β2x2+ · · · + βm−1xm−1,

for some scalars βi∈ R, i = 1, 2, . . ., m − 1. Note that a set of m vectors in Rⁿ must be linearly dependent if m > n. Let us illustrate linear independence with an example.

Example 2.1 (Linear independence) Consider the vectors

x1=



 1 1 1



, x2=



 0 1 0



, x3=



 2 3 2



.

The vectors x1 and x2are linearly independent, because α1x1+ α2x2= 0

implies α1= 0 and α1+ α2= 0 and thus α1= α2= 0. The vectors x1, x2, and x3 are linearly dependent, because x3= 2x1+ x2.

A vector space V is a set of vectors, together with rules for vector addition and multiplication by real numbers, that has the following properties:

(i) x + y ∈ V for all x, y ∈ V;

(ii) αx ∈ V for all α ∈ R and x ∈ V;

(iii) x + y = y + x for all x, y ∈ V;

(iv) x + (y + z) = (x + y) + z for all x, y, z ∈ V;

(v) x + 0 = x for all x ∈ V;

For example, the collection of all n-dimensional vectors, Rⁿ, forms a vector space. If every vector in a vector space V can be expressed as a linear combination of some vectors xi∈ V, i = 1, 2, . . ., ℓ, these vectors xi span the space V. In other words, every vector y ∈ V can be written as that span the space and are linearly independent. An orthogonal basis is a basis in which every vector is orthogonal to all the other vectors contained in the basis. The number of vectors in a basis is referred to as the dimension of the space. Therefore, to span an n-dimensional vector space, we need at least n vectors.

A subspace U of a vector space V, denoted by U ⊂ V, is a nonempty subset that satisfies two requirements:

(i) x + y ∈ U for all x, y ∈ U; and (ii) αx ∈ U for all α ∈ R and x ∈ U.

In other words, adding two vectors or multiplying a vector by a scalar produces again a vector that lies in the subspace. By taking α = 0 in the second rule, it follows that the zero vector belongs to every subspace.

Every subspace is by itself a vector space, because the rules for vector addition and scalar multiplication inherit the properties of the host vec-tor space V. Two subspaces U and W of the same space V are said to be orthogonal if every vector in U is orthogonal to every vector in W. Given a subspace U of V, the space of all vectors in V that are orthogonal to U is called the orthogonal complement of U, denoted by U^⊥.

Example 2.2 (Vector spaces) The vectors

x1=

2.3 Matrices 13 are linearly independent, and form a basis for the vector space R³. They do not form an orthogonal basis, since x2 is not orthogonal to x3 The vector x1spans a one-dimensional subspace of R³or a line. This subspace is the orthogonal complement of the two-dimensional subspace of R³, or a plane, spanned by x2and x3, since x1is orthogonal both to x2and to x3.

2.3 Matrices

A vector is an array that has either one column or one row. An array that has m rows and n columns is called an m- by n-dimensional matrix, or, briefly, an m × n matrix. An m × n matrix is denoted by

The scalar aij is referred to as the (i, j)th entry or element of the matrix.

In this book, matrices will be denoted by upper-case letters. We use the notation A ∈ C^m×n to denote an m × n matrix with complex-valued entries and A ∈ R^m×n to denote an m × n matrix with real-valued entries. Most matrices that we encounter will be real-valued, so in this chapter we will mainly discuss real-valued matrices.

An n-dimensional column vector can, of course, be viewed as an n × 1 matrix and an n-dimensional row vector can be viewed as a 1×n matrix.

A matrix that has the same number of columns as rows is called a square matrix . Square matrices have some special properties, which are described in Section 2.4.

A matrix A ∈ R^m×ncan be viewed as a collection of n column vectors of dimension m,

or as a collection of m row vectors of dimension n,

A =

This shows that we can partition a matrix into column vectors or row vectors. A more general partitioning is a partitioning into submatrices, for example

A =

A11 A12

A21 A22

∈ R^m×n,

with matrices A11∈ R^p×q, A12∈ R^p×(n−q), A21∈ R^(m−p)×q, and A22∈ R(m−p)×(n−q)for certain p < m and q < n.

The transpose of a matrix A, denoted by A^T, is the n × m matrix that is obtained by interchanging the rows and columns of A, that is,

A^T=







a11 a21 · · · am1

a12 a22 · · · am2

... ... . .. ... a1n a2n · · · anm





.

It immediately follows that (A^T)^T= A.

A special matrix is the identity matrix , denoted by I. This is a square matrix that has only nonzero entries along its diagonal, and these entries are all equal to unity, that is,

I =







1 0 · · · 0 0 0 1 · · · 0 0 ... ... . .. ... ...

0 0 · · · 1 0 0 0 · · · 0 1







. (2.2)

If we multiply the matrix A ∈ R^m×nby a scalar α ∈ R, we get a matrix with the (i, j)th entry given by αaij (for i = 1, . . ., m, j = 1, . . ., n). The sum of two matrices A and B of equal dimensions yields a matrix of the same dimensions with the (i, j)th entry given by aij + bij. Obviously, (A + B)^T = A^T+ B^T. The product of the matrices A ∈ R^m×n and B ∈ R^n×pis defined as an m × p matrix with the (i, j)th entry given by n

k=1aikbkj. It is important to note that in general we have AB = BA.

We also have (AB)^T= B^TA^T.

Another kind of matrix product is the Kronecker product. The Kro-necker product of two matrices A ∈ R^m×n and B ∈ R^p×q, denoted by

2.3 Matrices 15

The Kronecker product and the vec operator in combination are often useful for rewriting matrix products. The vec operator stacks all the columns of a matrix on top of each other in one big vector. For a matrix A ∈ R^m×n given as A =

a1 a2 · · · an

, with ai ∈ R^m, the mn-dimensional vector vec(A) is defined as

vec(A) = The 2-norm of the vector vec(A) can be viewed as a norm for the matrix A. This norm is called the Frobenius norm. The Frobenius norm of a matrix A ∈ R^m×n, denoted by ||A||F, is defined as

A matrix that has the property that A^TA = I is called an orthogonal matrix ; the column vectors ai of an orthogonal matrix are orthogonal vectors that in addition satisfy a^T_iai = 1. Hence, an orthogonal matrix has orthonormal columns!

The rank of the matrix A, denoted by rank(A), is defined as the number of linearly independent columns of A. It is easy to see that the number of linearly independent columns must equal the number of linearly independent rows. Hence, rank(A) = rank(A^T) and rank(A) ≤ min(m, n) if A is m ×n. A matrix A has full rank if rank(A) = min(m, n).

An important property of the rank of the product of two matrices is stated in the following lemma.

Lemma 2.1 (Sylvester’s inequality) (Kailath, 1980) Consider matri-ces A ∈ R^m×n and B ∈ R^n×p, then

rank(A) + rank(B) − n ≤ rank(AB) ≤ min

rank(A), rank(B) . This inequality is often used to determine the rank of the matrix AB when both A and B have full rank n with n ≤ p and n ≤ m. In this case Sylvester’s inequality becomes n + n − n ≤ rank(AB) ≤ min(n, n), and it follows that rank(AB) = n. The next example illustrates that it is not always possible to determine the rank of the matrix AB using Sylvester’s inequality.

Example 2.3 (Rank of the product of two matrices) This example demonstrates that, on taking the product of two matrices of rank n, the resulting matrix can have a rank less than n. Consider the following matrices:

A =

1 0 2 0 1 0

, B =



 1 3 1 1 1 0



.

Both matrices have rank two. Their product is given by AB =

3 3 1 1

This matrix is of rank one. For this example, Sylvester’s inequality becomes 2 + 2 − 3 ≤ rank(AB) ≤ 2, or 1 ≤ rank(AB) ≤ 2.

With the definition of the rank of a matrix we can now define the four fundamental subspaces related to a matrix. A matrix A ∈ R^m×n defines a linear transformation from the vector space Rⁿ to the vector space R^m. In these vector spaces four subspaces are defined. The vector space spanned by the columns of the matrix A is a subspace of R^m. It is called the column space of the matrix A and is denoted by range(A).

Let rank(A) = r, then the column space of A has dimension r and any selection of r linearly independent columns of A forms a basis for its column space. Similarly, the rows of A span a vector space. This subspace is called the row space of the matrix A and is a subspace of Rⁿ. The row space is denoted by range(A^T). Any selection of r linearly independent rows of A forms a basis for its row space. Two other important subspaces associated with the matrix A are the null space or kernel and the left null space. The null space, denoted by ker(A), consists of all vectors x ∈ Rⁿ that satisfy Ax = 0. The left null space, denoted by ker(A^T),

2.3 Matrices 17 consists of all vectors y ∈ R^m that satisfy A^Ty = 0. These results are summarized in the box below.

The four fundamental subspaces of a matrix A ∈ R^m×n are range(A) = {y ∈ R^m: y = Ax for some x ∈ Rⁿ}, range(A^T) = {x ∈ Rⁿ : x = A^Ty for some y ∈ R^m},

ker(A) = {x ∈ Rⁿ : Ax = 0}, ker(A^T) = {y ∈ R^m: A^Ty = 0}.

We have the following important relation between these subspaces.

Theorem 2.1 (Strang, 1988) Given a matrix A ∈ R^m×n, the null space of A is the orthogonal complement of the row space of A in Rⁿ, and the left null space of A is the orthogonal complement of the column space of A in R^m.

From this we see that the null space of a matrix A ∈ R^m×nwith rank r has dimension n − r, and the left null space has dimension m − r.

Given a matrix A ∈ R^m×n and a vector x ∈ Rⁿ, we can write Ax = A(xr+ xn) = Axr= yr,

with

xr∈ range(A^T) ⊆ Rⁿ, xn∈ ker(A) ⊆ Rⁿ,

yr∈ range(A) ⊆ R^m, and also, for a vector y ∈ R^m,

A^Ty = A^T(yr+ yn) = A^Tyr= xr, with

yr∈ range(A) ⊆ R^m, yn ∈ ker(A^T) ⊆ R^m, xr∈ range(A^T) ⊆ Rⁿ.

Example 2.4 (Fundamental subspaces) Given the matrix

A =





1 1 2 1 0 1 2 1 0 1 2 1



.

It is easy to see that the first and second columns of this matrix are linearly independent. The third and fourth columns are scaled versions of the second column, and thus rank(A) = 2. The first two columns are a basis for the two-dimensional column space of A. This column space is a subspace of R³. The left null space of A is the orthogonal complement of the column space in R³. Hence, it is a one-dimensional space, a basis for this space is the vector

y =

because it is orthogonal to the columns of the matrix A. It is easy to see that indeed A^Ty = 0. The row space of A is spanned by the first two rows. This two-dimensional space is a subspace of R⁴. The two-dimensional null space is the orthogonal complement of this space. The null space can be spanned by the vectors

x1=

because they are linearly independent and are orthogonal to the rows of the matrix A. It is easy to see that indeed Ax1= 0 and Ax2= 0.

2.4 Square matrices

Square matrices have some special properties, of which some impor-tant ones are described in this section. If a square matrix A has full rank, there exists a unique matrix A⁻¹, called the inverse of A, such that

AA⁻¹= A⁻¹A = I.

A square matrix that has full rank is called an invertible or nonsingular matrix. We have (A^T)⁻¹= (A⁻¹)^Tand also A^T(A⁻¹)^T= (A⁻¹)^TA^T= I. If A is a square orthogonal matrix, A^TA = I, and thus A⁻¹ = A^T and also AA^T= I. If A and B are invertible matrices then (AB)⁻¹ = B⁻¹A⁻¹. An important formula for inverting matrices is given in the following lemma.

2.4 Square matrices 19 Lemma 2.2 (Matrix-inversion lemma) (Golub and Van Loan, 1996) Consider the matrices A ∈ R^n×n, B ∈ R^n×m, C ∈ R^m×m, and D ∈ R^m×n. If A, C, and A + BCD are invertible, then,

(A + BCD)⁻¹= A⁻¹− A⁻¹B(C⁻¹+ DA⁻¹B)⁻¹DA⁻¹. In Exercise 2.2 on page 38 you are asked to prove this lemma.

The rank of a square matrix can be related to the rank of its subma-trices, using the notion of Schur complements.

Lemma 2.3 (Schur complement) Let S ∈ R(n+m)×(n+m)be a matrix partitioned as

Proof Statement (i) is easily proven using the following identity and Sylvester’s inequality (Lemma 2.1):

In 0 Statement (ii) is easily proven from the following identity:

In −BD⁻¹

The determinant of a square n × n matrix, denoted by det(A), is a scalar that is defined recursively as

det(A) = n i=1

(−1)^i+jaijdet(Aij),

where Aij is the (n − 1) × (n − 1) matrix that is formed by deleting the ith row and the jth column of A. The determinant of a 1×1 matrix A (a scalar) is equal to the matrix itself, that is, det(A) = A. Some important

properties of the determinant are the following:

(i) det(A^T) = det(A) for all A ∈ R^n×n;

(ii) det(αA) = αⁿdet(A) for all A ∈ R^n×n and α ∈ R;

(iii) det(AB) = det(A)det(B) for all A, B ∈ R^n×n; (iv) det(A) = 0 if and only if A ∈ R^n×n is invertible; and

(v) det(A⁻¹) = 1/det(A) for all invertible matrices A ∈ R^n×n. The determinant and the inverse of a matrix are related as follows:

A⁻¹= adj(A)

det(A), (2.4)

where the adjugate of A, denoted by adj(A), is an n × n matrix with its (i, j)th entry given by (−1)^i+jdet(Aji) and Aji is the (n − 1) × (n − 1) matrix that is formed by deleting the jth row and ith column of A.

Example 2.5 (Determinant and inverse of a 2 × 2 matrix) Con-sider a matrix A ∈ R^2×2 given by

A =

a11 a12

a21 a22

. The determinant of this matrix is given by

det(A) = (−1)¹⁺¹a11det(a22) + (−1)²⁺¹a21det(a12) = a11a22− a21a12. The adjugate of A is given by

adj(A) =

(−1)²a22 (−1)³a12

(−1)³a21 (−1)⁴a11

. Hence, the inverse of A is

A⁻¹= adj(A)

det(A)= 1

a11a22− a21a12

a22 −a12

−a²¹ a11

. It is easily verified that indeed AA⁻¹= I.

An eigenvalue of a square matrix A ∈ R^n×n is a scalar λ ∈ C that satisfies

Ax = λx, (2.5)

for some nonzero vector x ∈ Cⁿ. All nonzero vectors x that satisfy (2.5) for some λ are called the eigenvectors of A corresponding to the eigen-value λ. An n × n matrix can have at most n different eigeneigen-values. Note that, even if the matrix A is real-valued, the eigenvalues and eigenvectors

2.4 Square matrices 21 can be complex-valued. Since (2.5) is equivalent to (λI − A)x = 0, the eigenvectors of A are in the null space of the polynomial matrix (λI −A).

To find the (nonzero) eigenvectors, λ should be such that the polynomial matrix (λI − A) is singular. This is equivalent to the condition

det(λI − A) = 0.

The left-hand side of this equation is a polynomial in λ of degree n and is often referred to as the characteristic polynomial of the matrix A. Hence, the eigenvalues of a matrix A are equal to the zeros of its characteristic polynomial. The eigenvalues of a full-rank matrix are all nonzero. This follows from the fact that a zero eigenvalue turns Equation (2.5) on page 20 into Ax = 0, which has no nonzero solution x if A has full rank. From this it also follows that a singular matrix A has rank(A) nonzero eigenvalues and n − rank(A) eigenvalues that are equal to zero.

With the eigenvalues of the matrix A, given as λi, i = 1, 2, . . ., n, we can express its determinant as

det(A) = λ1· λ2· · · λn=

n i=1

λi.

The trace of a square matrix A ∈ R^n×n, denoted by tr(A), is the sum of its diagonal entries, that is,

tr(A) = n i=1

aii.

Then, in terms of the eigenvalues of A,

tr(A) = λ1+ λ2+ · · · + λⁿ= n i=1

λi.

The characteristic polynomial det(λI −A) has the following important property, which is often used to write A^n+k for some integer k ≥ 0 as a linear combination of I, A, . . ., Aⁿ⁻¹.

Theorem 2.2 (Cayley–Hamilton) (Strang, 1988) If the characteristic polynomial of the matrix A ∈ R^n×n is given by

det(λI − A) = λⁿ+ an−1λⁿ⁻¹+ · · · + a1λ + a0

for some ai ∈ R, i = 0, 1, . . ., n − 1, then

Aⁿ+ an−1Aⁿ⁻¹+ · · · + a¹A + a0I = 0.

Example 2.6 (Eigenvalues and eigenvectors) Consider the matrix 4 −5

2 −3

We have det(A) = −2 and tr(A) = 1. The eigenvalues of this matrix are computed as the zeros of

det(λI − A) = det

λ − 4 5

−2 λ + 3

= λ²− λ − 2 = (λ + 1)(λ − 2).

Hence, the eigenvalues are −1 and 2. We see that indeed det(A) = −1 · 2 and tr(A) = −1 + 2. The eigenvector x corresponding to the eigenvalue of −1 is found by solving Ax = −x. Let

x = x1

. Then it follows that

4x1− 5x2= −x1, 2x1− 3x²= −x²

or, equivalently, x1= x2;, thus an eigenvector corresponding to the eigen-value −1 is given by

1 1

. Similarly, it can be found that

5 2

is an eigenvector corresponding to the eigenvalue 2.

An important property of the eigenvalues of a matrix is summarized in the following lemma.

Lemma 2.4 (Strang, 1988) Suppose that a matrix A ∈ R^n×n has a complex-valued eigenvalue λ, with corresponding eigenvector x. Then the complex conjugate of λ is also an eigenvalue of A and the complex con-jugate of x is a corresponding eigenvector.

An important question is the following: when are the eigenvectors of a matrix linearly independent? The next lemma provides a partial answer to this question.

2.4 Square matrices 23 Lemma 2.5 (Strang, 1988) Eigenvectors x1, x2, . . ., xk, corresponding to distinct eigenvalues λ1, λ2, . . ., λk, k ≤ n of a matrix A ∈ R^n×n are linearly independent.

Eigenvectors corresponding to repetitive eigenvalues can be linearly inde-pendent, but this is not necessary. This is illustrated in the following example.

Example 2.7 (Linearly dependent eigenvectors) Consider the matrix

3 1 0 3

The two eigenvalues of this matrix are both equal to 3. Solving Ax = 3x to determine the eigenvectors shows that all eigenvectors are multiples

1 0

Thus, the eigenvectors are linearly dependent.

Now we will look at some special square matrices. A diagonal matrix is a matrix in which nonzero entries occur only along its diagonal. An example of a diagonal matrix is the identity matrix, see Equation (2.2)

In document Filtering and System Identification (Page 25-59)