On Chebyshev polynomials of matrices

(1)

SIAM J. MATRIXANAL.APPL. c 2010 Society for Industrial and Applied Mathematics Vol. 31, No. 4, pp. 2205–2221

ON CHEBYSHEV POLYNOMIALS OF MATRICES

∗

VANCE FABER†, J ¨ORG LIESEN‡, AND PETR TICH ´Y§

Abstract. Themth Chebyshev polynomial of a square matrixAis the monic polynomial that minimizes the matrix 2-norm ofp(A) over all monic polynomialsp(z) of degreem. This polynomial is uniquely deﬁned ifmis less than the degree of the minimal polynomial ofA. We study general properties of Chebyshev polynomials of matrices, which in some cases turn out to be generalizations of well-known properties of Chebyshev polynomials of compact sets in the complex plane. We also derive explicit formulas of the Chebyshev polynomials of certain classes of matrices, and explore the relation between Chebyshev polynomials of one of these matrix classes and Chebyshev polynomials of lemniscatic regions in the complex plane.

Key words. matrix approximation problems, Chebyshev polynomials, complex approximation theory, Krylov subspace methods, Arnoldi’s method

AMS subject classifications.41A10, 15A60, 65F10

DOI.10.1137/090779486

1. Introduction.

Let

_A

∈

C

n×n

be a given matrix, let

_m

≥

1 be a given integer,

and let

M

_m

denote the set of complex

monic

polynomials of degree

_m

. We consider

the approximation problem

(1.1)

min

p∈Mm

p

(

A

)

,

where

· denotes the matrix 2-norm (or spectral norm). As shown by Greenbaum and

Trefethen [11, Theorem 2] (also cf. [13, Theorem 2.2]), problem (1.1) has a uniquely

deﬁned solution when

_m

is smaller than

_d

(

_A

), the degree of the minimal polynomial

of

_A

. This is a nontrivial result since the matrix 2-norm is not strictly convex, and

approximation problems in such norms are in general not guaranteed to have a unique

solution; see [13, pp. 853–854] for more details and an example. In this paper we

assume that

_{m < d}

(

_A

), which is necessary and suﬃcient so that the value of (1.1) is

positive, and we denote the unique solution of (1.1) by

_T

_mA

(

_z

). Note that if

_A

∈

R

n×n

,

then the Chebyshev polynomials of

_A

have real coeﬃcients, and hence in this case it

suﬃces to consider only real monic polynomials in (1.1).

It is clear that (1.1) is unitarily invariant, i.e., that

_T

_mA

(

_z

) =

_T

_mU∗AU

(

_z

) for any

unitary matrix

_U

∈

C

n×n

. In particular, if the matrix

_A

is normal, i.e., unitarily

diagonalizable, then

min

p∈Mm

p

(

A

)

=

p∈Mm

min

λ

max

∈Λ(A)

|

p

(

λ

)

|

,

∗_{Received by the editors December 8, 2009; accepted for publication (in revised form) by} M. H. Gutknecht April 5, 2010; published electronically June 24, 2010.

http://www.siam.org/journals/simax/31-4/77948.html †_{Cloudpak, Seattle, WA 98109 (vance.faber@gmail.com).}

‡_{Institute of Mathematics, Technical University of Berlin, Straße des 17. Juni 136, 10623 Berlin,} Germany (liesen@math.tu-berlin.de). The work of this author was supported by the Heisenberg Program of the Deutsche Forschungsgemeinschaft.

§_{Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod vod´}_arenskou vˇeˇz´ı 2, 18207 Prague, Czech Republic (tichy@cs.cas.cz). The work of this author is a part of the Institutional Research Plan AV0Z10300504 and was supported by GAAS grant IAA100300802 and by project M100300901 of the institutional support of the Academy of Sciences of the Czech Republic.

2205

(2)

where Λ(

_A

) denotes the set of the eigenvalues of

_A

. The (uniquely deﬁned)

_m

th

degree monic polynomial that deviates least from zero on a compact set Ω in the

complex plane is called the

_m

th Chebyshev polynomial

1

of the set

Ω. We denote this

polynomial by

_T

_mΩ

(

_z

).

The last equation shows that for a normal matrix

_A

the matrix approximation

problem (1.1) is equal to the scalar approximation problem of ﬁnding

_T

_mΛ(A)

(

_z

), and

in fact

_T

_mA

(

_z

) =

_T

_mΛ(A)

(

_z

). Because of these relations, problem (1.1) can be considered

a generalization of a classical problem of mathematics from scalars to matrices. As a

consequence, Greenbaum and Trefethen [11] as well as Toh and Trefethen [24] have

called the solution

_T

_mA

(

_z

) of (1.1) the

_m

th Chebyshev polynomial of the matrix

_A

(regardless of

_A

being normal or not).

A motivation for studying problem (1.1) and the Chebyshev polynomials of

ma-trices comes from their connection to Krylov subspace methods, and in particular the

Arnoldi method for approximating eigenvalues of matrices [2]. In a nutshell, after

_m

steps of this method a relation of the form

_AV

_m

=

_V

_m

_H

_m

+

_r

_m

_e

T_m

is computed, where

H

m

∈

C

m×m

is an upper Hessenberg matrix,

r

m

∈

C

n

is the

m

th “residual” vector,

e

m

is the

_m

th canonical basis vector of

C

m

, and the columns of

_V

_m

∈

C

n×m

form an

orthonormal basis of the Krylov subspace

Km

(

_{A, v}

₁

) = span

{

_v

₁

_{, Av}

₁

_{, . . . , A}

m−1

_v

₁

}

.

The vector

_v

₁

∈

C

n

is an arbitrary (nonzero) initial vector. The eigenvalues of

_H

_m

are used as approximations for the eigenvalues of

_A

. Note that

_r

_m

= 0 if and only

if the columns of

_V

_m

span an invariant subspace of

_A

, and if this holds, then each

eigenvalue of

_H

_m

is an eigenvalue of

_A

.

As shown by Saad [15, Theorem 5.1], the characteristic polynomial

_ϕ

_m

of

_H

_m

satisﬁes

(1.2)

_ϕ

_m

(

_A

)

_v

₁

=

min

p∈Mm

p

(

A

)

v

1

.

An interpretation of this result is that the characteristic polynomial of

_H

_m

solves the

Chebyshev approximation problem for

_A

with respect to the given starting vector

_v

₁

.

Saad pointed out that (1.2) “seems to be the only known optimality property that

is satisﬁed by the [Arnoldi] approximation process in the nonsymmetric case” [16,

p. 171]. To learn more about this property, Greenbaum and Trefethen [11, p. 362]

suggested “to disentangle [the] matrix essence of the process from the distracting

eﬀects of the initial vector,” and hence study the “idealized” problem (1.1) instead of

(1.2). They referred to the solution of (1.1) as the

_m

th ideal Arnoldi polynomial of

_A

(in addition to the name

_m

th Chebyshev polynomial of

_A

).

Greenbaum and Trefethen [11] seem to be the ﬁrst who studied existence and

uniqueness of Chebyshev polynomials of matrices. Toh and Trefethen [24] derived an

algorithm for computing these polynomials based on semideﬁnite programming; see

also Toh’s Ph.D. thesis [21, Chapter 2]. This algorithm is now part of the SDPT3

Toolbox [23]. The paper [24] as well as [21] and [25, Chapter 29] give numerous

computed examples for the norms, roots, and coeﬃcients of Chebyshev polynomials

of matrices. It is shown numerically that the lemniscates of these polynomials tend

to approximate pseudospectra of

_A

. In addition, Toh has shown that the zeros of

T

_mA

(

_z

) are contained in the ﬁeld of values of

_A

[21, Theorem 5.10]. This result is

1_{Pafnuti Lvovich Chebyshev (1821–1894) determined the polynomials}_TΩ

m(z) of Ω = [−a, a] (a real interval symmetric to zero) in his 1859 paper [5], which laid the foundations of modern approximation theory. We recommend Steﬀens’ book [18] to readers who are interested in the historical development of the subject.

(3)

ON CHEBYSHEV POLYNOMIALS OF MATRICES

2207

part of his interesting analysis of Chebyshev polynomials of linear operators in inﬁnite

dimensional Hilbert spaces [21, Chapter 5]. The ﬁrst explicit solutions for the problem

(1.1) for a nonnormal matrix

_A

we are aware of have been given in [13, Theorem 3.4].

It is shown there that

_T

_mA

(

_z

) = (

_z

−

_λ

)

m

if

_A

=

_J

_λ

, a Jordan block with eigenvalue

λ

∈

C

. Note that in this case the Chebyshev polynomials of

_A

are independent of the

size of

_A

.

The above remarks show that problem (1.1) is a mathematically interesting

gen-eralization of the classical Chebyshev problem, which has an important application

in the area of iterative methods. Yet, our survey of the literature indicates that there

has been little theoretical work on Chebyshev polynomials of matrices (in particular

when compared with the substantial work on Chebyshev polynomials for compact

sets). The main motivation for writing this paper was to extend the existing

the-ory of Chebyshev polynomials of matrices. Therefore we considered a number of

known properties of Chebyshev polynomials of compact sets, and tried to ﬁnd matrix

analogues. Among these are the behavior of

_T

_mA

(

_z

) under shifts and scaling of

_A

,

a matrix analogue of the “alternation property,” as well as conditions on

_A

so that

T

mA

(

z

) is even or odd (section 2). We also give further explicit examples of Chebyshev

polynomials of some classes of matrices (section 3). For a class of block Toeplitz

ma-trices, we explore the relation between their Chebyshev polynomials and Chebyshev

polynomials of lemniscatic regions in the complex plane (section 4).

All computations in this paper have been performed using MATLAB [20]. For

computing Chebyshev polynomials of matrices we have used the DSDP software

pack-age for semideﬁnite programming [3] and its MATLAB interface.

2. General results.

In this section we state and prove results on the Chebyshev

polynomials of a general matrix

_A

. In later sections we will apply these results to

some speciﬁc examples.

2.1. Chebyshev polynomials of shifted and scaled matrices.

In the

fol-lowing we will write a complex (monic) polynomial of degree

_m

as a function of the

variable

_z

and its coeﬃcients. More precisely, for

_x

= [

_x

₀

_{, . . . , x}

_m₋₁

]

T

∈

C

m

we write

(2.1)

_p

(

_z

;

_x

)

≡

_z

m

−

m

−1

j=0

x

j

z

j

∈ M

m

.

Let two complex numbers,

_α

and

_β

, be given, and deﬁne

_δ

≡

_β

−

_α

. Then

p

(

_β

+

_z

;

_x

) =

_p

((

_β

−

_α

) + (

_α

+

_z

);

_x

) = (

_δ

+ (

_α

+

_z

))

m

−

m

−1 j=0

x

j

(

δ

+ (

α

+

z

))

j

=

m

j=0

m

j

δ

m−j

(

_α

+

_z

)

j

−

m

−1 j=0

x

j j

=0

j

δ

j−

(

_α

+

_z

)

= (

_α

+

_z

)

m

+

m

−1 j=0

m

j

δ

m−j

(

_α

+

_z

)

j

−

_x

_j j

=0

j

δ

j−

(

_α

+

_z

)

(2.2)

= (

_α

+

_z

)

m

−

m

−1 j=0

⎛

⎝

m

−1 =j

j

δ

−j

x

−

m

j

δ

m−j

⎞

⎠

(

_α

+

_z

)

j

≡

(

_α

+

_z

)

m

−

m

−1 j=0

y

j

(

α

+

z

)

j

≡

p

(

α

+

z

;

y

)

.

(4)

A closer examination of (2.2) shows that the two vectors

_y

and

_x

in the identity

p

(

_α

+

_z

;

_y

) =

_p

(

_β

+

_z

;

_x

) are related by

⎡ ⎢ ⎢ ⎢ ⎣ y0 y1 .. . ym−1 ⎤ ⎥ ⎥ ⎥ ⎦ = ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ ₀ 0 δ 0 1 0 δ 1 2 0 δ 2 _{· · ·} m−1 0 δ m−1 ₁ 1 δ 0 2 1 δ 1 _{· · ·} m−2 1 δ m−2 ... .._. _m−₁ m−1 δ 0 ⎤ ⎥ ⎥ ⎥ ⎥ ⎦ ⎡ ⎢ ⎢ ⎢ ⎣ x0 x1 .. . xm−1 ⎤ ⎥ ⎥ ⎥ ⎦ − ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ _m 0 δ m _m 1 δ m−1 .. . _m m−1 δ 1 ⎤ ⎥ ⎥ ⎥ ⎥ ⎦.

We can write this as

(2.3)

_y

=

_h

_δ

(

_x

)

_,

where

_h

_δ

(

_x

)

≡

_M

_δ

_x

−

_v

_δ

_.

The matrix

_M

_δ

∈

C

m×m

is an invertible upper triangular matrix; all its diagonal

elements are equal to 1. Thus, for any

_δ

∈

C

,

h

δ

:

x

→

M

δ

x

−

v

δ

is an invertible aﬃne linear transformation on

C

m

. Note that if

_δ

= 0, then

_M

_δ

=

_I

(the identity matrix) and

_v

_δ

= 0, so that

_y

=

_x

.

The above derivation can be repeated with

_αI

,

_βI

, and

_A

replacing

_{α, β}

, and

_z

,

respectively. This yields the following result.

Lemma 2.1.

_Let

_A

∈

C

n×n

_,

_x

∈

C

m

_,

_α

∈

C

_{, and}

_β

∈

C

_{be given. Then for any}

monic polynomial

_p

of the form

(2.1)

,

(2.4)

_p

(

_βI

+

_A

;

_x

) =

_p

(

_αI

+

_A

;

_h

_δ

(

_x

))

_,

where

_δ

≡

_β

−

_α

, and

_h

_δ

is defined as in

(2.3)

.

The assertion of this lemma is an ingredient in the proof of the following theorem.

Theorem 2.2.

Let

A

∈

C

n×n

,

α

∈

C

, and a positive integer

m < d

(

A

)

be given.

Denote by

_T

_mA

(

_z

) =

_p

(

_z

;

_x

_∗

)

the

_m

th Chebyshev polynomial of

_A

. Then the following

hold:

min

p∈Mm

p

(

A

+

αI

)

= min

p∈Mm

p

(

A

)

,

T

A+αI m

(

z

) =

p

(

z

;

h

−α

(

x

∗

))

,

(2.5)

where

_h

₋_α

is defined as in

(2.3)

, and

min

p∈Mm

p

(

αA

)

=

|

α

|

m

_min

p∈Mm

p

(

A

)

,

T

αA m

(

z

) =

p

(

z

;

D

α

x

∗

)

,

(2.6)

where

_D

_α

≡

diag(

_α

m

_{, α}

m−1

_{, . . . , α}

)

.

Proof

. We ﬁrst prove (2.5). Equation (2.4) with

_β

= 0 shows that

_p

(

_A

;

_x

) =

p

(

_A

+

_αI

;

_h

₋_α

(

_x

)) holds for any

_x

∈

C

m

. This yields

min

p∈Mm

p

(

A

+

αI

)

= min

x∈Cm

p

(

A

+

αI

;

x

)

= min

_x_∈_Cm

p

(

A

+

αI

;

h

−α

(

x

))

= min

x∈Cm

p

(

A

;

x

)

= min

_p_∈Mm

p

(

A

)

(here we have used that the transformation

_h

₋_α

is invertible). To see that the

poly-nomial

_p

(

_z

;

_h

₋_α

(

_x

_∗

)) is indeed the

_m

th Chebyshev polynomial of

_A

+

_αI

, we note

that

p

(

_A

+

_αI

;

_h

₋_α

(

_x

_∗

))

=

_p

(

_A

;

_x

_∗

)

= min

p∈Mm

p

(

A

)

= min

p∈Mm

p

(

A

+

αI

)

.

(5)

2209

The equations in (2.6) are trivial if

_α

= 0, so we can assume that

_α

= 0. Then

the matrix

_D

_α

is invertible, and a straightforward computation yields

min

p∈Mm

p

(

αA

)

= min

x∈Cm

p

(

αA

;

x

)

=

|

α

|

m

_min

x∈Cm

p

(

A

;

D

−1 α

x

)

=

|

α

|

m_x

min

_∈_Cm

p

(

A

;

x

)

=

|

_α

|

m

min

p∈Mm

p

(

A

)

.

Furthermore,

p

(

_αA

;

_D

_α

_x

_∗

)

=

|

_α

|

m

_p

(

_A

;

_x

_∗

)

=

|

_α

|

m

min

p∈Mm

p

(

A

)

= min

p∈Mm

p

(

αA

)

,

so that

_p

(

_z

;

_D

_α

_x

_∗

) is the

_m

th Chebyshev polynomial of the matrix

_αA

.

The fact that the “true” Arnoldi approximation problem, i.e., the right-hand side

of (1.2), is translation invariant has been mentioned previously in [11, p. 361]. Hence

the translation invariance of problem (1.1) shown in (2.5) is not surprising. The

underlying reason is that the monic polynomials are normalized “at inﬁnity.”

The result for the scaled matrices in (2.6), which also may be expected, has an

important consequence that is easily overlooked: Suppose that for some given

_A

∈

C

n×n

_{we have computed the sequence of norms of problem (1.1), i.e., the quantities}

T

1A

(

A

)

,

T

2A

(

A

)

,

T

3A

(

A

)

, . . . .

If we scale

_A

by

_α

∈

C

, then the norms of the Chebyshev approximation problem

for

the scaled matrix

_αA

are given by

|

α

|

T

1A

(

A

)

,

|

α

|

2

T

2A

(

A

)

,

|

α

|

3

T

3A

(

A

)

, . . . .

A suitable scaling can therefore turn any given sequence of norms for the problem

with

_A

into a strictly monotonically decreasing (or, if we prefer, increasing) sequence

for the problem with

_αA

. For example, the matrix

A

=

⎡

⎣

1

4

2

5

3

6

7

8

9 ⎤

⎦

yields

T

0A

(

A

)

= 1

,

T

1A

(

A

)

≈

11 .

4077

,

T

2A

(

A

)

= 9;

cf. [25, p. 280] (note that by deﬁnition

_T

₀A

(

_z

)

≡

1 for any matrix

_A

). The

correspond-ing norms for the scaled matrices

₁₂1

· _A

and 12

· _A

are then (approximately) given

by

1 _,

0 _.

95064

_,

0 _.

0625

_,

and

1 _,

136 _.

8924

_,

1296

_,

respectively. In general we expect that the behavior of an iterative method for solving

linear systems or for approximating eigenvalues is invariant under scaling of the given

matrix. In particular, by looking at the sequence of norms of problem (1.1)

alone

we cannot determine how fast a method “converges.” In practice, we always have

to measure “convergence” in some

relative

(rather than absolute) sense. Note that

the quantity min

_p_∈Mm

_p

(

_A

)

_/

_A

m

is independent of a scaling of the matrix

_A

, and

hence in our context it may give relevant information. We have not explored this

topic further.

(6)

2.2. Alternation property for block-diagonal matrices.

It is well known

that Chebyshev polynomials of compact sets Ω are characterized by an

alternation

property

. For example, if Ω = [

_{a, b}

] is a ﬁnite real interval, then

_p

(

_z

)

∈ M

_m

is the

unique Chebyshev polynomial of degree

_m

on Ω if and only if

_p

(

_z

) assumes its extreme

values

±

max

_z_∈_Ω

|

_p

(

_z

)

|

with successively alternating signs on at least

_m

+ 1 points

(the “alternation points”) in Ω; see, e.g., [4, section 7.5]. There exist generalizations

of this classical result to complex as well as to ﬁnite sets Ω; see, e.g., [6, Chapter 3]

and [4, section 7.5]. The following is a generalization to block-diagonal matrices.

Theorem 2.3.

_{Consider a block-diagonal matrix}

_A

_{= diag(}

_A

₁

_{, . . . , A}

_h

₎

_{, let}

_k

≡

max

₁_≤_j_≤_h

_d

(

_A

_j

)

, and let

be a given positive integer such that

_k

· _{< d}

(

_A

)

. Then

the matrix

_T

_kA_·

(

_A

) = diag(

_T

_kA_·

(

_A

₁

)

_{, . . . , T}

_kA_·

(

_A

_h

))

has at least

+ 1

diagonal blocks

T

_kA_·

(

_A

_j

)

with norm equal to

_T

_kA_·

(

_A

)

.

Proof

. The assumption that

_k

· _{< d}

(

_A

) implies that

_T

_kA_·

(

_z

) is uniquely deﬁned.

For simplicity of notation we denote

_B

=

_T

_kA_·

(

_A

) and

_B

_j

≡

_T

_kA_·

(

_A

_j

),

_j

= 1

_{, . . . , h}

.

Without loss of generality we can assume that

_B

=

_B

₁

≥ · · · ≥

_B

_h

.

Suppose that the assertion is false. Then there exists an integer

_i

, 1

≤

_i

≤

,

so that

_B

=

_B

₁

=

· · ·

=

_B

_i

_>

_B

_i₊₁

. Let

_δ

≡

_B

−

_B

_i₊₁

_>

0, and let

q

j

(

z

)

∈ M

k

be a polynomial with

q

j

(

A

j

) = 0, 1

≤

j

≤

h

. Deﬁne the polynomial

t

(

_z

)

≡

j=1

q

j

(

z

)

∈ M

k·

.

Let

be a positive real number with

<

δ

+ max

1≤j≤h

t

(

A

j

)

.

Then 0

_{< <}

1, and hence

r

(

z

)

≡

(1

−

)

T

kA·

(

z

) +

t

(

z

)

∈ Mk

·

.

Note that

_r

(

_A

)

= max

₁_≤_j_≤_h

_r

(

_A

_j

)

. For 1

≤

_j

≤

_i

, we have

r

(

A

j

)

= (1

−

)

B

j

= (1

−

)

B

<

B

.

For

_i

+ 1

≤

_j

≤

_h

, we have

r

(

A

j

)

=

(1

−

)

B

j

+

t

(

A

j

)

≤

(1

−

)

_B

j

+

_t

(

_A

_j

)

≤

(1

−

)

_B

_i₊₁

+

_t

(

_A

_j

)

= (1

−

) (

_B

−

_δ

) +

_t

(

_A

_j

)

= (1

−

)

_B

+

(

_δ

+

_t

(

_A

_j

)

−

_δ.

Since

(

_δ

+

_t

(

_A

_j

)

−

_{δ <}

0 by the deﬁnition of

, we see that

_r

(

_A

_j

)

_<

_B

for

i

+ 1

≤

_j

≤

_h

. But this means that

_r

(

_A

)

_<

_B

, which contradicts the minimality

of the Chebyshev polynomial of

_A

.

The numerical results shown in Table 1 illustrate this theorem. We have used

a block-diagonal matrix

_A

with four Jordan blocks of size 3

×

3 on its diagonal, so

that

_k

= 3. Theorem 2.3 then guarantees that

_T

₃A

(

_A

),

= 1

_,

2 _,

3, has at least

+ 1

diagonal blocks with the same maximal norm. This is clearly conﬁrmed for

= 1 and

(7)

2211

Table 1

Numerical illustration of Theorem2.3: HereA= diag(A1, A2, A3, A4), where eachAj=J_λj is a3×3Jordan block. The four eigenvalues are−3,−0.5,0.5,0.75.

m TA m(A1) TmA(A2) TmA(A3) TmA(A4) 1 2.6396 1.4620 2.3970 2.6396 2 4.1555 4.1555 3.6828 4.1555 3 9.0629 5.6303 7.6858 9.0629 4 14.0251 14.0251 11.8397 14.0251 5 22.3872 20.7801 17.6382 22.3872 6 22.6857 22.6857 20.3948 22.6857

= 2 (it also holds for

= 3). For these

we observe that

exactly

+ 1 diagonal

blocks achieve the maximal norm. Hence in general the lower bound of

+ 1 blocks

attaining the maximal norm in step

_m

=

_k

· cannot be improved. In addition, we see

in this experiment that the number of diagonal blocks with the same maximal norm

is not necessarily a monotonically increasing function of the degree of the Chebyshev

polynomial.

Now consider the matrix

A

= diag(

_A

₁

_{, A}

₂

) =

⎡

⎢

⎣

1

1 −

1

1 −

1 ⎤

⎥

⎦

.

Then for

_p

(

_z

) =

_z

2

−

_αz

−

_β

∈ M

₂

we get

p

(

_A

) =

⎡

⎢

⎣

1 −

(

_α

+

_β

)

2 −

_α

1 −

(

_α

+

_β

)

1 −

(

_α

+

_β

)

−

2 −

_α

1 −

(

_α

+

_β

)

⎤

⎥

⎦

.

For

_α

= 0 and

any

_β

∈

R

we will have

_p

(

_A

)

=

_p

(

_A

₁

)

=

_p

(

_A

₂

)

. Hence there are

inﬁnitely many polynomials

_p

∈ M

₂

for which the two diagonal blocks have the same

maximal norm. One of these polynomials is the Chebyshev polynomial

_T

₂A

(

_z

) =

_z

2

+1.

As shown by this example, the condition in Theorem 2.3 on a polynomial

_p

∈ M

_k_·

that at least

+ 1 diagonal blocks of

_p

(

_A

) have equal maximal norm is in general

necessary but

not

suﬃcient so that

_p

(

_z

) =

_T

_kA_·

(

_z

).

Finally, as a special case of Theorem 2.3 consider a matrix

_A

= diag(

_λ

₁

_{, . . . , λ}

_n

)

with distinct diagonal elements

_λ

₁

_{, . . . , λ}

_n

∈

C

. If

_{m < n}

, then there are at least

m

+ 1 diagonal elements

_λ

_j

with

|

_T

_mA

(

_λ

_j

)

|

=

_T

_mA

(

_A

)

= max

₁_≤_i_≤_n

|

_T

_mA

(

_λ

_i

)

|

. Note

that

_T

_mA

(

_z

) in this case is equal to the

_m

th Chebyshev polynomial of the ﬁnite set

{

λ

1

, . . . , λ

n

} ⊂

C

. This shows that the Chebyshev polynomial of degree

m

of a set in

the complex plane consisting of

_n

≥

_m

+ 1 points attains its maximum value at least

at

_m

+ 1 points.

2.3. Chebyshev polynomials with known zero coeﬃcients.

In this section

we study properties of a matrix

_A

so that its Chebyshev polynomials have known

zero coeﬃcients. An extreme case in this respect is when

_T

_mA

(

_z

) =

_z

m

, i.e., when all

coeﬃcients of

_T

_mA

(

_z

), except the leading one, are zero. This happens if and only if

A

m

=

min

p∈Mm

p

(

A

)

.

(8)

Equivalently, this says that the zero matrix is the best approximation of

_A

m

from the

linear space span

{

_{I, A, . . . , A}

m−1

}

(with respect to the matrix 2-norm). To

charac-terize this property, we recall that the dual norm to the matrix 2-norm

· is the

trace norm (also called energy norm or

_c

₁

-norm),

(2.7)

|||

_M

||| ≡

r

j=1

σ

j

(

_M

)

_,

where

_σ

₁

(

_M

)

_{, . . . , σ}

_r

(

_M

) denote the singular values of the matrix

_M

∈

C

n×n

with

rank(

_M

) =

_r

. For

_X

∈

C

n×n

and

_Y

∈

C

n×n

we deﬁne the inner product

_{X, Y}

≡

trace(

_Y

∗

_X

). We can now adapt a result of Zi¸etak [27, p. 173] to our context and

notation.

Theorem 2.4.

Let

A

∈

C

n×n

and a positive integer

m < d

(

A

)

be given. Then

T

mA

(

z

) =

z

m

if and only if there exists a matrix

_Z

∈

C

n×n

with

|||

_Z

|||

= 1

such that

(2.8)

_{Z, A}

k

= 0

_{, k}

= 0

_{, . . . , m}

−

1 _,

and

Re

_{Z, A}

m

=

_A

m

_.

As shown in [13, Theorem 3.4], the

_m

th Chebyshev polynomial of a Jordan block

J

λ

with eigenvalue

_λ

∈

C

is given by (

_z

−

_λ

)

m

. In particular, the

_m

th Chebyshev

polynomial of

_J

₀

is of the form

_z

m

. A more general class of matrices with

_T

_mA

(

_z

) =

_z

m

is studied in section 3.1 below.

It is well known that the Chebyshev polynomials of real intervals that are

sym-metric with respect to the origin are alternating between even and odd, i.e., every

second coeﬃcient (starting from the highest one) of

_T

_m[−a,a]

(

_z

) is zero, which means

that

_T

_m[−a,a]

(

_z

) = (

−

1)

m

_T

_m[−a,a]

(

−

_z

). We next give an analogue of this result for

Chebyshev polynomials of matrices.

Theorem 2.5.

_Let

_A

∈

C

n×n

_{and a positive integer}

_{m < d}

₍

_A

₎

_{be given. If there}

exists a unitary matrix

_P

such that either

_P

∗

_AP

=

−

_A

, or

_P

∗

_AP

=

−

_A

T

, then

(2.9)

_T

_mA

(

_z

) = (

−

1)

m

_T

_mA

(

−

_z

)

_.

Proof

. We prove the assertion only in case

_P

∗

_AP

=

−

_A

; the other case is similar.

If this relation holds for a unitary matrix

_P

, then

(

−

1)

m

_T

_mA

(

−

_A

)

=

_T

_mA

(

_P

∗

_AP

)

=

_P

∗

_T

_mA

(

_A

)

_P

=

_T

_mA

(

_A

)

= min

p∈Mm

p

(

A

)

,

and the result follows from the uniqueness of the

_m

th Chebyshev polynomial of

A

.

As a special case consider a normal matrix

_A

and its unitary diagonalization

U

∗

AU

=

_D

. Then

_T

_mA

(

_z

) =

_T

_mD

(

_z

), so we may consider only the Chebyshev

polyno-mial of the diagonal matrix

_D

. Since

_D

=

_D

T

, the conditions in Theorem 2.5 are

satisﬁed if and only if there exists a unitary matrix

_P

such that

_P

∗

_DP

=

−

_D

. But

this means that the set of the diagonal elements of

_D

(i.e., the eigenvalues of

_A

) must

be symmetric with respect to the origin (i.e., if

_λ

_j

is an eigenvalue,

−

_λ

_j

is an

eigen-value as well). Therefore, whenever a discrete set Ω

⊂

C

is symmetric with respect

to the origin, the Chebyshev polynomial

_T

_mΩ

(

_z

) is even (odd) if

_m

is even (odd).

(9)

2213

As an example of a nonnormal matrix, consider

A

=

⎡

⎢

⎣

1 −

1

1 _/

1 −

1 ⎤

⎥

⎦

,

>

0 _,

which has been used by Toh [22] in his analysis of the convergence of the GMRES

method. He has shown that

_P

T

_AP

=

−

_A

, where

P

=

⎡

⎢

⎣

1 −

1

1 −

1 ⎤

⎥

⎦

is an orthogonal matrix.

For another example consider

(2.10)

_C

=

J

λ

J

−λ

,

J

λ

, J

−λ

∈

C

n×n

,

λ

∈

C

.

It is easily seen that

(2.11)

_J

₋_λ

=

−

_I

±

_J

_λ

_I

±

_,

where

_I

±

= diag(1

_,

−

1 _,

1 _{, . . . ,}

(

−

1)

n−1

)

∈

R

n×n

_.

Using the symmetric and orthogonal matrices

P

=

I

,

Q

=

I

±

I

±

,

we receive

_{QP CP Q}

=

−

_C

.

The identity (2.11) implies that

T

mC

(

J

−λ

)

=

_T

_mC

(

−

_I

±

_J

_λ

_I

±

)

=

_T

_mC

(

_J

_λ

)

_,

i.e., the Chebyshev polynomials of

_C

attain the same norm on each of the two diagonal

blocks. In general, we can shift and rotate any matrix consisting of two Jordan blocks

of the same size and with respective eigenvalues

_{λ, μ}

∈

C

into the form (2.10). It

then can be shown that the Chebyshev polynomials

_T

_mA

(

_z

) of

_A

= diag(

_J

_λ

_{, J}

_μ

) satisfy

the “norm balancing property”

_T

_mA

(

_J

_λ

)

=

_T

_mA

(

_J

_μ

)

. The proof of this property is

rather technical and we skip it for brevity.

2.4. Linear Chebyshev polynomials.

In this section we consider the linear

Chebyshev problem

min

α∈C

A

−

αI

.

Work related to this problem has been done by Friedland [8], who characterized

solu-tions of the problem min

_α_∈_R

_A

−

_αB

, where

_A

and

_B

are two complex, and possibly

rectangular matrices. This problem in general does not have a unique solution. More

recently, Afanasjew et al. [1] have studied the restarted Arnoldi method with restart

length equal to 1. The analysis of this method involves approximation problems of

the form min

_α_∈_C

(

_A

−

_αI

)

_v

₁

(cf. (1.2)), whose unique solution is

_α

=

_v

₁∗

_Av

₁

.

(10)

Theorem 2.6.

Let

A

∈

C

n×n

be any (nonzero) matrix, and denote by

Σ(

A

)

the

span of the right singular vectors of

_A

corresponding to the maximal singular value

of

_A

. Then

_T

₁A

(

_z

) =

_z

if and only if there exists a vector

_w

∈

Σ(

_A

)

with

_w

∗

_Aw

= 0

.

Proof

. If

_T

₁A

(

_z

) =

_z

, then

_A

= min

_α_∈_C

_A

−

_αI

. According to a result of

Greenbaum and Gurvits [10, Theorem 2.5], there exists a unit norm vector

_w

∈

C

n

,

so that

2

min

α∈C

A

−

αI

= min

α∈C

(

A

−

αI

)

w

.

The unique solution of the problem on the right-hand side is

_α

_∗

=

_w

∗

_Aw

.

Our

assumption now implies that

_w

∗

_Aw

= 0, and the equations above yield

_A

=

_Aw

,

which shows that

_w

∈

Σ(

_A

).

On the other hand, suppose that there exists a vector

_w

∈

Σ(

_A

) such that

_w

∗

_Aw

=

0. Without loss of generality we can assume that

_w

= 1. Then

A

≥

min

α∈C

A

−

αI

≥

min

α∈C

Aw

−

αw

= min

α∈C

(

Aw

+

αw

) =

Aw

.

In the ﬁrst equality we have used that

_w

∗

_Aw

= 0, i.e., that the vectors

_w

and

_Aw

are

orthogonal. The assumption

_w

∈

Σ(

_A

) implies that

_Aw

=

_A

, and thus equality

must hold throughout the above relations. In particular,

_A

= min

_α_∈_C

_A

−

_αI

,

and hence

_T

₁A

(

_z

) =

_z

follows from the uniqueness of the solution.

An immediate consequence of this result is that if zero is outside the ﬁeld of values

of

_A

, then

_T

₁A

(

_A

)

_<

_A

. Note that this also follows from [21, Theorem 5.10], which

states that the zeros of

_T

_mA

(

_z

) are contained in the ﬁeld of values of

_A

.

We will now study the relation between Theorem 2.4 for

_m

= 1 and Theorem 2.6.

Let

_w

∈

Σ(

_A

) and let

_u

∈

C

n

be a corresponding left singular vector, so that

_Aw

=

A

u

and

_A

∗

_u

=

_A

_w

. Then the condition

_w

∗

_Aw

= 0 in Theorem 2.6 implies that

w

∗

u

= 0. We may assume that

_w

=

_u

= 1. Then the rank-one matrix

_Z

≡

_uw

∗

satisﬁes

|||

_Z

|||

= 1,

0 =

_w

∗

_u

=

n

i=1

w

i

u

i

= trace(

Z

) =

Z, I

=

Z, A

0

,

and

Z, A

= trace(

_A

∗

_uw

∗

) =

_A

trace(

_ww

∗

) =

_A

n

i=1

w

i

w

i

=

_A

_.

Hence Theorem 2.6 shows that

_T

₁A

(

_z

) =

_z

if and only if there exists a rank-one matrix

Z

satisfying the conditions (2.8).

Above we have already mentioned that

_T

₁A

(

_z

) =

_z

holds when

_A

is a Jordan

block with eigenvalue zero. It is easily seen that, in the notation of Theorem 2.6,

the vector

_w

in this case is given by the last canonical basis vector. Furthermore,

T

₁A

(

_z

) =

_z

holds for any matrix

_A

that satisﬁes the conditions of Theorem 2.5, i.e.,

P

∗

AP

=

−

_A

or

_P

∗

_AP

=

−

_A

T

for some unitary matrix

_P

.

2_{Greenbaum and Gurvits have stated this result for real matrices only, but since its proof mainly}

involves singular value decompositions of matrices, a generalization to the complex case is straight-forward.