SIAM J. MATRIX ANAL. APPL. Vol. 31, No. 4, pp. 2205–2221 © 2010 Society for Industrial and Applied Mathematics

**ON CHEBYSHEV POLYNOMIALS OF MATRICES**

VANCE FABER*†*, JÖRG LIESEN*‡*, AND PETR TICHÝ*§*

**Abstract.** The *m*th Chebyshev polynomial of a square matrix *A* is the monic polynomial that minimizes the matrix 2-norm of *p*(*A*) over all monic polynomials *p*(*z*) of degree *m*. This polynomial is uniquely defined if *m* is less than the degree of the minimal polynomial of *A*. We study general properties of Chebyshev polynomials of matrices, which in some cases turn out to be generalizations of well-known properties of Chebyshev polynomials of compact sets in the complex plane. We also derive explicit formulas of the Chebyshev polynomials of certain classes of matrices, and explore the relation between Chebyshev polynomials of one of these matrix classes and Chebyshev polynomials of lemniscatic regions in the complex plane.

**Key words.** matrix approximation problems, Chebyshev polynomials, complex approximation
theory, Krylov subspace methods, Arnoldi’s method

**AMS subject classifications.** 41A10, 15A60, 65F10

**DOI.** 10.1137/090779486

**1. Introduction.** Let *A* ∈ C^{n×n} be a given matrix, let *m* ≥ 1 be a given integer, and let M_m denote the set of complex *monic* polynomials of degree *m*. We consider the approximation problem

(1.1)    min_{p ∈ M_m} ‖p(A)‖,

where ‖·‖ denotes the matrix 2-norm (or spectral norm). As shown by Greenbaum and Trefethen [11, Theorem 2] (also cf. [13, Theorem 2.2]), problem (1.1) has a uniquely defined solution when *m* is smaller than d(A), the degree of the minimal polynomial of *A*. This is a nontrivial result since the matrix 2-norm is not strictly convex, and approximation problems in such norms are in general not guaranteed to have a unique solution; see [13, pp. 853–854] for more details and an example. In this paper we assume that *m* < d(A), which is necessary and sufficient so that the value of (1.1) is positive, and we denote the unique solution of (1.1) by T_m^A(z). Note that if *A* ∈ R^{n×n}, then the Chebyshev polynomials of *A* have real coefficients, and hence in this case it suffices to consider only real monic polynomials in (1.1).

It is clear that (1.1) is unitarily invariant, i.e., that T_m^A(z) = T_m^{U*AU}(z) for any unitary matrix *U* ∈ C^{n×n}. In particular, if the matrix *A* is normal, i.e., unitarily diagonalizable, then

    min_{p ∈ M_m} ‖p(A)‖ = min_{p ∈ M_m} max_{λ ∈ Λ(A)} |p(λ)|,

∗Received by the editors December 8, 2009; accepted for publication (in revised form) by M. H. Gutknecht April 5, 2010; published electronically June 24, 2010.

http://www.siam.org/journals/simax/31-4/77948.html

†Cloudpak, Seattle, WA 98109 (vance.faber@gmail.com).

‡Institute of Mathematics, Technical University of Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany (liesen@math.tu-berlin.de). The work of this author was supported by the Heisenberg Program of the Deutsche Forschungsgemeinschaft.

§Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod vodárenskou věží 2, 18207 Prague, Czech Republic (tichy@cs.cas.cz). The work of this author is a part of the Institutional Research Plan AV0Z10300504 and was supported by GAAS grant IAA100300802 and by project M100300901 of the institutional support of the Academy of Sciences of the Czech Republic.


where Λ(A) denotes the set of the eigenvalues of *A*. The (uniquely defined) *m*th degree monic polynomial that deviates least from zero on a compact set Ω in the complex plane is called the *m*th Chebyshev polynomial¹ of the set Ω. We denote this polynomial by T_m^Ω(z).

The last equation shows that for a normal matrix *A* the matrix approximation problem (1.1) is equal to the scalar approximation problem of finding T_m^{Λ(A)}(z), and in fact T_m^A(z) = T_m^{Λ(A)}(z). Because of these relations, problem (1.1) can be considered a generalization of a classical problem of mathematics from scalars to matrices. As a consequence, Greenbaum and Trefethen [11] as well as Toh and Trefethen [24] have called the solution T_m^A(z) of (1.1) the *m*th Chebyshev polynomial of the matrix *A* (regardless of *A* being normal or not).

A motivation for studying problem (1.1) and the Chebyshev polynomials of matrices comes from their connection to Krylov subspace methods, and in particular the Arnoldi method for approximating eigenvalues of matrices [2]. In a nutshell, after *m* steps of this method a relation of the form AV_m = V_m H_m + r_m e_m^T is computed, where H_m ∈ C^{m×m} is an upper Hessenberg matrix, r_m ∈ C^n is the *m*th "residual" vector, e_m is the *m*th canonical basis vector of C^m, and the columns of V_m ∈ C^{n×m} form an orthonormal basis of the Krylov subspace K_m(A, v_1) = span{v_1, Av_1, ..., A^{m−1}v_1}. The vector v_1 ∈ C^n is an arbitrary (nonzero) initial vector. The eigenvalues of H_m are used as approximations for the eigenvalues of *A*. Note that r_m = 0 if and only if the columns of V_m span an invariant subspace of *A*, and if this holds, then each eigenvalue of H_m is an eigenvalue of *A*.
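The Arnoldi relation described above can be checked numerically. The following is a minimal sketch (not the implementation used in the paper, which relies on MATLAB and the DSDP package) of a modified Gram–Schmidt Arnoldi iteration in NumPy; the helper name `arnoldi` is ours:

```python
import numpy as np

def arnoldi(A, v1, m):
    """Modified Gram-Schmidt Arnoldi: returns V_m, H_m, and r_m with
    A V_m = V_m H_m + r_m e_m^T (e_m the last canonical basis vector)."""
    n = A.shape[0]
    V = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = np.vdot(V[:, i], w)   # orthogonalize against v_0..v_j
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] > 1e-14:             # no breakdown
            V[:, j + 1] = w / H[j + 1, j]
    r = H[m, m - 1] * V[:, m]               # residual vector r_m
    return V[:, :m], H[:m, :m], r

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
V, H, r = arnoldi(A, rng.standard_normal(8), 4)
e4 = np.zeros(4); e4[3] = 1.0
assert np.allclose(A @ V, V @ H + np.outer(r, e4))   # the Arnoldi relation
assert np.allclose(V.conj().T @ V, np.eye(4))        # orthonormal Krylov basis
```

The assertions confirm both the decomposition AV_m = V_m H_m + r_m e_m^T and the orthonormality of the computed Krylov basis.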

As shown by Saad [15, Theorem 5.1], the characteristic polynomial φ_m of H_m satisfies

(1.2)    ‖φ_m(A) v_1‖ = min_{p ∈ M_m} ‖p(A) v_1‖.

An interpretation of this result is that the characteristic polynomial of H_m solves the Chebyshev approximation problem for *A* *with respect to the given starting vector* v_1. Saad pointed out that (1.2) "seems to be the only known optimality property that is satisfied by the [Arnoldi] approximation process in the nonsymmetric case" [16, p. 171]. To learn more about this property, Greenbaum and Trefethen [11, p. 362] suggested "to disentangle [the] matrix essence of the process from the distracting effects of the initial vector," and hence study the "idealized" problem (1.1) instead of (1.2). They referred to the solution of (1.1) as the *m*th ideal Arnoldi polynomial of *A* (in addition to the name *m*th Chebyshev polynomial of *A*).

Greenbaum and Trefethen [11] seem to be the first who studied existence and uniqueness of Chebyshev polynomials of matrices. Toh and Trefethen [24] derived an algorithm for computing these polynomials based on semidefinite programming; see also Toh's Ph.D. thesis [21, Chapter 2]. This algorithm is now part of the SDPT3 Toolbox [23]. The paper [24] as well as [21] and [25, Chapter 29] give numerous computed examples for the norms, roots, and coefficients of Chebyshev polynomials of matrices. It is shown numerically that the lemniscates of these polynomials tend to approximate pseudospectra of *A*. In addition, Toh has shown that the zeros of T_m^A(z) are contained in the field of values of *A* [21, Theorem 5.10]. This result is

¹Pafnuti Lvovich Chebyshev (1821–1894) determined the polynomials T_m^Ω(z) of Ω = [−a, a] (a real interval symmetric to zero) in his 1859 paper [5], which laid the foundations of modern approximation theory. We recommend Steffens' book [18] to readers who are interested in the historical development of the subject.


part of his interesting analysis of Chebyshev polynomials of linear operators in infinite dimensional Hilbert spaces [21, Chapter 5]. The first explicit solutions of problem (1.1) for a nonnormal matrix *A* we are aware of have been given in [13, Theorem 3.4]. It is shown there that T_m^A(z) = (z − λ)^m if A = J_λ, a Jordan block with eigenvalue λ ∈ C. Note that in this case the Chebyshev polynomials of *A* are independent of the size of *A*.

The above remarks show that problem (1.1) is a mathematically interesting generalization of the classical Chebyshev problem, which has an important application in the area of iterative methods. Yet, our survey of the literature indicates that there has been little theoretical work on Chebyshev polynomials of matrices (in particular when compared with the substantial work on Chebyshev polynomials for compact sets). The main motivation for writing this paper was to extend the existing theory of Chebyshev polynomials of matrices. Therefore we considered a number of known properties of Chebyshev polynomials of compact sets, and tried to find matrix analogues. Among these are the behavior of T_m^A(z) under shifts and scaling of *A*, a matrix analogue of the "alternation property," as well as conditions on *A* so that T_m^A(z) is even or odd (section 2). We also give further explicit examples of Chebyshev polynomials of some classes of matrices (section 3). For a class of block Toeplitz matrices, we explore the relation between their Chebyshev polynomials and Chebyshev polynomials of lemniscatic regions in the complex plane (section 4).

All computations in this paper have been performed using MATLAB [20]. For computing Chebyshev polynomials of matrices we have used the DSDP software package for semidefinite programming [3] and its MATLAB interface.

**2. General results.** In this section we state and prove results on the Chebyshev polynomials of a general matrix *A*. In later sections we will apply these results to some specific examples.

**2.1. Chebyshev polynomials of shifted and scaled matrices.** In the following we will write a complex (monic) polynomial of degree *m* as a function of the variable *z* and its coefficients. More precisely, for x = [x_0, ..., x_{m−1}]^T ∈ C^m we write

(2.1)    p(z; x) ≡ z^m − ∑_{j=0}^{m−1} x_j z^j ∈ M_m.

Let two complex numbers, α and β, be given, and define δ ≡ β − α. Then

p(β + z; x) = p((β − α) + (α + z); x) = (δ + (α + z))^m − ∑_{j=0}^{m−1} x_j (δ + (α + z))^j

  = ∑_{j=0}^{m} \binom{m}{j} δ^{m−j} (α + z)^j − ∑_{j=0}^{m−1} x_j ∑_{i=0}^{j} \binom{j}{i} δ^{j−i} (α + z)^i

  = (α + z)^m + ∑_{j=0}^{m−1} \binom{m}{j} δ^{m−j} (α + z)^j − ∑_{j=0}^{m−1} x_j ∑_{i=0}^{j} \binom{j}{i} δ^{j−i} (α + z)^i

(2.2)
  = (α + z)^m − ∑_{j=0}^{m−1} ( ∑_{i=j}^{m−1} \binom{i}{j} δ^{i−j} x_i − \binom{m}{j} δ^{m−j} ) (α + z)^j

  ≡ (α + z)^m − ∑_{j=0}^{m−1} y_j (α + z)^j ≡ p(α + z; y).

A closer examination of (2.2) shows that the two vectors y and x in the identity p(α + z; y) = p(β + z; x) are related by y = M_δ x − v_δ, where M_δ ∈ C^{m×m} is the upper triangular matrix with entries (M_δ)_{ij} = \binom{j}{i} δ^{j−i} for 0 ≤ i ≤ j ≤ m − 1, and v_δ ∈ C^m has entries (v_δ)_i = \binom{m}{i} δ^{m−i}. Componentwise,

    y_i = ∑_{j=i}^{m−1} \binom{j}{i} δ^{j−i} x_j − \binom{m}{i} δ^{m−i},    i = 0, ..., m − 1.

We can write this as

(2.3)    y = h_δ(x), where h_δ(x) ≡ M_δ x − v_δ.

The matrix M_δ ∈ C^{m×m} is an invertible upper triangular matrix; all its diagonal elements are equal to 1. Thus, for any δ ∈ C, h_δ : x ↦ M_δ x − v_δ is an invertible affine linear transformation on C^m. Note that if δ = 0, then M_δ = I (the identity matrix) and v_δ = 0, so that y = x.
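The map h_δ is easy to check numerically. The following sketch (the helper names `h_delta` and `p` are ours, not from the paper) builds M_δ and v_δ from the binomial formulas and verifies the identity p(β + z; x) = p(α + z; h_δ(x)) at random points:

```python
import numpy as np
from math import comb

def h_delta(x, delta):
    """The affine map (2.3): y = M_delta x - v_delta."""
    m = len(x)
    M = np.array([[comb(j, i) * delta ** (j - i) if j >= i else 0
                   for j in range(m)] for i in range(m)], dtype=complex)
    v = np.array([comb(m, j) * delta ** (m - j) for j in range(m)], dtype=complex)
    return M @ x - v

def p(z, x):
    """The monic polynomial (2.1): p(z; x) = z^m - sum_j x_j z^j."""
    return z ** len(x) - sum(xj * z ** j for j, xj in enumerate(x))

rng = np.random.default_rng(1)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
alpha, beta = 0.3 - 0.2j, 1.1 + 0.7j
y = h_delta(x, beta - alpha)
z = rng.standard_normal(7) + 1j * rng.standard_normal(7)  # random test points
assert np.allclose(p(beta + z, x), p(alpha + z, y))       # identity behind Lemma 2.1
```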

The above derivation can be repeated with αI, βI, and *A* replacing α, β, and *z*, respectively. This yields the following result.

Lemma 2.1. *Let A ∈ C^{n×n}, x ∈ C^m, α ∈ C, and β ∈ C be given. Then for any monic polynomial p of the form* (2.1)*,*

(2.4)    p(βI + A; x) = p(αI + A; h_δ(x)),

*where δ ≡ β − α, and h_δ is defined as in* (2.3)*.*

## The assertion of this lemma is an ingredient in the proof of the following theorem.

Theorem 2.2. *Let A ∈ C^{n×n}, α ∈ C, and a positive integer m < d(A) be given. Denote by T_m^A(z) = p(z; x_∗) the mth Chebyshev polynomial of A. Then the following hold:*

(2.5)    min_{p ∈ M_m} ‖p(A + αI)‖ = min_{p ∈ M_m} ‖p(A)‖,    T_m^{A+αI}(z) = p(z; h_{−α}(x_∗)),

*where h_{−α} is defined as in* (2.3)*, and*

(2.6)    min_{p ∈ M_m} ‖p(αA)‖ = |α|^m min_{p ∈ M_m} ‖p(A)‖,    T_m^{αA}(z) = p(z; D_α x_∗),

*where D_α ≡ diag(α^m, α^{m−1}, ..., α).*

*Proof.* We first prove (2.5). Equation (2.4) with β = 0 shows that p(A; x) = p(A + αI; h_{−α}(x)) holds for any x ∈ C^m. This yields

min_{p ∈ M_m} ‖p(A + αI)‖ = min_{x ∈ C^m} ‖p(A + αI; x)‖ = min_{x ∈ C^m} ‖p(A + αI; h_{−α}(x))‖ = min_{x ∈ C^m} ‖p(A; x)‖ = min_{p ∈ M_m} ‖p(A)‖

(here we have used that the transformation h_{−α} is invertible). To see that the polynomial p(z; h_{−α}(x_∗)) is indeed the *m*th Chebyshev polynomial of A + αI, we note that

‖p(A + αI; h_{−α}(x_∗))‖ = ‖p(A; x_∗)‖ = min_{p ∈ M_m} ‖p(A)‖ = min_{p ∈ M_m} ‖p(A + αI)‖.


The equations in (2.6) are trivial if α = 0, so we can assume that α ≠ 0. Then the matrix D_α is invertible, and a straightforward computation yields

min_{p ∈ M_m} ‖p(αA)‖ = min_{x ∈ C^m} ‖p(αA; x)‖ = |α|^m min_{x ∈ C^m} ‖p(A; D_α^{−1} x)‖ = |α|^m min_{x ∈ C^m} ‖p(A; x)‖ = |α|^m min_{p ∈ M_m} ‖p(A)‖.

Furthermore,

‖p(αA; D_α x_∗)‖ = |α|^m ‖p(A; x_∗)‖ = |α|^m min_{p ∈ M_m} ‖p(A)‖ = min_{p ∈ M_m} ‖p(αA)‖,

so that p(z; D_α x_∗) is the *m*th Chebyshev polynomial of the matrix αA.
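The key identity behind (2.6), namely p(αA; D_α x) = α^m p(A; x), can be verified numerically. This is a sketch (the helper name `p_mat` is ours) with a random matrix and random coefficients:

```python
import numpy as np

def p_mat(B, x):
    """p(B; x) = B^m - sum_j x_j B^j for the monic polynomial (2.1)."""
    m = len(x)
    P = np.linalg.matrix_power(B, m).astype(complex)
    for j, xj in enumerate(x):
        P = P - xj * np.linalg.matrix_power(B, j)
    return P

rng = np.random.default_rng(2)
m = 4
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
x = rng.standard_normal(m) + 1j * rng.standard_normal(m)
alpha = 0.5 - 1.5j
# D_alpha = diag(alpha^m, ..., alpha) applied to the coefficient vector x
D_alpha_x = np.array([alpha ** (m - j) * x[j] for j in range(m)])
assert np.allclose(p_mat(alpha * A, D_alpha_x), alpha ** m * p_mat(A, x))
# hence the norms scale exactly by |alpha|^m, as stated in (2.6)
assert np.isclose(np.linalg.norm(p_mat(alpha * A, D_alpha_x), 2),
                  abs(alpha) ** m * np.linalg.norm(p_mat(A, x), 2))
```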

The fact that the "true" Arnoldi approximation problem, i.e., the right-hand side of (1.2), is translation invariant has been mentioned previously in [11, p. 361]. Hence the translation invariance of problem (1.1) shown in (2.5) is not surprising. The underlying reason is that the monic polynomials are normalized "at infinity."

The result for the scaled matrices in (2.6), which also may be expected, has an important consequence that is easily overlooked: Suppose that for some given A ∈ C^{n×n} we have computed the sequence of norms of problem (1.1), i.e., the quantities

‖T_1^A(A)‖, ‖T_2^A(A)‖, ‖T_3^A(A)‖, ....

If we scale *A* by α ∈ C, then the norms of the Chebyshev approximation problem *for the scaled matrix αA* are given by

|α| ‖T_1^A(A)‖, |α|² ‖T_2^A(A)‖, |α|³ ‖T_3^A(A)‖, ....

A suitable scaling can therefore turn any given sequence of norms for the problem with *A* into a strictly monotonically decreasing (or, if we prefer, increasing) sequence for the problem with αA. For example, the matrix

A = [ 1 2 3 ; 4 5 6 ; 7 8 9 ]

yields ‖T_0^A(A)‖ = 1, ‖T_1^A(A)‖ ≈ 11.4077, ‖T_2^A(A)‖ = 9; cf. [25, p. 280] (note that by definition T_0^A(z) ≡ 1 for any matrix *A*). The corresponding norms for the scaled matrices (1/12)·A and 12·A are then (approximately) given by

1, 0.95064, 0.0625, and 1, 136.8924, 1296,
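The scaled sequences follow from (2.6) by pure arithmetic; a quick check using the norm values quoted in the text:

```python
# Norm sequence quoted in the text for A = [1 2 3; 4 5 6; 7 8 9]:
norms = [1.0, 11.4077, 9.0]              # ||T_0^A(A)||, ||T_1^A(A)||, ||T_2^A(A)||
for alpha in (1 / 12, 12):
    # (2.6): the norm of degree m scales by |alpha|^m
    scaled = [abs(alpha) ** m * t for m, t in enumerate(norms)]
    print(alpha, [round(s, 4) for s in scaled])
```

For α = 1/12 this gives the decreasing sequence 1, 0.9506, 0.0625, and for α = 12 the increasing sequence 1, 136.8924, 1296.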

respectively. In general we expect that the behavior of an iterative method for solving linear systems or for approximating eigenvalues is invariant under scaling of the given matrix. In particular, by looking at the sequence of norms of problem (1.1) *alone* we cannot determine how fast a method "converges." In practice, we always have to measure "convergence" in some *relative* (rather than absolute) sense. Note that the quantity min_{p ∈ M_m} ‖p(A)‖ / ‖A‖^m is independent of a scaling of the matrix *A*, and hence in our context it may give relevant information. We have not explored this topic further.

**2.2. Alternation property for block-diagonal matrices.** It is well known that Chebyshev polynomials of compact sets Ω are characterized by an *alternation property*. For example, if Ω = [a, b] is a finite real interval, then p(z) ∈ M_m is the unique Chebyshev polynomial of degree *m* on Ω if and only if p(z) assumes its extreme values ± max_{z ∈ Ω} |p(z)| with successively alternating signs on at least m + 1 points (the "alternation points") in Ω; see, e.g., [4, section 7.5]. There exist generalizations of this classical result to complex as well as to finite sets Ω; see, e.g., [6, Chapter 3] and [4, section 7.5]. The following is a generalization to block-diagonal matrices.

Theorem 2.3. *Consider a block-diagonal matrix A = diag(A_1, ..., A_h), let k ≡ max_{1≤j≤h} d(A_j), and let ℓ be a given positive integer such that k·ℓ < d(A). Then the matrix T_{k·ℓ}^A(A) = diag(T_{k·ℓ}^A(A_1), ..., T_{k·ℓ}^A(A_h)) has at least ℓ + 1 diagonal blocks T_{k·ℓ}^A(A_j) with norm equal to ‖T_{k·ℓ}^A(A)‖.*

*Proof.* The assumption that k·ℓ < d(A) implies that T_{k·ℓ}^A(z) is uniquely defined. For simplicity of notation we denote B = T_{k·ℓ}^A(A) and B_j ≡ T_{k·ℓ}^A(A_j), j = 1, ..., h. Without loss of generality we can assume that ‖B‖ = ‖B_1‖ ≥ ··· ≥ ‖B_h‖.

Suppose that the assertion is false. Then there exists an integer i, 1 ≤ i ≤ ℓ, so that ‖B‖ = ‖B_1‖ = ··· = ‖B_i‖ > ‖B_{i+1}‖. Let δ ≡ ‖B‖ − ‖B_{i+1}‖ > 0, and let q_j(z) ∈ M_k be a polynomial with q_j(A_j) = 0, 1 ≤ j ≤ h. Define the polynomial

t(z) ≡ ∏_{j=1}^{ℓ} q_j(z) ∈ M_{k·ℓ}.

Let ε be a positive real number with

ε < δ / (δ + max_{1≤j≤h} ‖t(A_j)‖).

Then 0 < ε < 1, and hence

r_ε(z) ≡ (1 − ε) T_{k·ℓ}^A(z) + ε t(z) ∈ M_{k·ℓ}.

Note that ‖r_ε(A)‖ = max_{1≤j≤h} ‖r_ε(A_j)‖. For 1 ≤ j ≤ i, we have

‖r_ε(A_j)‖ = (1 − ε) ‖B_j‖ = (1 − ε) ‖B‖ < ‖B‖.

For i + 1 ≤ j ≤ h, we have

‖r_ε(A_j)‖ = ‖(1 − ε) B_j + ε t(A_j)‖ ≤ (1 − ε) ‖B_j‖ + ε ‖t(A_j)‖ ≤ (1 − ε) ‖B_{i+1}‖ + ε ‖t(A_j)‖
  = (1 − ε)(‖B‖ − δ) + ε ‖t(A_j)‖
  = (1 − ε) ‖B‖ + ε (δ + ‖t(A_j)‖) − δ.

Since ε (δ + ‖t(A_j)‖) − δ < 0 by the definition of ε, we see that ‖r_ε(A_j)‖ < ‖B‖ for i + 1 ≤ j ≤ h. But this means that ‖r_ε(A)‖ < ‖B‖, which contradicts the minimality of the Chebyshev polynomial of *A*.

The numerical results shown in Table 1 illustrate this theorem. We have used a block-diagonal matrix *A* with four Jordan blocks of size 3 × 3 on its diagonal, so that k = 3. Theorem 2.3 then guarantees that T_{3ℓ}^A(A), ℓ = 1, 2, 3, has at least ℓ + 1 diagonal blocks with the same maximal norm. This is clearly confirmed for ℓ = 1 and


Table 1
*Numerical illustration of Theorem 2.3: Here A = diag(A_1, A_2, A_3, A_4), where each A_j = J_{λ_j} is a 3 × 3 Jordan block. The four eigenvalues are −3, −0.5, 0.5, 0.75.*

| m | ‖T_m^A(A_1)‖ | ‖T_m^A(A_2)‖ | ‖T_m^A(A_3)‖ | ‖T_m^A(A_4)‖ |
|---|---------|---------|---------|---------|
| 1 | 2.6396  | 1.4620  | 2.3970  | 2.6396  |
| 2 | 4.1555  | 4.1555  | 3.6828  | 4.1555  |
| 3 | 9.0629  | 5.6303  | 7.6858  | 9.0629  |
| 4 | 14.0251 | 14.0251 | 11.8397 | 14.0251 |
| 5 | 22.3872 | 20.7801 | 17.6382 | 22.3872 |
| 6 | 22.6857 | 22.6857 | 20.3948 | 22.6857 |

ℓ = 2 (it also holds for ℓ = 3). For these ℓ we observe that *exactly* ℓ + 1 diagonal blocks achieve the maximal norm. Hence in general the lower bound of ℓ + 1 blocks attaining the maximal norm in step m = k·ℓ cannot be improved. In addition, we see in this experiment that the number of diagonal blocks with the same maximal norm is not necessarily a monotonically increasing function of the degree of the Chebyshev polynomial.

Now consider the matrix

A = diag(A_1, A_2), where A_1 = [ 1 1 ; 0 1 ] and A_2 = [ −1 1 ; 0 −1 ].

Then for p(z) = z² − αz − β ∈ M_2 we get

p(A_1) = [ 1 − (α + β)   2 − α ; 0   1 − (α + β) ],    p(A_2) = [ 1 + (α − β)   −(2 + α) ; 0   1 + (α − β) ].

For α = 0 and *any* β ∈ R we will have ‖p(A)‖ = ‖p(A_1)‖ = ‖p(A_2)‖. Hence there are infinitely many polynomials p ∈ M_2 for which the two diagonal blocks have the same maximal norm. One of these polynomials is the Chebyshev polynomial T_2^A(z) = z² + 1.
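For this 4 × 4 example the equality of the two block norms at α = 0 is easy to confirm numerically. The following sketch checks only the equal-norm property, not minimality (which would require the semidefinite programming machinery mentioned in the introduction):

```python
import numpy as np

A1 = np.array([[1.0, 1.0], [0.0, 1.0]])     # Jordan block J_1
A2 = np.array([[-1.0, 1.0], [0.0, -1.0]])   # Jordan block J_{-1}

def p(B, alpha, beta):
    """p(B) for p(z) = z^2 - alpha*z - beta."""
    return B @ B - alpha * B - beta * np.eye(2)

# alpha = 0 and arbitrary real beta: the two diagonal blocks have equal norm
for beta in (-1.0, 0.0, 0.37, 2.5):
    n1 = np.linalg.norm(p(A1, 0.0, beta), 2)
    n2 = np.linalg.norm(p(A2, 0.0, beta), 2)
    assert np.isclose(n1, n2)
# the Chebyshev polynomial T_2^A(z) = z^2 + 1 corresponds to alpha = 0, beta = -1
```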

As shown by this example, the condition in Theorem 2.3 on a polynomial p ∈ M_{k·ℓ} that at least ℓ + 1 diagonal blocks of p(A) have equal maximal norm is in general necessary but *not* sufficient so that p(z) = T_{k·ℓ}^A(z).

Finally, as a special case of Theorem 2.3 consider a matrix A = diag(λ_1, ..., λ_n) with distinct diagonal elements λ_1, ..., λ_n ∈ C. If m < n, then there are at least m + 1 diagonal elements λ_j with |T_m^A(λ_j)| = ‖T_m^A(A)‖ = max_{1≤i≤n} |T_m^A(λ_i)|. Note that T_m^A(z) in this case is equal to the *m*th Chebyshev polynomial of the finite set {λ_1, ..., λ_n} ⊂ C. This shows that the Chebyshev polynomial of degree *m* of a set in the complex plane consisting of n ≥ m + 1 points attains its maximum value at least at m + 1 points.

**2.3. Chebyshev polynomials with known zero coefficients.** In this section we study properties of a matrix *A* so that its Chebyshev polynomials have known zero coefficients. An extreme case in this respect is when T_m^A(z) = z^m, i.e., when all coefficients of T_m^A(z), except the leading one, are zero. This happens if and only if

‖A^m‖ = min_{p ∈ M_m} ‖p(A)‖.

Equivalently, this says that the zero matrix is the best approximation of A^m from the linear space span{I, A, ..., A^{m−1}} (with respect to the matrix 2-norm). To

characterize this property, we recall that the dual norm to the matrix 2-norm ‖·‖ is the trace norm (also called energy norm or c_1-norm),

(2.7)    |||M||| ≡ ∑_{j=1}^{r} σ_j(M),

where σ_1(M), ..., σ_r(M) denote the singular values of the matrix M ∈ C^{n×n} with rank(M) = r. For X ∈ C^{n×n} and Y ∈ C^{n×n} we define the inner product ⟨X, Y⟩ ≡ trace(Y*X). We can now adapt a result of Ziętak [27, p. 173] to our context and notation.
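The trace norm (2.7) and its duality with the 2-norm can be illustrated numerically. In the sketch below (our choice of example, not from the paper), Z is the rank-one matrix built from the leading singular vectors of A, so that |||Z||| = 1 and the pairing ⟨Z, A⟩ attains ‖A‖:

```python
import numpy as np

def trace_norm(M):
    """The dual norm (2.7): sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
U, s, Vh = np.linalg.svd(A)
Z = np.outer(U[:, 0], Vh[0, :])   # rank one, so |||Z||| = sigma_1(Z) = 1
assert np.isclose(trace_norm(Z), 1.0)
# the pairing <Z, A> = trace(A^* Z) attains ||A||, the largest singular value
assert np.isclose(np.trace(A.conj().T @ Z).real, np.linalg.norm(A, 2))
```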

Theorem 2.4. *Let A ∈ C^{n×n} and a positive integer m < d(A) be given. Then T_m^A(z) = z^m if and only if there exists a matrix Z ∈ C^{n×n} with |||Z||| = 1 such that*

(2.8)    ⟨Z, A^k⟩ = 0, k = 0, ..., m − 1, *and* Re⟨Z, A^m⟩ = ‖A^m‖.

As shown in [13, Theorem 3.4], the *m*th Chebyshev polynomial of a Jordan block J_λ with eigenvalue λ ∈ C is given by (z − λ)^m. In particular, the *m*th Chebyshev polynomial of J_0 is of the form z^m. A more general class of matrices with T_m^A(z) = z^m is studied in section 3.1 below.
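As a sanity check of the cited result (not a proof of minimality), the value attained by (z − λ)^m at a Jordan block J_λ is ‖(J_λ − λI)^m‖ = 1 whenever *m* is less than the block size, since J_λ − λI is the nilpotent shift:

```python
import numpy as np

m, n, lam = 3, 6, 0.8 + 0.3j
J = lam * np.eye(n) + np.diag(np.ones(n - 1), 1)   # Jordan block J_lambda
N = J - lam * np.eye(n)                            # nilpotent shift matrix
# (z - lambda)^m evaluated at J_lambda equals N^m, whose 2-norm is 1 for m < n
assert np.isclose(np.linalg.norm(np.linalg.matrix_power(N, m), 2), 1.0)
```

This also makes the size-independence plain: N^m has ones on its *m*th superdiagonal regardless of n, as long as m < n.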

It is well known that the Chebyshev polynomials of real intervals that are symmetric with respect to the origin alternate between even and odd, i.e., every second coefficient (starting from the highest one) of $T_m^{[-a,a]}(z)$ is zero, which means that $T_m^{[-a,a]}(z) = (-1)^m T_m^{[-a,a]}(-z)$. We next give an analogue of this result for Chebyshev polynomials of matrices.

Theorem 2.5. *Let $A \in \mathbb{C}^{n \times n}$ and a positive integer $m < d(A)$ be given. If there exists a unitary matrix $P$ such that either $P^*AP = -A$ or $P^*AP = -A^T$, then*

(2.9) $$T_m^A(z) = (-1)^m T_m^A(-z).$$

*Proof*. We prove the assertion only in the case $P^*AP = -A$; the other case is similar. If this relation holds for a unitary matrix $P$, then

$$\|(-1)^m T_m^A(-A)\| = \|T_m^A(P^*AP)\| = \|P^* T_m^A(A) P\| = \|T_m^A(A)\| = \min_{p \in \mathcal{M}_m} \|p(A)\|,$$

and the result follows from the uniqueness of the $m$th Chebyshev polynomial of $A$.

As a special case consider a normal matrix $A$ and its unitary diagonalization $U^*AU = D$. Then $T_m^A(z) = T_m^D(z)$, so we may consider only the Chebyshev polynomial of the diagonal matrix $D$. Since $D = D^T$, the conditions in Theorem 2.5 are satisfied if and only if there exists a unitary matrix $P$ such that $P^*DP = -D$. But this means that the set of the diagonal elements of $D$ (i.e., the eigenvalues of $A$) must be symmetric with respect to the origin (i.e., if $\lambda_j$ is an eigenvalue, $-\lambda_j$ is an eigenvalue as well). Therefore, whenever a discrete set $\Omega \subset \mathbb{C}$ is symmetric with respect to the origin, the Chebyshev polynomial $T_m^\Omega(z)$ is even (odd) if $m$ is even (odd).
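This parity can be observed numerically. The brute-force sketch below (our illustration, not a method from the paper) searches a coefficient grid for the monic degree-2 min-max polynomial on the symmetric set $\Omega = \{\pm 1, \pm 2\}$ and recovers a vanishing linear coefficient, i.e., an even polynomial:

```python
import numpy as np

omega = np.array([-2.0, -1.0, 1.0, 2.0])    # symmetric with respect to the origin

# Search T_2(z) = z^2 + b z + c over a grid of real coefficients (b, c).
bs = np.linspace(-1.0, 1.0, 201)            # grid contains b = 0
cs = np.linspace(-4.0, 0.0, 401)            # grid contains c = -2.5
best_val, best_b, best_c = np.inf, None, None
for b in bs:
    for c in cs:
        val = np.max(np.abs(omega**2 + b * omega + c))
        if val < best_val:
            best_val, best_b, best_c = val, b, c
# The minimizer is the even polynomial z^2 - 2.5 with min-max value 1.5.
```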


As an example of a nonnormal matrix, consider

$$A = \begin{bmatrix} 1 & \epsilon & & \\ & -1 & 1/\epsilon & \\ & & 1 & \epsilon \\ & & & -1 \end{bmatrix}, \qquad \epsilon > 0,$$

which has been used by Toh [22] in his analysis of the convergence of the GMRES method. He has shown that $P^TAP = -A^T$, where

$$P = \begin{bmatrix} & & & 1 \\ & & -1 & \\ & 1 & & \\ -1 & & & \end{bmatrix}$$

is an orthogonal matrix.
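A quick NumPy check (ours, not from the paper) confirms that this antidiagonal $P$ is orthogonal and satisfies $P^TAP = -A^T$, the second condition of Theorem 2.5:

```python
import numpy as np

eps = 0.5                                   # any eps > 0 works
A = np.array([[1.0,  eps,  0.0,       0.0],
              [0.0, -1.0,  1.0 / eps, 0.0],
              [0.0,  0.0,  1.0,       eps],
              [0.0,  0.0,  0.0,      -1.0]])

# Antidiagonal orthogonal matrix with alternating signs.
P = np.array([[ 0.0, 0.0,  0.0, 1.0],
              [ 0.0, 0.0, -1.0, 0.0],
              [ 0.0, 1.0,  0.0, 0.0],
              [-1.0, 0.0,  0.0, 0.0]])

orthogonal = np.allclose(P.T @ P, np.eye(4))
symmetry = np.allclose(P.T @ A @ P, -A.T)
```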

For another example consider

(2.10) $$C = \begin{bmatrix} J_\lambda & \\ & J_{-\lambda} \end{bmatrix}, \qquad J_\lambda,\, J_{-\lambda} \in \mathbb{C}^{n \times n}, \quad \lambda \in \mathbb{C}.$$

It is easily seen that

(2.11) $$J_{-\lambda} = -I_\pm J_\lambda I_\pm, \qquad \text{where } I_\pm \equiv \mathrm{diag}(1, -1, 1, \dots, (-1)^{n-1}) \in \mathbb{R}^{n \times n}.$$

Using the symmetric and orthogonal matrices

$$P = \begin{bmatrix} & I \\ I & \end{bmatrix}, \qquad Q = \begin{bmatrix} I_\pm & \\ & I_\pm \end{bmatrix},$$

we obtain $QPCPQ = -C$.
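Both the identity (2.11) and this similarity are easy to confirm numerically; the sketch below (our illustration, with an arbitrarily chosen complex $\lambda$) does so for $n = 4$:

```python
import numpy as np

def jordan(lam, n):
    # Jordan block with eigenvalue lam and ones on the superdiagonal.
    return lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)

n, lam = 4, 0.7 + 0.3j
J_plus, J_minus = jordan(lam, n), jordan(-lam, n)

I_pm = np.diag([(-1.0) ** k for k in range(n)])       # diag(1, -1, 1, ...)
Z = np.zeros((n, n))
C = np.block([[J_plus, Z], [Z, J_minus]])             # the matrix in (2.10)

P = np.block([[Z, np.eye(n)], [np.eye(n), Z]])        # swaps the two blocks
Q = np.block([[I_pm, Z], [Z, I_pm]])

identity_211 = np.allclose(J_minus, -I_pm @ J_plus @ I_pm)   # identity (2.11)
flip = np.allclose(Q @ P @ C @ P @ Q, -C)                    # QPCPQ = -C
```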

The identity (2.11) implies that

$$\|T_m^C(J_{-\lambda})\| = \|T_m^C(-I_\pm J_\lambda I_\pm)\| = \|T_m^C(J_\lambda)\|,$$

i.e., the Chebyshev polynomials of $C$ attain the same norm on each of the two diagonal blocks. In general, we can shift and rotate any matrix consisting of two Jordan blocks of the same size and with respective eigenvalues $\lambda, \mu \in \mathbb{C}$ into the form (2.10). It can then be shown that the Chebyshev polynomials $T_m^A(z)$ of $A = \mathrm{diag}(J_\lambda, J_\mu)$ satisfy the “norm balancing property” $\|T_m^A(J_\lambda)\| = \|T_m^A(J_\mu)\|$. The proof of this property is rather technical, and we omit it for brevity.

**2.4. Linear Chebyshev polynomials.** In this section we consider the linear Chebyshev problem

$$\min_{\alpha \in \mathbb{C}} \|A - \alpha I\|.$$
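For small examples the minimizer can be located by brute force. The sketch below (our illustration, using a normal test matrix for which the answer is known) searches a grid in the complex plane; the resulting $T_1^A(z) = z - \alpha_*$ for $A = \mathrm{diag}(0, 2)$ has $\alpha_* = 1$:

```python
import numpy as np

A = np.diag([0.0, 2.0])          # normal, eigenvalues {0, 2}

# Crude grid search for alpha minimizing ||A - alpha I|| in the 2-norm.
best_val, best_alpha = np.inf, None
for x in np.linspace(-1.0, 3.0, 81):
    for y in np.linspace(-1.0, 1.0, 41):
        alpha = x + 1j * y
        val = np.linalg.norm(A - alpha * np.eye(2), 2)
        if val < best_val:
            best_val, best_alpha = val, alpha
# For a normal matrix this is a scalar Chebyshev problem on the eigenvalues;
# here the minimum is 1, attained at alpha = 1.
```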

Work related to this problem has been done by Friedland [8], who characterized solutions of the problem $\min_{\alpha \in \mathbb{R}} \|A - \alpha B\|$, where $A$ and $B$ are two complex, possibly rectangular, matrices. This problem in general does not have a unique solution. More recently, Afanasjew et al. [1] have studied the restarted Arnoldi method with restart length equal to 1. The analysis of this method involves approximation problems of the form $\min_{\alpha \in \mathbb{C}} \|(A - \alpha I)v_1\|$ (cf. (1.2)), whose unique solution is $\alpha = v_1^* A v_1$.
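The optimality of $\alpha = v_1^* A v_1$ is the statement that $\alpha v_1$ is the orthogonal projection of $A v_1$ onto $\mathrm{span}\{v_1\}$; a quick numerical check (ours) with a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v /= np.linalg.norm(v)                   # unit starting vector

alpha_star = v.conj() @ A @ v            # the claimed unique minimizer v* A v
r_star = np.linalg.norm(A @ v - alpha_star * v)

# Perturbing alpha_star in any direction increases the residual norm.
worse = [np.linalg.norm(A @ v - (alpha_star + d) * v)
         for d in (0.1, -0.1, 0.1j, -0.1j)]
```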

Theorem 2.6. *Let $A \in \mathbb{C}^{n \times n}$ be any (nonzero) matrix, and denote by $\Sigma(A)$ the span of the right singular vectors of $A$ corresponding to the maximal singular value of $A$. Then $T_1^A(z) = z$ if and only if there exists a vector $w \in \Sigma(A)$ with $w^*Aw = 0$.*

*Proof*. If $T_1^A(z) = z$, then $\|A\| = \min_{\alpha \in \mathbb{C}} \|A - \alpha I\|$. According to a result of Greenbaum and Gurvits [10, Theorem 2.5], there exists a unit norm vector $w \in \mathbb{C}^n$, so that²

$$\min_{\alpha \in \mathbb{C}} \|A - \alpha I\| = \min_{\alpha \in \mathbb{C}} \|(A - \alpha I)w\|.$$

The unique solution of the problem on the right-hand side is $\alpha_* = w^*Aw$. Our assumption now implies that $w^*Aw = 0$, and the equations above yield $\|A\| = \|Aw\|$, which shows that $w \in \Sigma(A)$.

On the other hand, suppose that there exists a vector $w \in \Sigma(A)$ such that $w^*Aw = 0$. Without loss of generality we can assume that $\|w\| = 1$. Then

$$\|A\| \,\ge\, \min_{\alpha \in \mathbb{C}} \|A - \alpha I\| \,\ge\, \min_{\alpha \in \mathbb{C}} \|Aw - \alpha w\| \,=\, \min_{\alpha \in \mathbb{C}} \left(\|Aw\|^2 + |\alpha|^2\right)^{1/2} \,=\, \|Aw\|.$$

In the first equality we have used that $w^*Aw = 0$, i.e., that the vectors $w$ and $Aw$ are orthogonal. The assumption $w \in \Sigma(A)$ implies that $\|Aw\| = \|A\|$, and thus equality must hold throughout the above relations. In particular, $\|A\| = \min_{\alpha \in \mathbb{C}} \|A - \alpha I\|$, and hence $T_1^A(z) = z$ follows from the uniqueness of the solution.

An immediate consequence of this result is that if zero is outside the field of values of $A$, then $\|T_1^A(A)\| < \|A\|$. Note that this also follows from [21, Theorem 5.10], which states that the zeros of $T_m^A(z)$ are contained in the field of values of $A$.

We will now study the relation between Theorem 2.4 for $m = 1$ and Theorem 2.6. Let $w \in \Sigma(A)$ and let $u \in \mathbb{C}^n$ be a corresponding left singular vector, so that $Aw = \|A\|u$ and $A^*u = \|A\|w$. Then the condition $w^*Aw = 0$ in Theorem 2.6 implies that $w^*u = 0$. We may assume that $\|w\| = \|u\| = 1$. Then the rank-one matrix $Z \equiv uw^*$ satisfies $|||Z||| = 1$,

$$0 = w^*u = \sum_{i=1}^n \overline{w}_i u_i = \mathrm{trace}(Z) = \langle Z, I \rangle = \langle Z, A^0 \rangle,$$

and

$$\langle Z, A \rangle = \mathrm{trace}(A^* u w^*) = \|A\|\, \mathrm{trace}(ww^*) = \|A\| \sum_{i=1}^n w_i \overline{w}_i = \|A\|.$$

Hence Theorem 2.6 shows that $T_1^A(z) = z$ if and only if there exists a rank-one matrix $Z$ satisfying the conditions (2.8).
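For the Jordan block with eigenvalue zero this construction is fully explicit: taking $w$ to be the last canonical basis vector and $u = Aw/\|A\|$, the rank-one matrix $Z = uw^*$ satisfies all the conditions (2.8) for $m = 1$. A numerical check (our sketch):

```python
import numpy as np

n = 4
A = np.diag(np.ones(n - 1), k=1)     # Jordan block with eigenvalue zero

# w = last canonical basis vector is a maximal right singular vector with
# w* A w = 0; u = Aw / ||A|| is the corresponding left singular vector.
w = np.zeros(n); w[-1] = 1.0
u = A @ w / np.linalg.norm(A, 2)

Z = np.outer(u, w.conj())            # rank-one Z = u w*

trace_norm_Z = np.linalg.svd(Z, compute_uv=False).sum()
cond0 = np.trace(Z)                              # <Z, A^0> = trace(Z)
cond1 = np.trace(A.conj().T @ Z).real            # Re <Z, A>
norm_A = np.linalg.norm(A, 2)
```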

Above we have already mentioned that $T_1^A(z) = z$ holds when $A$ is a Jordan block with eigenvalue zero. It is easily seen that, in the notation of Theorem 2.6, the vector $w$ in this case is given by the last canonical basis vector. Furthermore, $T_1^A(z) = z$ holds for any matrix $A$ that satisfies the conditions of Theorem 2.5, i.e., $P^*AP = -A$ or $P^*AP = -A^T$ for some unitary matrix $P$.

²Greenbaum and Gurvits have stated this result for real matrices only, but since its proof mainly involves singular value decompositions of matrices, a generalization to the complex case is straightforward.