Regression With Gaussian Measures

Michael J. Meyer, Copyright © April 11, 2004

PREFACE

We treat the basics of Gaussian processes, Gaussian measures, reproducing kernel Hilbert spaces and related topics. All mathematical details are included and every effort is made to keep this as self-contained as possible. Only elementary Hilbert space theory and integration theory as well as basic results from probability theory are assumed.

This is a work in progress and has been written up in haste. Undoubtedly there are mistakes. Please email me at [email protected] if you find mistakes or have suggestions.

Michael J. Meyer April 11, 2004

Contents

1 Introduction

2 Operators on Hilbert Space
  2.1 Hilbert space basics
  2.2 Adjoint operator
  2.3 Selfadjoint and positive operators
  2.4 Compact operators between Banach spaces
  2.5 Compact selfadjoint operators
  2.6 Compact operators between Hilbert spaces
  2.7 Hilbert-Schmidt and trace class operators
  2.8 Inverse problems and regularization
    2.8.1 Regularization
  2.9 Kernels and integral operators
  2.10 Symmetric kernels
  2.11 L2-Bounded Kernels

3 Reproducing Kernel Hilbert Spaces
  3.1 Positive semidefinite kernels
  3.2 Translation invariant kernels
  3.3 Reproducing kernel Hilbert spaces
  3.4 Bilinear kernel expansion
  3.5 Characterization of functions in HK
  3.6 Kernel domination
  3.7 Approximation in reproducing kernel Hilbert spaces
  3.8 Orthonormal bases
    3.8.1 Second description of H

4 Gaussian Measures
  4.1 Probability measures in Hilbert space
  4.2 Gaussian measures on Hilbert space
  4.3 Cameron-Martin space
  4.4 Regression with Gaussian measures
    4.4.1 Model choices

5 Square Integrable Processes
  5.1 Integrable processes
  5.2 Processes with sample paths in an RKHS

6 Gaussian random fields
  6.1 Definition and construction
    6.1.1 Construction of Gaussian random fields

A Vector Valued Integration
B Conditioning of multinormal Random Vectors
C Orthogonal polynomials
  C.0.2 Legendre polynomials

Chapter 1

Introduction

We will freely use terminology which will be defined later. Let F be a nonempty set and f : F → R a real valued function on F. Consider the following problem: we have observed the value of f at some points x1, . . . , xn ∈ F as

yj = f(xj), j = 1, . . . , n, (1.1)

and from this we want to estimate f itself. We will follow a Bayesian approach. It is assumed that the function f belongs to a real vector space H of functions on F. A prior probability P is placed on H and the regressor f̂ (the estimate of f in light of the data) is computed as the mean of P conditioned on the data (1.1).

The probability P is defined on the σ-field E generated by the continuous linear functionals on H. If I : (H, E, P) → H denotes the H-valued random variable defined as I(f) = f (the identity on H), then the mean of the distribution P on H is the expectation EP[I] of I under P, that is, the H-valued integral

\[ E_P[I] = \int_H I \, dP = \int_H f \, P(df). \tag{1.2} \]

Do not worry if this sounds needlessly abstract since it is not how things are handled in practice. It merely serves to motivate the procedures below. The vector valued integral (1.2) commutes with all continuous linear functionals Λ on H, that is,

\[ \Lambda(E_P[I]) = E_P(\Lambda \circ I) = \int_H \Lambda(f) \, P(df) \]

and the same holds true if the ordinary expectation is replaced with a conditional expectation. The regressor f̂ is the conditional expectation

f̂ = EP[I | data] (1.3)

and so we have

Λ(f̂) = EP[Λ | data] (1.4)

for each continuous linear functional Λ on H (note that Λ ∘ I = Λ). Thus rather than computing the regressor f̂ globally as in (1.3) we compute Λ(f̂) for enough continuous linear functionals Λ on H to obtain a good view of f̂. For each x ∈ F let

Ex : f ∈ H ↦ f(x) ∈ R

denote the evaluation functional at the point x. If Λ = Ex then Λ(f̂) = f̂(x) is our prediction for the value of f at the point x in light of the data (1.1). Note that the data themselves can be written in terms of the evaluation functionals as

Ej(f) =yj, 1≤j≤n, (1.5)

where Ej = Exj is the evaluation functional at the point xj. With this the regressor f̂ becomes the conditional expectation

f̂ = EP[I | Ej = yj, j ≤ n]

and

Λ(f̂) = EP[Λ | Ej = yj, j ≤ n], (1.6)

for each continuous linear functional Λ on H. To make this feasible we have to assume that

1. The evaluation functionals Ex, x ∈ F, are continuous on H.

The computation of (1.6) involves only the finite dimensional distribution of the random vector

W = (E1, . . . , En, Λ)

on R^{n+1} under the probability P. Note that each continuous linear functional on H is a random variable on the probability space (H, E, P).

The measure P is called a Gaussian measure on H if every continuous linear functional Λ on H is a normal random variable under P. In this case the distribution of the vector W is automatically Gaussian (multinormal) on R^{n+1} and the computation of the conditional expectation (1.6) involves merely routine computations with the multinormal density.

We have chosen the particular form (1.1) for the data because this is the standard in regression problems. Note however that our approach applies to all forms of data and predictions which can be articulated in terms of events involving finitely many continuous linear functionals on H.

Regression with Gaussian processes assumes that f is the trajectory of a Gaussian process Z = Z(x) on F. The mean of the process is assumed to be zero and thus the process Z is completely determined by its covariance function K(x, y) which is a symmetric positive semidefinite kernel on F.

The kernel K : F × F → R is a parameter of the regression procedure. The space H is the product space H = R^F of all functions f : F → R and the probability P is the distribution of Z on H. Kolmogoroff's existence theorem for product measures guarantees the existence of the probability P on H for every symmetric, positive semidefinite kernel K on F.
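To make the regression procedure concrete, here is a minimal sketch of the conditional mean (1.6) for Λ = Ex in the Gaussian process setting. It uses the standard conditioning formula for a zero-mean multinormal vector, f̂(x) = k(x)ᵀ G⁻¹ y with G = (K(xi, xj)) and k(x) = (K(x, xj)). The squared-exponential kernel, the sample data and all function names are assumptions chosen for illustration only; they do not appear in the text.

```python
import math


def kernel(x, y, length=1.0):
    """Squared-exponential covariance K(x, y); an illustrative choice of kernel."""
    return math.exp(-0.5 * ((x - y) / length) ** 2)


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def posterior_mean(xs, ys, x_new):
    """Conditional mean of Z(x_new) given Z(xs[j]) = ys[j] for a zero-mean process."""
    gram = [[kernel(a, b) for b in xs] for a in xs]  # Cov(E_i, E_j) = K(x_i, x_j)
    weights = solve(gram, ys)                        # G^{-1} y
    return sum(w * kernel(x_new, a) for w, a in zip(weights, xs))


xs = [0.0, 1.0, 2.5]      # hypothetical observation points x_j
ys = [1.0, -0.5, 0.3]     # hypothetical observed values y_j
predictions = [posterior_mean(xs, ys, x) for x in xs]
```

With noise-free data as in (1.1) the conditional mean interpolates the observations, and far away from all data points it falls back to the prior mean zero.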

The space H = R^F is a topological vector space with only one redeeming quality: the evaluation functionals are the coordinate functionals and hence continuous in the product topology on H.

Unfortunately there are essentially no other continuous linear functionals on H. Every continuous linear functional on H is a finite linear combination of coordinate functionals.

Consequently this setup limits us to data presented in the form (1.1) and consequent predictions of values f(x) at other points x ∈ F in a point by point fashion.

There are other disadvantages. For example it requires a substantial effort to extract properties of the admissible functions f, that is, the trajectories of the Gaussian process Z, from properties of the covariance kernel K and the resulting properties are often weaker than desired.

Consequently we take a slightly different approach. We assume instead that f is an element of a separable Hilbert space H of functions on F. P is a Gaussian measure on H defined in terms of an orthonormal basis {ψj} of H and a sequence (σj) of positive numbers (which diagonalize the covariance operator Q of P below).

We can then proceed as above provided that the evaluation functionals are continuous on H. But we also have other options. The data and predictions can be articulated in any fashion which uses only finitely many continuous linear functionals Λ on H. Point estimates are one possibility. Another possibility are the coefficients

Λ(f) = (f, ψk)

Here we had to assume that the evaluation functionals are continuous on H. A Hilbert space of functions on F with this property is called a reproducing kernel Hilbert space on F. Such a Hilbert space H defines a unique symmetric, positive semidefinite kernel K : F × F → R. Conversely every symmetric, positive semidefinite kernel K : F × F → R determines a unique reproducing kernel Hilbert space. There is an interesting interplay between orthonormal bases of H and the kernel K.

A basic question is how to find an orthonormal basis for H. If F ⊆ R^d is compact and K is continuous, then we have additional structure in the form of the Euclidean topology and Lebesgue measure on F. Associated with the kernel K we have the integral operator T : L²(F) → L²(F) defined by

\[ (Tf)(x) = \int_F K(x, y) f(y) \, dy, \quad f \in L^2(F),\ x \in F, \]

where dy denotes Lebesgue measure on F. It turns out that T is a Hilbert-Schmidt operator. Consequently the orthogonal complement of the null space of T has an orthonormal basis {φj} consisting of eigenvectors of T. Let λj denote the corresponding eigenvalues. Then the functions

ψj = √λj φj

are an orthonormal basis for the reproducing kernel Hilbert space H with kernel K. This establishes the connection to the spectral theory of compact, selfadjoint operators on a Hilbert space.
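The eigenvalue problem for T can be explored numerically. The following sketch, under assumptions of our own, discretizes T on F = [0, 1] with the midpoint rule and runs a power iteration to approximate the largest eigenvalue λ0 and its eigenfunction φ0, from which ψ0 = √λ0 φ0 is formed. The kernel K(x, y) = min(x, y), the grid size and the function names are illustrative assumptions, not part of the text. For this particular kernel the largest eigenvalue is known in closed form, λ0 = 4/π², which gives a check on the approximation.

```python
import math


def brownian_kernel(x, y):
    """K(x, y) = min(x, y) on [0, 1]; an illustrative continuous psd kernel."""
    return min(x, y)


def top_eigenpair(kernel, n=100, iterations=60):
    """Approximate the top eigenpair of (Tf)(x) = int_0^1 K(x, y) f(y) dy."""
    h = 1.0 / n
    grid = [(i + 0.5) * h for i in range(n)]            # midpoint rule nodes
    matrix = [[kernel(x, y) * h for y in grid] for x in grid]
    v = [1.0] * n                                       # discrete L2 norm: sum v^2 h = 1
    lam = 0.0
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        lam = sum(a * b for a, b in zip(w, v)) * h      # Rayleigh quotient (Tv, v)
        norm = math.sqrt(sum(c * c for c in w) * h)
        v = [c / norm for c in w]                       # renormalize in L2
    return lam, grid, v


lam0, grid, phi0 = top_eigenpair(brownian_kernel)
psi0 = [math.sqrt(lam0) * value for value in phi0]      # values of psi_0 on the grid
```

The power iteration only recovers the leading eigenpair; the remaining pairs would require deflation or a full symmetric eigenvalue solver.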

There is another connection. For f ∈ H let Λf be the bounded linear functional Λf(h) = (h, f) on H. The Gaussian measure P on H defines a unique bounded linear operator Q : H → H such that the covariances of the random variables Λf, Λg are given as

CovP(Λf, Λg) = (Qf, g)H, f, g ∈ H. (1.7)

The operator Q is a positive trace class operator. Conversely for every positive trace class operator Q : H → H, there exists a unique Gaussian measure P on H such that (1.7) holds.

Thus the material presents an interesting interaction of functional analysis and probability theory. If you are only interested in the regression problem you need only read Chapter 2, Chapter 3, sections 1-4, 7, 8 and Chapter 4, sections 1, 2, 4.

Chapter 2

Operators on Hilbert Space

In this chapter we develop the spectral theory of compact operators between Hilbert spaces. Our scalars are the reals, that is, we consider only real Hilbert spaces.

2.1 Hilbert space basics

We review the basics of Hilbert space theory. Let H be a (real) Hilbert space with inner product (·,·). Let

H1 = {x ∈ H : ‖x‖ ≤ 1}

denote the closed unit ball in H and

S1(H) = {x ∈ H : ‖x‖ = 1}

the unit sphere in H. For vectors x, y ∈ H we write x ⊥ y (orthogonal) if (x, y) = 0. For subsets A, B of H we write A ⊥ B if a ⊥ b for all a ∈ A and b ∈ B. We let

A⊥ := {x ∈ H | x ⊥ a, for all a ∈ A}.

Then A⊥ is a closed subspace of H. If V is a closed subspace of H, then

H = V + V⊥,

in particular every closed subspace of H is complemented in H. This is the first fundamental fact about Hilbert spaces. Each element x ∈ H has a unique decomposition x = v + v⊥ with v ∈ V and v⊥ ∈ V⊥. We have ‖x‖² = ‖v‖² + ‖v⊥‖² (the Law of Pythagoras). The map x ↦ v is called the perpendicular projection onto the subspace V and is denoted πV. If (φj) is an ON-basis of V, then

\[ \pi_V(x) = \sum_j (x, \phi_j) \phi_j, \quad x \in H. \tag{2.1} \]
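The projection formula (2.1) and the Law of Pythagoras are easy to check in a finite dimensional example. The sketch below works in H = R³ with a two dimensional subspace V; the particular basis, the vector x and the helper names are assumptions made for illustration.

```python
import math


def inner(x, y):
    """Euclidean inner product (x, y)."""
    return sum(a * b for a, b in zip(x, y))


def project(x, basis):
    """Return pi_V(x) = sum_j (x, phi_j) phi_j for an orthonormal list `basis`."""
    result = [0.0] * len(x)
    for phi in basis:
        c = inner(x, phi)
        result = [r + c * p for r, p in zip(result, phi)]
    return result


s = 1.0 / math.sqrt(2.0)
basis = [[s, s, 0.0], [0.0, 0.0, 1.0]]      # ON-basis of a plane V in R^3
x = [3.0, 1.0, 2.0]

v = project(x, basis)                        # component in V
v_perp = [a - b for a, b in zip(x, v)]       # component in the orthogonal complement
```

Here v is approximately (2, 2, 2) and v_perp approximately (1, −1, 0), so that ‖x‖² = 14 = 12 + 2 = ‖v‖² + ‖v⊥‖².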

The second fundamental property of a Hilbert space H is the fact that the continuous linear functionals on H can be identified with the elements of H: if a ∈ H, then

Λa : x ∈ H ↦ (x, a) ∈ R

defines a continuous linear functional on H. The converse is also true: every continuous linear functional on H has this form (Riesz Representation Theorem).

Bilinear forms. Let X and Y be Hilbert spaces. A function ψ = ψ(x, y) : X × Y → R is called a bilinear form if it is linear in both variables x and y. The bilinear form ψ is called continuous if

‖ψ‖ = sup{ |ψ(x, y)| : x ∈ X1, y ∈ Y1 } < ∞. (2.2)

In this case |ψ(x, y)| ≤ ‖ψ‖ ‖x‖ ‖y‖, for all x ∈ X and y ∈ Y. Note that the closed unit balls X1, Y1 can be replaced with the unit spheres S1(X), S1(Y) with no effect on the definition of the norm of ψ.

If A : X → Y is a bounded linear operator, then ψ(x, y) = (Ax, y) defines a continuous bilinear form on X × Y with ‖ψ‖ = ‖A‖. Conversely

Theorem 2.1.1 (Lax-Milgram). Let ψ = ψ(x, y) be a continuous bilinear form on X × Y. Then there exists a bounded linear operator A : X → Y such that ψ(x, y) = (Ax, y)Y, for all x ∈ X and y ∈ Y.

Proof. Fix x ∈ X. Then Λx(y) = ψ(x, y) is a continuous linear functional on Y. By the Riesz Representation Theorem there exists an element a ∈ Y with Λx(y) = (a, y)Y, for all y ∈ Y. Clearly a is uniquely determined by x. Write a = Ax. This defines a map A : X → Y which satisfies ψ(x, y) = (Ax, y). The uniqueness of a and linearity of ψ in the first argument imply that the map A is linear. The continuity of ψ implies that A is continuous.

If X = Y = H, then a bilinear form ψ = ψ(x, y) on X × Y is called a bilinear form on H. Such a bilinear form is called symmetric if it satisfies ψ(x, y) = ψ(y, x), for all x, y ∈ H. In this case

Proposition 2.1.1. Let ψ = ψ(x, y) be a symmetric bilinear form on H. Then

‖ψ‖ = sup{ |ψ(x, x)| : ‖x‖ ≤ 1 }. (2.3)

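Proof. Let C denote the right hand side of (2.3). Obviously C ≤ ‖ψ‖ and we have to show only the reverse inequality. Write φ(x) = ψ(x, x). Then |φ(x)| ≤ C if ‖x‖ ≤ 1 and hence, since φ(tx) = t²φ(x), we have |φ(x)| ≤ C‖x‖² for all x ∈ H. Using the symmetry of ψ we can write

\[ \psi(x, y) = \phi\Big(\frac{x+y}{2}\Big) - \phi\Big(\frac{x-y}{2}\Big). \]

Recall that H1 denotes the closed unit ball in H. If x, y ∈ H1, then the parallelogram law yields

\[ |\psi(x, y)| \le C \Big( \Big\|\frac{x+y}{2}\Big\|^2 + \Big\|\frac{x-y}{2}\Big\|^2 \Big) = \frac{C}{2} \big( \|x\|^2 + \|y\|^2 \big) \le C. \]

Taking the sup over all x, y ∈ H1 now yields ‖ψ‖ ≤ C.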

2.2 Adjoint operator

The Lax-Milgram theorem can be used to show the existence of the adjoint operator. Let X, Y be Hilbert spaces and T : X → Y a bounded linear operator. Then ψ(y, x) = (y, T x)Y is a continuous bilinear form on Y × X. Consequently there exists a bounded linear operator T∗ : Y → X such that ψ(y, x) = (T∗y, x)X, for all y ∈ Y and x ∈ X. It is easy to see that the operator T∗ is uniquely determined by its defining property

(T x, y) = (x, T∗y), x ∈ X, y ∈ Y.

Obviously T∗∗ = T. We note the following

Proposition 2.2.1. We have
(i) N(T∗T) = N(T).
(ii) N(T∗) = R(T)⊥.
(iii) N(T) = R(T∗)⊥.

Proof. (i) If T x = 0, then T∗T x = 0. Conversely, if T∗T x = 0, then ‖T x‖² = (T x, T x) = (T∗T x, x) = 0, thus x ∈ N(T).

(ii) Let w ∈ N(T∗) and y = T x for some x ∈ X. Then (y, w) = (x, T∗w) = 0. Thus w ∈ R(T)⊥. Conversely, if w ∈ R(T)⊥, then (T∗w, x) = (w, T x) = 0, for all x ∈ X. This implies T∗w = 0 (let x = T∗w) and so w ∈ N(T∗). Now (iii) follows from this. Replace T with T∗ and note that T∗∗ = T.

Remark. By taking orthogonal complements in (ii) and (iii) we obtain R(T) ⊆ N(T∗)⊥ and R(T∗) ⊆ N(T)⊥ but we will not have equality in general.

For any subset A ⊆ X, the set A and its closure have the same orthogonal complement. Thus (ii) can be written as N(T∗) = [closure of R(T)]⊥. Note that this implies that T∗ is one to one on the closure of the range of T.

2.3 Selfadjoint and positive operators

A bounded linear operator T on H is called selfadjoint if it satisfies

(T x, y) = (x, T y), (2.4)

for all x, y ∈ H. In this case the nullspace N(T) = {x ∈ H | T x = 0} satisfies

N(T) = R(T)⊥.

The converse R(T) = N(T)⊥ is not true in general simply because the range R(T) will not in general be closed.

The number λ is called an eigenvalue of T if there is a nonzero vector x ∈ H with T x = λx, that is, x ∈ N(T − λI), where I is the identity operator on H. We let

Eλ(T) := N(T − λI) = {x ∈ H | T x = λx}

denote the eigenspace associated with the eigenvalue λ. Obviously this space is defined whether or not λ is an eigenvalue of T. It is an eigenvalue if and only if Eλ(T) ≠ {0}. The nonzero elements of Eλ(T) are called the eigenvectors associated with the eigenvalue λ.

Proposition 2.3.1. Let T be a selfadjoint operator on H. Then λ ≠ µ implies Eλ(T) ⊥ Eµ(T), in other words, eigenvectors with respect to different eigenvalues are perpendicular to each other.

Proof. Assume that T x = λx and T y = µy. Then λ(x, y) = (T x, y) = (x, T y) = µ(x, y). Since λ ≠ µ this implies that (x, y) = 0.

If λ = 0 then the eigenspace Eλ(T) is simply the nullspace N(T) and λ = 0 is an eigenvalue of T if and only if T has a nontrivial nullspace. If T is selfadjoint this eigenspace is perpendicular to the range R(T) and so no eigenvector associated with the eigenvalue zero is in the range of T.

By contrast, if λ ≠ 0, then Eλ(T) ⊆ R(T) since every eigenvector x associated with λ satisfies x = λ⁻¹T x.

A subspace V ⊆ H is called T-invariant if it satisfies T(V) ⊆ V. In this case

Proposition 2.3.2. Let T be a selfadjoint operator on H and V ⊆ H a T-invariant subspace. Then the orthogonal complement V⊥ is also T-invariant.

Proof. Let x ∈ V⊥. Then for all y ∈ V we have (T x, y) = (x, T y) = 0, since T y ∈ V. Thus T x ∈ V⊥.

Assume that V is a closed T-invariant subspace, write H = V + V⊥ and let T1, T2 denote the restrictions of T to V respectively V⊥. Then

T = T1 ∘ πV + T2 ∘ πV⊥,

where πV, πV⊥ are the orthogonal projections onto the subspaces V, V⊥. Thus the restrictions T1, T2 completely determine the operator T.

Every eigenspace Eλ(T) of T and in particular the null space N(T) is T-invariant. Write

H = N(T) + W,

where W = N(T)⊥. Then the restriction of T to W is a linear operator on W and obviously this restriction completely determines the operator T (since the restriction of T to its null space is simply zero).

Thus we will often be able to disregard the eigenvectors associated with the eigenvalue zero, that is, the eigenvectors in the nullspace of T.

Proposition 2.3.3. If the operator T on H is selfadjoint, then

‖T‖ = sup{ |(T x, x)| : ‖x‖ = 1 }. (2.5)

Proof. Clearly it will suffice to show (2.5) with "‖x‖ = 1" replaced with "‖x‖ ≤ 1".

Set ψ(x, y) = (x, T y). Then ψ is a bilinear form with ‖ψ‖ = ‖T‖. Since T is selfadjoint, ψ is symmetric. Now apply (2.3).

Positive operators. A bounded linear operator A on H is called positive if it satisfies

(Ax, x) ≥ 0, for all x ∈ H.

If strict inequality holds for all nonzero x, then A is called strictly positive. For example, if X and Y are Hilbert spaces and T : X → Y a bounded linear operator, then the operator A = T∗T on X is positive:

(Ax, x) = (T∗T x, x) = (T x, T x) = ‖T x‖² ≥ 0.

Proposition 2.3.4. If the operator A on H is positive, then every eigenvalue λ of A satisfies λ ≥ 0.

Proof. Let x be an eigenvector with eigenvalue λ. Then

λ‖x‖² = λ(x, x) = (Ax, x) ≥ 0.

Proposition 2.3.5. If the operator A on H is positive, then the operator αI + A has a bounded inverse on all of H, for each α > 0.

Proof. Let α > 0 and set T = αI + A. Then, for each x ∈ H we have

‖T x‖² = α²‖x‖² + 2α(Ax, x) + ‖Ax‖² ≥ α²‖x‖².

It follows that T is one to one and has closed range. Moreover T is selfadjoint. Thus R(T)⊥ = N(T) = {0}. Thus T has dense range. It follows that R(T) = H and T has an inverse T⁻¹ : H → H as a linear map. The inverse is bounded since ‖T x‖ ≥ α‖x‖ implies that ‖T⁻¹y‖ ≤ α⁻¹‖y‖.
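The bound ‖(αI + A)⁻¹y‖ ≤ α⁻¹‖y‖ can be seen in a small example. The sketch below takes a positive but singular 2 × 2 matrix A, so that A itself has no inverse, and solves (αI + A)x = y by Cramer's rule. The matrix, the value of α and the helper names are assumptions made for illustration.

```python
import math


def solve_shifted(a, alpha, y):
    """Solve (alpha * I + A) x = y for a 2x2 matrix A using Cramer's rule."""
    t11, t12 = a[0][0] + alpha, a[0][1]
    t21, t22 = a[1][0], a[1][1] + alpha
    det = t11 * t22 - t12 * t21
    return [(t22 * y[0] - t12 * y[1]) / det,
            (t11 * y[1] - t21 * y[0]) / det]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


a = [[1.0, 1.0], [1.0, 1.0]]   # positive and singular: (Ax, x) = (x1 + x2)^2
alpha = 0.1

y_null = [1.0, -1.0]           # lies in the null space of A
x_null = solve_shifted(a, alpha, y_null)

y_range = [1.0, 1.0]           # eigenvector of A with eigenvalue 2
x_range = solve_shifted(a, alpha, y_range)
```

On the null space of A the bound is attained, since there αI + A acts as multiplication by α; on the eigenvector with eigenvalue 2 the solution is y/(2 + α) and is much smaller.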

We will also need the following result

Proposition 2.3.6. If the operator A on H is positive, then there exists a unique positive operator S on H such that A = S².

The operator S is called the (positive) square root of A and denoted S = √A. The existence of S is a special case of the so called continuous functional calculus which is a consequence of the representation theory of commutative C∗-algebras. This theory is quite easy and provides the most natural proof. The reader is referred to the literature.

2.4 Compact operators between Banach spaces

Let us recall without proof some facts about compact sets in a complete normed space X. A subset A ⊆ X is called relatively compact if the closure of A is compact. The set A is called totally bounded if for each ε > 0 there are finitely many balls B(xi, ε), xi ∈ X, of radius ε which cover A. With this

Theorem 2.4.1. For a subset A ⊆ X the following are equivalent:
(i) A is relatively compact.
(ii) A is totally bounded.
(iii) Each sequence (an) ⊆ A has a subsequence which converges in X.

The proof is given in every class on metric spaces. The limit of the subsequence in (iii) will be in the closure of A but need not be in A itself.

Let X, Y be complete normed spaces. A linear operator T : X → Y is called compact if the image T(B) ⊆ Y of the unit ball B ⊆ X is relatively compact in Y. T is called a finite rank operator if the range R(T) := T(X) ⊆ Y is finite dimensional. In this case T has the form

\[ T(x) = \sum_{j<n} \Lambda_j(x) \phi_j, \quad x \in X, \tag{2.6} \]

where n = dim(R(T)), φj ∈ Y and the Λj are continuous linear functionals on X. Simply let {φ0, . . . , φn−1} be a basis for R(T) and Λj = ψj ∘ T, where ψj is the coordinate functional associated with the basis vector φj, that is,

\[ y = \sum_{j<n} \psi_j(y) \phi_j, \quad y \in R(T). \]

Now set y = T x. Conversely every operator of this form is of finite rank with R(T) ⊆ span({φj}). Since a bounded set in a finite dimensional space is relatively compact (Bolzano-Weierstrass Theorem) every finite rank operator is compact.

Theorem 2.4.2. Let X, Y be complete normed spaces and T : X → Y a linear operator.
(i) If T is a finite rank operator then T is compact.
(ii) If T is the limit in operator norm of compact operators, then T is compact.

Proof. (i) was shown above. (ii) Assume that Tn : X → Y is compact, for each n ≥ 1, and Tn → T in operator norm. Let B ⊆ X be the unit ball and ε > 0. Choose n such that ‖Tn − T‖ < ε/2. There exist finitely many balls B(yi, ε/2) ⊆ Y which cover Tn(B). Then the corresponding balls B(yi, ε) cover T(B). This shows that T(B) is totally bounded.

Let us introduce the following notation: with B(X, Y) we denote the space of all bounded linear operators T : X → Y. Likewise F(X, Y) and K(X, Y) denote the set of finite rank respectively compact operators in B(X, Y). If X = Y, we write B(X), F(X) and K(X) for B(X, X), F(X, X) and K(X, X).

It is easily verified that F(X, Y) and K(X, Y) are in fact subspaces of B(X, Y). Then from (i) and (ii), writing cl F(X, Y) for the closure of F(X, Y) in operator norm,

cl F(X, Y) ⊆ K(X, Y) ⊆ B(X, Y).

The converse of (ii) is not true in general but it is true if X and Y are Hilbert spaces as we shall see below. In other words, cl F(X, Y) ≠ K(X, Y) in general.

For an operator T ∈ F(X, Y) we set rank(T) = dim(R(T)). If T has the form (2.6), then rank(T) = n if φ0, . . . , φn−1 are linearly independent.

Let T ∈ K(X, Y). Then the image T(D) ⊆ Y of each bounded subset D ⊆ X is relatively compact. Using (2.4.1) we see

Proposition 2.4.1. Let T ∈ B(X, Y). Then T is compact if and only if the sequence (T xn) ⊆ Y has a convergent subsequence for each bounded sequence (xn) in X.

Let A be any set and τ, σ topologies on A with τ ⊆ σ. If τ is a Hausdorff topology and A is compact in the topology σ then τ = σ.

It will suffice to show that each σ-closed set F ⊆ A is τ-closed. Indeed, F is σ-compact and hence τ-compact (every cover with τ-open sets is a cover with σ-open sets). Since τ is Hausdorff it follows that F is τ-closed.

Let X be a normed space and X∗ the space of all continuous linear functionals on X. Recall that the weak topology on X is the weakest topology in which all functionals F ∈ X∗ are continuous. Clearly this topology is weaker than the norm topology on X. It is a Hausdorff topology (the continuous linear functionals on a normed space X separate points on X).

The observation above shows that the weak topology agrees with the norm topology on every norm compact subset of X. Recall that a sequence (xn) ⊆ X satisfies xn → x weakly (in the weak topology) if and only if F(xn) → F(x), for each continuous linear functional F ∈ X∗.

Proposition 2.4.2. Let T ∈ B(X, Y) be compact and (xn) ⊆ X bounded. If xn → x ∈ X weakly, then T xn → T x in norm.

Proof. Since T is bounded the weak convergence xn → x ∈ X implies the weak convergence T xn → T x. Choose a bounded subset B ⊆ X with (xn) ⊆ B and x ∈ B. Then the closure K of T(B) in Y is compact. Consequently the weak topology agrees with the norm topology on K. Since T xn, T x ∈ K and T xn → T x weakly it follows that T xn → T x in norm.

Remark. A weakly convergent sequence (xn) is automatically bounded, that is, the assumption of boundedness above is superfluous but we don't need this result. If (xn) is weakly convergent then it is weakly bounded, i.e.

supn |F(xn)| < ∞,

for each continuous linear functional F ∈ X∗. The Uniform Boundedness Principle then implies that (xn) is bounded in norm.

Exercise. Let X, Y, Z be complete normed spaces and T : X → Y, S : Y → Z bounded linear operators. If one of S, T is compact then so is the product ST.

Hint: regardless of compactness T maps bounded sets to bounded sets and S maps relatively compact sets to relatively compact sets.

We conclude this section with a characterization of compact operators on Hilbert space

Theorem 2.4.3. Let X and Y be Hilbert spaces and T ∈ B(X, Y) a bounded linear operator. Then T is compact if and only if ‖T en‖ → 0, for each orthonormal sequence (en) ⊆ X.

Proof. (⇒) Assume that T is compact and let (en) ⊆ X be an orthonormal sequence. Then

\[ \sum_n |(x, e_n)|^2 \le \|x\|^2 < \infty \]

and so (x, en) → 0, as n ↑ ∞, for each x ∈ X. By the Riesz representation theorem this means F(en) → 0, for each continuous linear functional F ∈ X∗, that is, en → 0 weakly in X. According to 2.4.2 the compactness of T now implies T en → 0 in norm.

(⇐) Recall that N1 denotes the closed unit ball of a normed space N. Assume that T is not compact and hence T(X1) ⊆ Y not totally bounded. Let ε > 0 be such that T(X1) cannot be covered with finitely many balls of radius 2ε. We construct an orthonormal sequence (en) ⊆ X such that ‖T en‖ ≥ ε, for all n ≥ 0.

(A) We claim that for every finite dimensional subspace N ⊆ X there exists e ∈ N⊥ with ‖e‖ = 1 and ‖T e‖ ≥ ε.

If this were not true let N ⊆ X be a finite dimensional subspace such that ‖T e‖ ≤ ε, for all e ∈ V := N⊥ with ‖e‖ ≤ 1, that is, T(V1) ⊆ εY1.

Note that T(N1) ⊆ Y is compact and hence can be covered by finitely many balls B(yj, ε) of radius ε. Since X1 ⊆ N1 + V1 we have T(X1) ⊆ T(N1) + T(V1). It follows that T(X1) is covered by the balls B(yj, 2ε) in contradiction to the choice of ε. This shows (A).

(B) Now we can construct the sequence (en) by induction. Using (A) with N = {0} find e0 with ‖e0‖ = 1 and ‖T e0‖ ≥ ε. Given that orthonormal e0, . . . , en with ‖T ej‖ ≥ ε have already been constructed set N = span({e0, . . . , en}) and choose en+1 ∈ N⊥ with ‖en+1‖ = 1 such that ‖T en+1‖ ≥ ε. Then the sequence {e0, . . . , en+1} is orthonormal and the construction continues.
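Two standard examples illustrate the criterion of Theorem 2.4.3. They are not taken from the text; the space ℓ² with its standard orthonormal basis (e_n) and the operators I and D below are assumptions chosen for illustration.

```latex
% H = \ell^2 with standard orthonormal basis (e_n)_{n \ge 0}.
% (1) The identity I is not compact, since the orthonormal sequence (e_n) satisfies
\[ \|I e_n\| = 1 \not\to 0 . \]
% (2) The diagonal operator D e_n = (n+1)^{-1} e_n is compact. Its truncations
%     D_N x = \sum_{n<N} (n+1)^{-1} (x, e_n) e_n have finite rank and
\[ \|D - D_N\| = \sup_{n \ge N} \frac{1}{n+1} = \frac{1}{N+1} \to 0 , \]
%     so D is compact by Theorem 2.4.2. In agreement with Theorem 2.4.3,
\[ \|D e_n\| = \frac{1}{n+1} \to 0 . \]
```

Note that in (2) the theorem asserts more than the displayed limit: ‖D un‖ → 0 for every orthonormal sequence (un), not only for the standard basis.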

2.5 Compact selfadjoint operators

Let T be a compact, selfadjoint operator on a Hilbert space H. Then T can be diagonalized in the sense that there is an orthonormal basis for H consisting of eigenvectors of T. This result makes it very easy to work with such operators. For the proof we need the following

Lemma 2.5.1. Let T be a compact, selfadjoint operator on H. Then at least one of λ = ‖T‖ or λ = −‖T‖ is an eigenvalue of T.

Proof. We may assume that T ≠ 0. From (2.5) we get a sequence of vectors xn ∈ H with ‖xn‖ = 1 and, after passing to a subsequence, a number λ with |λ| = ‖T‖ such that (T xn, xn) → λ, as n ↑ ∞. Then, for each n ≥ 0 we have

\begin{align}
0 \le \|T x_n - \lambda x_n\|^2 &= \|T x_n\|^2 - 2\lambda (T x_n, x_n) + \lambda^2 \|x_n\|^2 \tag{2.7} \\
&\le \|T\|^2 - 2\lambda (T x_n, x_n) + \lambda^2. \tag{2.8}
\end{align}

As n ↑ ∞, the rightmost quantity converges to 2λ² − 2λ² = 0. Thus we also have T xn − λxn → 0. Set yn = T xn. By compactness of T the sequence yn has a convergent subsequence.

Passing to this subsequence we may assume that the sequence yn is itself convergent. But then the sequence xn = λ⁻¹(yn − (yn − λxn)) converges also. Since T xn − λxn → 0 the limit x = limn xn must satisfy T x = λx. Since ‖xn‖ = 1, for all n, we have ‖x‖ = 1.

With this we can now prove the main result about compact selfadjoint operators:

Theorem 2.5.1. Let T be a compact, selfadjoint operator on H. Then there exists an orthonormal basis for H consisting of eigenvectors of T. More precisely N(T)⊥ has a countable orthonormal basis (φj) consisting of eigenvectors of T and if λj are the associated eigenvalues, then

\[ T x = \sum_j \lambda_j (x, \phi_j) \phi_j, \quad x \in H, \]

where the series converges in the norm of H. If the sequence (φj) is infinite, then λj → 0, as j ↑ ∞.

Proof. By induction we construct a (possibly finite) sequence of numbers λj ≠ 0 and orthonormal vectors φj such that
(i) T φj = λjφj,
(ii) the restriction Tj of T to {φ0, . . . , φj−1}⊥ satisfies ‖Tj‖ = |λj|, and
(iii) T = 0 on {φ0, φ1, . . .}⊥.

Since the λj are nonzero, each φj is in N(T)⊥ and from (iii) it follows that the φj span all of N(T)⊥ (recall that (A⊥)⊥ is the closed linear span of A).

The quantities λ0 and φ0 exist by lemma (2.5.1). Assume that λ0, . . . , λj and φ0, . . . , φj have already been constructed. Set

Xj = {φ0, . . . , φj}⊥.

If T = 0 on Xj, then we are finished. Otherwise note that Xj is a closed T-invariant subspace (since span({φ0, . . . , φj}) is T-invariant). The restriction Tj of T to Xj is a compact selfadjoint operator on Xj. Applying lemma (2.5.1) to Tj we see that there is a unit vector φj+1 ∈ Xj and a number λj+1 such that
(a) |λj+1| = ‖Tj‖ and
(b) T φj+1 = Tjφj+1 = λj+1φj+1.

Obviously φj+1 ⊥ φ0, . . . , φj and so the resulting sequence (φj) is orthonormal. If Tj = 0 at any time, then (iii) is already satisfied and we are finished.

Assume now that Tj ≠ 0, for all j ≥ 0, set X = {φ0, φ1, . . .}⊥ and let S be the restriction of T to X. We must show that S = 0.

From (ii) it follows that |λ0| ≥ |λ1| ≥ · · · ≥ |λj| ≥ ‖S‖, for all j ≥ 0, and so it will suffice to show that λj → 0 as j ↑ ∞.

If λj does not converge to 0, we have |λj| ≥ ρ for some number ρ > 0 and all j. Then the sequence (φj/λj) ⊆ H is bounded and by compactness of T the sequence

yj = T(φj/λj) = φj

has a convergent subsequence. However this contradicts the fact that the sequence φj is orthonormal and hence ‖φj − φk‖ = √2, for all j ≠ k. Consequently we must have λj → 0.

Remark 2.5.1 (Spectrum). We claim that the sequence (λj) contains all the nonzero eigenvalues of T. If λ ≠ λj, 0 were another eigenvalue, the associated eigenspace would be contained in N(T)⊥ and perpendicular to all the φj which contradicts the fact that the φj span N(T)⊥. It follows that the λj contain all the nonzero eigenvalues of T. Note also that the convergence λj → 0 implies that the eigenspaces corresponding to nonzero eigenvalues are all finite dimensional.

The sequence (λj) contains all nonzero eigenvalues of T but what about the spectrum of T, that is the set

σ(T) = {λ ∈ R : T − λI is not invertible}?

Let us assume that H is not finite dimensional. Then the unit ball H1 is not compact. It follows that T is not invertible, that is, 0 ∈ σ(T) (regardless of whether 0 is an eigenvalue or not). However, if λ ≠ λj, 0, for all j ≥ 0, then it can be shown that the operator T − λI is invertible on H. To compute (T − λI)⁻¹ we must solve

(T − λI)x = y (2.9)

for x in terms of y. Write V = N(T) and x = πV(x) + πV⊥(x) as well as y = πV(y) + πV⊥(y). With this (2.9) becomes

−λπV(x) + (T − λI)πV⊥(x) = πV(y) + πV⊥(y)

and since V⊥ is T-invariant and hence (T − λI)-invariant, this is equivalent with

−λπV(x) = πV(y) and (T − λI)πV⊥(x) = πV⊥(y). (2.10)

Since the φj are an ON-basis for V⊥ we have πV⊥(y) = Σj (y, φj)φj and πV⊥(x) = Σj αjφj with αj to be determined. Note that (T − λI)φj = (λj − λ)φj. With this the second equation in (2.10) becomes

\[ \sum_j \alpha_j (\lambda_j - \lambda) \phi_j = \sum_j (y, \phi_j) \phi_j \]

which solves for αj = (y, φj)/(λj − λ) resulting in

\[ x = \pi_V(x) + \pi_{V^\perp}(x) = -\frac{1}{\lambda} \pi_V(y) + \sum_j \frac{(y, \phi_j)}{\lambda_j - \lambda} \phi_j. \]

The solution x exists for each y and is a continuous linear function of y, in other words

\[ (T - \lambda I)^{-1} y = -\frac{1}{\lambda} \pi_V(y) + \sum_j \frac{(y, \phi_j)}{\lambda_j - \lambda} \phi_j \]

exists as a continuous linear operator on H. Consequently the point λ is not in the spectrum of T and we have shown that

σ(T) = {λj} ∪ {0}.
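The formula for (T − λI)⁻¹ can be checked on a small matrix with a nontrivial null space, so that both the term −λ⁻¹πV(y) and the eigenvector sum contribute. The following is a sketch under assumptions of our own: the 2 × 2 matrix T, the point λ = 1, the vector y and the function names are chosen purely for illustration.

```python
import math


def inner(x, y):
    return sum(a * b for a, b in zip(x, y))


def resolvent_apply(eigenpairs, lam, y):
    """Apply (T - lam I)^{-1} to y using the eigenvector expansion.

    `eigenpairs` lists (lambda_j, phi_j) for the nonzero eigenvalues of T,
    with orthonormal eigenvectors phi_j spanning the complement of N(T).
    """
    range_part = [0.0] * len(y)          # projection of y onto the span of the phi_j
    x = [0.0] * len(y)
    for lam_j, phi in eigenpairs:
        c = inner(y, phi)
        range_part = [r + c * p for r, p in zip(range_part, phi)]
        x = [xi + c / (lam_j - lam) * p for xi, p in zip(x, phi)]
    null_part = [a - b for a, b in zip(y, range_part)]   # projection of y onto N(T)
    return [xi - nv / lam for xi, nv in zip(x, null_part)]


s = 1.0 / math.sqrt(2.0)
t = [[1.0, 1.0], [1.0, 1.0]]             # selfadjoint with eigenvalues 2 and 0
eigenpairs = [(2.0, [s, s])]             # the single nonzero eigenvalue
lam = 1.0                                # neither 0 nor an eigenvalue
y = [3.0, 1.0]

x = resolvent_apply(eigenpairs, lam, y)
residual = [inner(row, x) - lam * xi - yi for row, xi, yi in zip(t, x, y)]
```

Here πV(y) = (1, −1) and πV⊥(y) = (2, 2), so the formula gives x = −(1, −1) + (2, 2) = (1, 3), and indeed (T − I)x = (3, 1) = y.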

Remark 2.5.2 (Range). The series expansion of Theorem 2.5.1 also allows us to determine the range R(T) quite easily. Let y ∈ H and consider the equation

T x = y. (2.11)

If this equation has a solution x, then y ∈ N(T)⊥. Assume now that y ∈ N(T)⊥. In looking for a solution x with T x = y we can restrict ourselves to x ∈ N(T)⊥. Such x will then have an expansion

\[ x = \sum_j \alpha_j \phi_j \tag{2.12} \]

with αj to be determined. In terms of these series expansions (2.11) becomes

\[ \sum_j \alpha_j \lambda_j \phi_j = T x = y = \sum_j (y, \phi_j) \phi_j \]

which implies that we must have αj = λj⁻¹(y, φj). However for these αj the series (2.12) converges exactly if Σj λj⁻²|(y, φj)|² < ∞. It follows that

\[ R(T) = \Big\{ y \in N(T)^\perp : \sum_j \lambda_j^{-2} |(y, \phi_j)|^2 < \infty \Big\}. \]
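A worked example shows how restrictive the summability condition is. The eigenvalues λj = 1/(j+1) and the vectors y, y′ below are assumptions chosen for illustration; they are not taken from the text.

```latex
% Assume T has eigenvalues \lambda_j = 1/(j+1), j \ge 0, with ON eigenvectors \phi_j.
% A vector in N(T)^\perp which is not in the range of T:
\[ y = \sum_{j \ge 0} \frac{1}{j+1}\, \phi_j \in H, \qquad
   \sum_{j \ge 0} \lambda_j^{-2} |(y, \phi_j)|^2 = \sum_{j \ge 0} 1 = \infty
   \;\Longrightarrow\; y \notin R(T). \]
% A vector with faster decaying coefficients which is in the range of T:
\[ y' = \sum_{j \ge 0} \frac{1}{(j+1)^2}\, \phi_j, \qquad
   \sum_{j \ge 0} \lambda_j^{-2} |(y', \phi_j)|^2 = \sum_{j \ge 0} \frac{1}{(j+1)^2} < \infty
   \;\Longrightarrow\; y' = T x', \quad x' = \sum_{j \ge 0} \frac{1}{j+1}\, \phi_j . \]
```

In words, y belongs to the range only if its coefficients (y, φj) decay faster than the eigenvalues λj by a square summable margin.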

2.6 Compact operators between Hilbert spaces

The case of a general compact operator T : X → Y between Hilbert spaces X and Y can be reduced to the selfadjoint case by observing that the product T∗T is a compact, selfadjoint operator on X. The results of the last section then carry over with minimal changes.

Let X and Y be Hilbert spaces, T ∈ B(X, Y). A singular system for T is a sequence (µj, φj, ξj)j where
(i) µ0 ≥ µ1 ≥ · · · ≥ µn ≥ · · · > 0,
(ii) {φj} is an ON-basis for N(T)⊥,
(iii) {ξj} is an ON-basis for N(T∗)⊥, and
(iv) T φj = µjξj and T∗ξj = µjφj, for all j ≥ 0.

Assume that (µj, φj, ξj)j is such a system, set V =N(T) and let x ∈X.

Then the orthogonal projection πV(x) ofx on V has an expansion

πV(x) =

X

j(x, φj)φj

and applying T to this expansion it follows that

T x=T πV(x) =

X

jµj(x, φj)ξj (2.13)

with convergence pointwise on $X$. For $\phi\in X$ and $\xi\in Y$ define the rank one operator $S=\phi\otimes\xi$ as

\[ Sx = (\phi\otimes\xi)(x) = (x,\phi)\,\xi, \qquad x\in X. \]

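Then the above expansion for $T$ can be rewritten as

\[ T = \sum_j \mu_j(\phi_j\otimes\xi_j) \tag{2.14} \]

where the series converges pointwise on $X$. Set

\[ T_n = \sum_{j<n} \mu_j(\phi_j\otimes\xi_j) \tag{2.15} \]

and let $x\in X$. Using (i) and the orthonormality of the $\xi_n$ we have

\[ \|(T-T_n)x\|^2 = \Big\|\sum_{j\ge n}\mu_j(x,\phi_j)\xi_j\Big\|^2 = \sum_{j\ge n}\mu_j^2\,|(x,\phi_j)|^2 \le \mu_n^2\sum_{j\ge n}|(x,\phi_j)|^2 \le \mu_n^2\,\|x\|^2. \]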

This shows that

\[ \|T-T_n\| \le \mu_n \tag{2.16} \]

in operator norm. Letting $x=\phi_n$ above we see that we actually have equality. Consequently, if $\mu_n\to 0$, then the series (2.14) converges in operator norm and hence $T$ is compact.
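The equality $\|T-T_n\|=\mu_n$ is easy to observe for matrices. The sketch below assumes a real $6\times 4$ matrix in place of $T$ and uses NumPy's `np.linalg.svd`, whose right singular vectors (rows of `Vt`) correspond to the $\phi_j$ and whose left singular vectors (columns of `U`) correspond to the $\xi_j$. The helper `truncate` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((6, 4))               # a real operator from R^4 to R^6

# NumPy returns T = U @ diag(mu) @ Vt with mu[0] >= mu[1] >= ... >= 0.
# In the notation of the text: phi_j = Vt[j] (rows), xi_j = U[:, j] (columns).
U, mu, Vt = np.linalg.svd(T, full_matrices=False)

def truncate(n):
    """Return T_n, the map x -> sum_{j<n} mu_j (x, phi_j) xi_j, as a matrix."""
    return (U[:, :n] * mu[:n]) @ Vt[:n]

# Operator-norm error of each truncation, to be compared with mu_n.
errors = np.array([np.linalg.norm(T - truncate(n), 2) for n in range(len(mu))])
gap = np.max(np.abs(errors - mu))
print(gap)
```

Here `np.linalg.norm(M, 2)` is the operator norm of a matrix `M`, so `errors[n]` is $\|T-T_n\|$ and should agree with `mu[n]` up to rounding.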

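Not every operator $T\in B(X,Y)$ has a singular system. However, if $X=Y$ and $T\in B(X)$ is compact and selfadjoint, let $\{\phi_j\}$ be the eigenvectors associated with the nonzero eigenvalues $\lambda_j$ of $T$ arranged in decreasing order. Then, provided the $\lambda_j$ are positive, $(\mu_j,\phi_j,\xi_j)_j$ with $\mu_j=\lambda_j$ and $\xi_j=\phi_j$ is a singular system for $T$ (for a negative eigenvalue one takes $\mu_j=|\lambda_j|$ and $\xi_j=-\phi_j$, with the ordering by decreasing $|\lambda_j|$). This is exactly the content of Theorem 2.5.1. Now we generalize this fact to all compact operators $T\in K(X,Y)$: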

Theorem 2.6.1. Let $T:X\to Y$ be a compact operator, set $A=T^*T$, note that $A$ is compact and selfadjoint on $X$ and let $\{\phi_j\}$ be the eigenvectors associated with the nonzero eigenvalues $\lambda_j$ of $A$ arranged in decreasing order. Then

\[ \mu_j = \sqrt{\lambda_j} \quad\text{and}\quad \xi_j = \mu_j^{-1}T\phi_j \]

defines a singular system $(\mu_j,\phi_j,\xi_j)_j$ for $T$. We have $\mu_n\to 0$ and hence the series (2.14) converges in operator norm. In particular $T$ is the limit of finite rank operators.

Proof. Note first that $N(A)=N(T)$ according to (2.2.1). Thus the $\phi_j$ are an ON-basis for $N(T)^\perp$. By definition of $(\mu_j,\phi_j,\xi_j)$ we have $T\phi_j=\mu_j\xi_j$ and $T^*T\phi_j=\mu_j^2\phi_j$ and this implies that $T^*\xi_j=\mu_j\phi_j$. We claim that $\{\xi_j\}$ is an ON-basis for $N(T^*)^\perp$. Indeed, for $j,k\ge 0$ we have

\[ (\xi_j,\xi_k) = (\mu_j^{-1}T\phi_j,\ \mu_k^{-1}T\phi_k) = (\mu_j\mu_k)^{-1}(T^*T\phi_j,\phi_k) = \delta_{jk}. \tag{2.17} \]


Thus $\{\xi_j\}\subseteq R(T)\subseteq N(T^*)^\perp := W$ is an orthonormal system. We claim that this system spans all of $W$. Let $w\in W$ and assume that $w\perp\xi_j$, for all $j\ge 0$. Then $T^*w\in R(T^*)\subseteq N(T)^\perp$ and

\[ (T^*w,\phi_j) = (w,T\phi_j) = \mu_j(w,\xi_j) = 0, \]

for all $j\ge 0$. Since the $\{\phi_j\}$ are an ON-basis for $N(T)^\perp$ it follows that $T^*w=0$, that is $w\in N(T^*)=W^\perp$. Thus $w\in W\cap W^\perp=\{0\}$. This shows that the orthonormal system $\{\xi_j\}$ in $N(T^*)^\perp$ is complete.
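The construction in Theorem 2.6.1 can be carried out literally for a matrix. The following sketch assumes a real rank 3 matrix $T:\mathbb{R}^4\to\mathbb{R}^6$ (so that $N(T)$ is nontrivial), uses `np.linalg.eigh` on $A=T^*T$, and discards eigenvalues below a relative tolerance of $10^{-10}$; the tolerance and all variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# A rank-3 operator T: R^4 -> R^6, so that N(T) is nontrivial.
T = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))

A = T.T @ T                                   # A = T*T, symmetric and positive semidefinite
lam, vecs = np.linalg.eigh(A)                 # eigenvalues in ascending order
lam, vecs = lam[::-1], vecs[:, ::-1]          # rearrange in decreasing order

keep = lam > 1e-10 * lam[0]                   # discard the (numerically) zero eigenvalues
lam, phi = lam[keep], vecs[:, keep]           # columns of phi: ON-basis of N(T)^perp

mu = np.sqrt(lam)                             # mu_j = sqrt(lambda_j)
xi = (T @ phi) / mu                           # xi_j = T phi_j / mu_j, stored as columns

# Orthonormality (2.17) and the relation T* xi_j = mu_j phi_j.
print(np.allclose(xi.T @ xi, np.eye(len(mu))), np.allclose(T.T @ xi, phi * mu))
```

The $\mu_j$ obtained this way should agree with the nonzero singular values returned by `np.linalg.svd`, and $\sum_j \mu_j(\phi_j\otimes\xi_j)$, here `(xi * mu) @ phi.T`, should reproduce $T$.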

Remark. If $T:X\to Y$ is any bounded linear operator and $(\phi_j)$ an ON-basis for $V=N(T)^\perp$, then the expansion (2.1) is valid and applying $T$ to this expansion yields

\[ Tx = \sum_j (x,\phi_j)\,T\phi_j. \]

What makes the expansion (2.13) interesting is the additional information contained in the singular system for $T$.

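Remark 2.6.1 (Adjoint). Recall that $T^{**}=T$. If $(\mu_j,\phi_j,\xi_j)_j$ is a singular system for $T$ then $(\mu_j,\xi_j,\phi_j)_j$ is a singular system for $T^*$ and so we have the expansion

\[ T^* = \sum_j \mu_j(\xi_j\otimes\phi_j), \quad\text{that is,}\quad T^*y = \sum_j \mu_j(y,\xi_j)\phi_j. \]

Thus if $T$ is compact then so is the adjoint $T^*$.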

Remark 2.6.2 (Range). The expansion (2.13) allows us to work with ON-bases just as in the case of a compact selfadjoint operator. As an example we determine the range $R(T)$, that is, we study the equation

\[ Tx = y. \tag{2.18} \]

Fix $y\in Y$. If a solution exists, then $y\in R(T)\subseteq N(T^*)^\perp$. Now assume that $y\in N(T^*)^\perp$. Then we have an expansion $y=\sum_j (y,\xi_j)\xi_j$. If there exists any solution $x$ of (2.18) in $X$, then there exists a solution in $V=N(T)^\perp$ (in fact $\pi_V(x)$ is one). Thus we may assume that $x\in V$ and have an expansion

\[ x = \sum_j \alpha_j\phi_j. \tag{2.19} \]

Applying $T$ to this yields

\[ \sum_j \alpha_j\mu_j\xi_j = Tx = y = \sum_j (y,\xi_j)\xi_j. \]

It follows that we must have $\alpha_j=\mu_j^{-1}(y,\xi_j)$. With this the series for $x$ converges exactly if $\sum_j \mu_j^{-2}|(y,\xi_j)|^2<\infty$. Consequently

\[ R(T) = \Big\{\, y\in N(T^*)^\perp : \sum_j \mu_j^{-2}\,|(y,\xi_j)|^2 < \infty \,\Big\} \tag{2.20} \]

exactly as in the selfadjoint case.
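In finite dimensions the summability condition in (2.20) is automatic, so the range is simply $N(T^*)^\perp$, but the recipe $\alpha_j=\mu_j^{-1}(y,\xi_j)$ can still be tried out. The sketch below assumes a real rank 3 matrix, takes the singular system from `np.linalg.svd`, and uses illustrative names (`x0`, `z`, `residual_off_range`) that do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))   # rank 3, N(T) nontrivial

U, s, Vt = np.linalg.svd(T)
r = int(np.sum(s > 1e-10 * s[0]))             # number of nonzero singular values
mu, xi, phi = s[:r], U[:, :r], Vt[:r].T       # singular system, vectors stored as columns

x0 = rng.standard_normal(4)
y = T @ x0                                    # y lies in R(T) by construction

alpha = (xi.T @ y) / mu                       # alpha_j = (y, xi_j) / mu_j
x = phi @ alpha                               # x = sum_j alpha_j phi_j, a solution in N(T)^perp

# A right-hand side with a component z in N(T*) cannot be reached:
z = U[:, r]                                   # unit vector orthogonal to all xi_j
x_z = phi @ ((xi.T @ (y + z)) / mu)           # same recipe applied to y + z
residual_off_range = np.linalg.norm(T @ x_z - (y + z))
print(np.allclose(T @ x, y), residual_off_range)
```

The computed `x` is the projection $\pi_V(x_0)$ of the original `x0` onto $N(T)^\perp$, in line with the remark that $\pi_V(x)$ is a solution whenever $x$ is. For `y + z` the residual equals $\|z\|=1$, reflecting that `y + z` is not in $N(T^*)^\perp$.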

2.7 Hilbert-Schmidt and trace class operators

Let $X$, $Y$ be Hilbert spaces, $T\in K(X,Y)$ compact and $(\mu_j,\phi_j,\xi_j)_j$ a singular system for $T$. We know from Theorem 2.6.1 that $T$ is the limit in operator norm of finite rank operators. Now we quantify the speed of convergence.

Approximation numbers. Set

\[ T_n = \sum_{j<n} \mu_j(\phi_j\otimes\xi_j). \]

We have seen that then

\[ \|T-T_n\| \le \mu_n. \tag{2.21} \]

On the other hand we show now that

\[ \|T-S\| \ge \mu_n, \tag{2.22} \]

for each finite rank operator $S\in F(X,Y)$ with $\operatorname{rank}(S)\le n$. Set $X_n=\operatorname{span}(\{\phi_0,\dots,\phi_n\})$ and note that

\[ \|Tx\| \ge \mu_n\|x\|, \quad\text{for all } x\in X_n. \tag{2.23} \]

Let $x\in X_n$. Then $x=\sum_{j\le n}(x,\phi_j)\phi_j$ and so $Tx=\sum_{j\le n}\mu_j(x,\phi_j)\xi_j$. It follows that

\[ \|Tx\|^2 = \sum_{j\le n}\mu_j^2\,|(x,\phi_j)|^2 \ge \mu_n^2\sum_{j\le n}|(x,\phi_j)|^2 = \mu_n^2\,\|x\|^2. \]

Now let $S\in F(X,Y)$ with $\dim(R(S))\le n$. Since $\dim(X_n)=n+1$, $S$ is not one to one on $X_n$ and so there exists a unit vector $u\in X_n$ with $Su=0$. Using (2.23) we have

\[ \|(T-S)u\| = \|Tu\| \ge \mu_n. \]

Thus $\|T-S\|\ge\mu_n$. The quantities

\[ a_n(T) = \inf\{\, \|T-S\| : S\in F(X,Y),\ \operatorname{rank}(S)\le n \,\} \tag{2.24} \]

are called the approximation numbers of $T$. Here $a_0(T)=\|T\|$. The estimates (2.21) and (2.22) show that

\[ \mu_n = a_n(T) \tag{2.25} \]

and that the operator $S=T_n$ provides the best approximation of $T$ in the operator norm among all operators of rank at most $n$. In particular this shows that the numbers $\mu_n$ in a singular system for $T$ are uniquely determined by $T$ and do not depend on the singular system.
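The argument for the lower bound (2.22) can be replayed numerically. The sketch below assumes real matrices, picks an arbitrary competitor $S$ of rank at most $n=2$, and then constructs the unit vector $u\in X_n$ with $Su=0$ used in the proof. The names `basis`, `Wt` and `dist` are hypothetical and only serve this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((6, 5))
U, mu, Vt = np.linalg.svd(T, full_matrices=False)

n = 2
S = rng.standard_normal((6, n)) @ rng.standard_normal((n, 5))   # an operator of rank <= n

# X_n = span(phi_0, ..., phi_n) has dimension n + 1 > rank(S),
# so S restricted to X_n has a nontrivial null space.
basis = Vt[: n + 1].T                         # columns phi_0, ..., phi_n
_, _, Wt = np.linalg.svd(S @ basis)           # full SVD of the restriction, Wt is (n+1) x (n+1)
u = basis @ Wt[-1]                            # unit vector in X_n with S u = 0 (numerically)

Su = np.linalg.norm(S @ u)
Tu = np.linalg.norm(T @ u)                    # equals ||(T - S) u|| up to rounding
dist = np.linalg.norm(T - S, 2)               # operator norm ||T - S||
print(Su, Tu, dist, mu[n])
```

One expects `Su` to vanish up to rounding and the chain `dist >= Tu >= mu[n]` to hold, which is exactly the estimate $\|T-S\|\ge\|(T-S)u\|=\|Tu\|\ge\mu_n$.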

The $\mu_n$ are called the singular values of $T$. Obviously the vectors $\phi_n$ and $\xi_n$ in a singular system for $T$ are not uniquely determined. Consider the selfadjoint case and note that there are many ways to extract an orthonormal basis from each eigenspace of $T$.

The approximation numbers $a_n(T)$ are defined for each bounded linear operator $T\in B(X,Y)$. $T$ is compact if and only if $a_n(T)\to 0$, as $n\uparrow\infty$, and this is the only case of interest. In this case we have $a_n(T)=\mu_n$, where the $\mu_j$ are the singular values of $T$ (square roots of the eigenvalues of $T^*T$).

For each bounded linear operator $T\in B(X,Y)$ let

\[ \|T\|_p = \Big(\sum_n a_n(T)^p\Big)^{1/p} \]

and let

\[ S_p(X,Y) = \{\, T\in B(X,Y) : \|T\|_p<\infty \,\}. \]

Clearly each $T\in S_p(X,Y)$ is compact. One can show that $S_p(X,Y)\subseteq B(X,Y)$ is a linear subspace which is complete under the norm $\|\cdot\|_p$ for $p\ge 1$, but we won't need this result. We are only interested in the cases $p=1,2$.

We now assume that $T\in K(X,Y)$ is compact and $(\mu_j,\phi_j,\xi_j)_j$ a singular system for $T$.

Hilbert-Schmidt operators. The operator $T$ is called a Hilbert-Schmidt operator, if $T\in S_2(X,Y)$, that is,

\[ \|T\|_2^2 := \sum_n a_n(T)^2 = \sum_n \mu_n^2 < \infty. \]

Proposition 2.7.1. If $T\in K(X,Y)$ is compact and $\{e_\alpha\}$ is any ON-basis for $X$, then

\[ \|T\|_2^2 = \sum_\alpha \|Te_\alpha\|^2. \]

Remark. It follows that $T$ is a Hilbert-Schmidt operator if and only if $\sum_\alpha \|Te_\alpha\|^2<\infty$, for some ON-basis $\{e_\alpha\}$ of $X$, and in this case the sum is the same for every ON-basis, namely $\|T\|_2^2$.

We do not assume that $X$ is separable, that is, that the basis $\{e_\alpha\}$ is countable. However, since $T$ and hence $T^*$ are compact the entire action is essentially separable: both $N(T)^\perp$ and $\overline{R(T)}=N(T^*)^\perp$ have countable ON-bases.

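Proof. Let $\{e_\alpha\}$ be any ON-basis for $X$. Since $\{\xi_k\}$ is an ON-basis for $N(T^*)^\perp=\overline{R(T)}$, we have

\[ \|Te_\alpha\|^2 = \sum_k |(Te_\alpha,\xi_k)|^2 = \sum_k |(e_\alpha,T^*\xi_k)|^2 = \sum_k \mu_k^2\,|(e_\alpha,\phi_k)|^2, \]

for each $\alpha$. It follows that

\[ \sum_\alpha \|Te_\alpha\|^2 = \sum_\alpha\sum_k \mu_k^2\,|(e_\alpha,\phi_k)|^2 = \sum_k \mu_k^2 \sum_\alpha |(e_\alpha,\phi_k)|^2 = \sum_k \mu_k^2\,\|\phi_k\|^2 = \sum_k \mu_k^2 = \|T\|_2^2. \]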
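The basis independence asserted in Proposition 2.7.1 can be observed for matrices, where $\|T\|_2$ is the Frobenius norm. The following sketch assumes a real $5\times 4$ matrix and compares the standard basis of $\mathbb{R}^4$ with a random orthonormal basis obtained from a QR factorization. The helper `sum_of_squares` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(4)
T = rng.standard_normal((5, 4))               # operator from X = R^4 to Y = R^5
mu = np.linalg.svd(T, compute_uv=False)       # singular values mu_n

# Two ON-bases of X: the standard basis and a random orthonormal one (columns of Q).
E = np.eye(4)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

def sum_of_squares(basis):
    """Return sum_alpha ||T e_alpha||^2 for the ON-basis given by the columns of `basis`."""
    return float(np.sum(np.linalg.norm(T @ basis, axis=0) ** 2))

hs_from_singular_values = float(np.sum(mu ** 2))
print(sum_of_squares(E), sum_of_squares(Q), hs_from_singular_values)
```

All three printed numbers should coincide up to rounding, matching $\sum_\alpha\|Te_\alpha\|^2=\sum_k\mu_k^2$.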

Hilbert-Schmidt operators on the space $X=L^2(\nu)$ of square integrable functions with respect to a finite measure $\nu$ will be characterized in terms of integration kernels below.

Trace class operators. We now assume that $X$ and $Y$ have the same orthogonal dimension, that is, ON-bases $\{e_\alpha\}$ of $X$ and $\{f_\alpha\}$ of $Y$ can be indexed with the same indices $\alpha$. Because of the compactness of $T$ we can even assume both spaces to be separable. The operator $T$ is called a trace class operator, if $T\in S_1(X,Y)$, that is,

\[ \|T\|_1 = \sum_n a_n(T) < \infty. \]

Recall that $(\mu_j,\phi_j,\xi_j)_j$ denotes a singular system for $T$. It follows that $\|T\|_1=\sum_n \mu_n$.

Proposition 2.7.2. Let $T\in K(X,Y)$. Then

\[ \|T\|_1 = \max \sum_\alpha |(Te_\alpha,f_\alpha)|, \tag{2.26} \]

where the maximum is taken over all ON-bases $\{e_\alpha\}$ of $X$ and $\{f_\alpha\}$ of $Y$.

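Proof. Let $\{e_\alpha\}$ and $\{f_\alpha\}$ be ON-bases of $X$ and $Y$ and write $Te_\alpha=\sum_j \mu_j(e_\alpha,\phi_j)\xi_j$. It follows that

\[
\begin{aligned}
\sum_\alpha |(Te_\alpha,f_\alpha)|
&\le \sum_\alpha\sum_j \mu_j\,|(e_\alpha,\phi_j)|\,|(\xi_j,f_\alpha)|
 = \sum_j \mu_j \sum_\alpha |(e_\alpha,\phi_j)|\,|(\xi_j,f_\alpha)| \\
&\le \sum_j \mu_j \Big(\sum_\alpha |(e_\alpha,\phi_j)|^2\Big)^{1/2}\Big(\sum_\alpha |(\xi_j,f_\alpha)|^2\Big)^{1/2}
 \le \sum_j \mu_j\,\|\phi_j\|\,\|\xi_j\| = \sum_k \mu_k = \|T\|_1.
\end{aligned}
\]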
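The inequality just derived, and the fact that singular bases turn it into an equality, can be checked on a small example. The sketch below assumes a real invertible $4\times 4$ matrix (so that the $\phi_j$ and $\xi_j$ already form full ON-bases of $X=Y=\mathbb{R}^4$), draws random orthonormal bases via QR factorizations, and uses the hypothetical helper `diagonal_sum`.

```python
import numpy as np

rng = np.random.default_rng(5)
k = 4
T = rng.standard_normal((k, k))               # X = Y = R^k
U, mu, Vt = np.linalg.svd(T)
trace_norm = float(np.sum(mu))                # ||T||_1 = sum_n mu_n

def diagonal_sum(E, F):
    """Return sum_alpha |(T e_alpha, f_alpha)| for ON-bases given by the columns of E and F."""
    return float(np.sum(np.abs(np.sum((T @ E) * F, axis=0))))

# Random pairs of ON-bases should never exceed the trace norm ...
values = []
for _ in range(100):
    E, _ = np.linalg.qr(rng.standard_normal((k, k)))
    F, _ = np.linalg.qr(rng.standard_normal((k, k)))
    values.append(diagonal_sum(E, F))

# ... while the singular bases e_j = phi_j, f_j = xi_j attain it.
attained = diagonal_sum(Vt.T, U)
print(max(values), attained, trace_norm)
```

With $e_j=\phi_j$ and $f_j=\xi_j$ each term is $(T\phi_j,\xi_j)=\mu_j$, so `attained` should equal `trace_norm`, while every random pair of bases stays below it.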
