7.2.4 The Krylov subspace method

A more recent alternative to the implementation of the above BFGP method ma [...] $\|\cdot\|_*$ used in the previous subsection and in Section 5.2. It uses essentially the same local Lagrange functions as the original BFGP method, but the salient idea of the implementation is closely related to conjugate gradient and optimisation methods (it is the same as conjugate gradients except for the stopping criterion and the time at which the elements of $K$ in the interpolant are updated). The method offers practical gains in efficiency. It also enjoys guaranteed convergence, in contrast with the algorithm according to (7.5), which converges faster than the one shown above but may itself sometimes fail to converge.

We continue to let $U$ be the space of approximants spanned by the translates of the radial basis functions $\phi(\|\cdot - \xi\|)$, $\xi \in \Xi$, plus the aforementioned polynomial kernel $K$ of the semi-inner product. Additionally, $U_j$ denotes the space spanned by

$$\eta(s^*),\ \eta(\eta(s^*)),\ \ldots,\ \eta^j(s^*), \tag{7.9}$$

where $s^* \in U$ is still the required interpolant and $\eta\colon U \to U$ is a prescribed operator whose choice is fundamental to the definition and the functioning of the method. We shall define it below. (It is closely related to the $\eta$ of the subsection before last.) The main objective of any Krylov subspace method is to compute in the $j$th iteration $s_{j+1} \in U_j$ such that $s_{j+1}$ minimises $\|s - s^*\|_*$ among all $s \in U_j$, where we always begin with $s_1 = 0$. Here, $\|\cdot\|_*$ is a norm or semi-norm which corresponds to the posed problem, and it is our semi-norm from above in the radial basis function context.

We have the following three assumptions on the operator $\eta$, which must only depend on function evaluations on $\Xi$:

(a) $s \in K \implies \eta(s) = s$,

(b) $s \in U \setminus K \implies (s, \eta(s))_* > 0$ and

(c) $s, t \in U \implies (\eta(s), t)_* = (s, \eta(t))_*$.

Subject to these conditions, the above strategy leads to the sought interpolant $s^*$ in finitely many steps when we use exact arithmetic, and in particular the familiar sequence

$$\|s^* - s_j\|_*, \qquad j = 1, 2, 3, \ldots, \tag{7.10}$$

$s_j$ denoting the approximation after $j - 1$ sweeps, decreases strictly monotonically as $j$ increases until we reach $s^*$. We note immediately that conditions (a) and (b) imply the important nonsingularity statement that $\eta(s) = 0$ for some $s \in U$ only if $s$ vanishes. Indeed, if $s$ is an element from $K$, our claim is trivially true. Otherwise, if $s$ is not in the kernel $K$, then $s \neq 0$, and if $\eta(s)$ vanishes this contradicts (b).

Further, this nonsingularity statement is essential to the dimension of the space generated by (7.9). In fact, in the opposing case of singularity, the sequence (7.9) might not generate a whole $j$-dimensional subspace $U_j$ and in particular may exclude the required solution $s^*$ modulo an element of $K$, which would, of course, be a disaster. Condition (a) guarantees that the polynomial part of $s^*$ from the kernel $K$ will be recovered exactly by the method, which is a natural and minimal requirement.

The coefficients of the iterates $s_j$ with respect to the given canonical basis of each $U_j$ are, however, never computed explicitly, because those bases are ill-conditioned. Instead, we begin an optimisation procedure based on the familiar conjugate gradients (e.g. Golub and Van Loan, 1989, see also the last section of this chapter) and compute for each iteration index $j = 1, 2, \ldots$

$$s_{j+1} = s_j + \alpha_j d_j, \tag{7.11}$$

where $d_j$ is a search direction

$$d_j = \eta(s^* - s_j) + \beta_j d_{j-1}, \tag{7.12}$$

which we wish to make orthogonal to $d_{j-1}$ with respect to the native space semi-inner product. Moreover, $d_0 := 0$, $s_1 := 0$, and in particular $d_1 = \eta(s^*)$.

No further directions have to be used in (7.12) on the right-hand side in order to obtain the required conjugacy. This is an important fact and it is a consequence of the self-adjointness condition (c).

We shall see that condition (b) is important for generating linearly independent search directions. The aforementioned orthogonality condition defines the $\beta_j$ in (7.12), and the $\alpha_j$ is chosen so as to guarantee the monotonic decrease.

The calculation ends if the residuals $|s_{j+1}(\xi) - f_\xi|$ are small enough uniformly in $\xi \in \Xi$, e.g. close to machine accuracy. In fact, as we shall note in the theorem below, full orthogonality

$$(d_j, d_k)_* = 0, \qquad 1 \le j < k < k^*, \tag{7.13}$$

is automatically obtained with the aid of condition (c), $k^*$ being the index of the final iteration. This fact makes this algorithm a conjugate gradient method, the search directions being conjugate with respect to our semi-inner product. However, the polynomial terms which belong to the interpolant are computed on every iteration, while a genuine conjugate gradient method only works on the preconditioned positive definite matrix and always modulo the kernel $K$ of the semi-inner product. The necessary correction term from $K$ is then added at the end.

The parameter $\beta_j$ from (7.12) can be specified. It is

$$\beta_j = -\frac{(\eta(s^* - s_j), d_{j-1})_*}{\|d_{j-1}\|_*^2}.$$
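This value follows in one line from the requirement that $d_j$ be orthogonal to $d_{j-1}$. As a brief check, using nothing beyond (7.12), the bilinearity of the semi-inner product and the assumption that $\|d_{j-1}\|_*$ does not vanish:

```latex
\begin{align*}
0 = (d_j, d_{j-1})_*
  &= \bigl(\eta(s^* - s_j) + \beta_j d_{j-1},\; d_{j-1}\bigr)_* \\
  &= \bigl(\eta(s^* - s_j),\; d_{j-1}\bigr)_* + \beta_j \,\|d_{j-1}\|_*^2
  \quad\Longrightarrow\quad
  \beta_j = -\frac{(\eta(s^* - s_j),\, d_{j-1})_*}{\|d_{j-1}\|_*^2}.
\end{align*}
```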

The minimising parameter in (7.11) is

$$\alpha_j = \frac{(s^* - s_j, d_j)_*}{\|d_j\|_*^2}.$$

It guarantees that $\|s^* - s_j\|_*$ is minimal amongst all $s_j \in U_{j-1}$ through the orthogonality property

$$(s^* - s_j, g)_* = 0, \qquad \forall g \in U_{j-1}.$$
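To see the recurrences (7.11) and (7.12) at work, the following minimal sketch runs them in a finite-dimensional stand-in for $U$. It is an illustration under our own assumptions, not the radial basis function setting itself: vectors in $\mathbb{R}^4$ replace functions, a symmetric positive definite matrix `A` defines $(u, v)_* = u^T A v$, the kernel is trivial, and a hypothetical diagonal scaling `eta` plays the role of $\eta$ (it satisfies (b) and (c) for this inner product). As in the text, $s^*$ itself is never used by the iteration; only the residual `b - A s`, which equals $A(s^* - s_j)$, enters.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def krylov(A, b, eta, max_steps, tol=1e-14):
    """Iterate (7.11)-(7.12) for the inner product (u, v)_* = u^T A v."""
    s = [0.0] * len(b)          # s_1 = 0
    d_prev = None               # d_0 = 0
    iterates = [s]
    for _ in range(max_steps):
        # r = A (s* - s_j), so dot(r, v) equals (s* - s_j, v)_*
        r = [bi - ci for bi, ci in zip(b, matvec(A, s))]
        if max(abs(x) for x in r) < tol:
            break
        d = eta(r)              # eta(s* - s_j), computed from the residual only
        if d_prev is not None:
            Ad_prev = matvec(A, d_prev)
            beta = -dot(d, Ad_prev) / dot(d_prev, Ad_prev)
            d = [di + beta * pi for di, pi in zip(d, d_prev)]
        alpha = dot(d, r) / dot(d, matvec(A, d))
        s = [si + alpha * di for si, di in zip(s, d)]
        iterates.append(s)
        d_prev = d
    return iterates

# Hypothetical test problem: strictly diagonally dominant, hence positive definite.
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 3.0, 1.0],
     [0.0, 0.0, 1.0, 5.0]]
s_star = [1.0, -2.0, 3.0, 0.5]
b = matvec(A, s_star)

def eta(r):
    # Stand-in operator: eta(e) = D^{-1} A e with D = diag(A), applied to r = A e.
    return [ri / A[i][i] for i, ri in enumerate(r)]

def err(s):
    e = [x - y for x, y in zip(s_star, s)]
    return math.sqrt(max(0.0, dot(e, matvec(A, e))))

errors = [err(s) for s in krylov(A, b, eta, max_steps=4)]
```

The list `errors` holds $\|s^* - s_j\|_*$ for the successive iterates. In this four-dimensional example one expects it to decrease strictly and to reach rounding level after at most four steps, which is the behaviour the theorem below describes for exact arithmetic.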

We note that if the required interpolant is already in the kernel of the semi-inner product, then (a) implies that $d_1 = \eta(s^*) = s^*$ (recalling that $s_1$ is zero) and the choice $\alpha_1 = 1$ gives $s_2 = s^*$ as required. Otherwise, $s_1$ being zero, $s_2$ is computed as $\alpha_1 d_1$, where $d_1 = \eta(s^*)$ and $\alpha_1$ minimises $\|s^* - \alpha_1 d_1\|_*$. Since $s^* \notin K$, (a) implies $d_1 \notin K$. Hence $(d_1, d_1)_*$ is positive and $\alpha_1 = (s^*, d_1)_*/(d_1, d_1)_*$ is positive because of (b). We get, as required, $\|s^* - s_2\|_* < \|s^* - s_1\|_*$.

Thus, in Faul and Powell (1999b), the following theorem is proved by induction on the iteration index.

Theorem 7.3. For $j > 1$, the Krylov subspace method with an operator $\eta$ that fulfils conditions (a)–(c) leads to iterates $s_j$ with uniquely defined search directions (7.12) that fulfil (7.13) and lead to positive $\alpha_j$ and strictly monotonically decreasing (7.10) until termination. The method stops in exact arithmetic in $k^* - 1$ steps (which is at most $m - \dim K$), where in the case that $\|s_j - s^*\|_*$ vanishes during the computation, the choice $\alpha_j = 1$ gives immediate termination.

After the general description of the proposed Krylov subspace method we have to outline its practical implementation for the required radial basis function interpolants. To this end, the definition of the operator $\eta$ is central. If the kernel $K$ is trivial, i.e. the radial basis function is conditionally positive definite of order zero (subject to a sign-change if necessary), we use the operator

$$\eta\colon s \mapsto \sum_{k=1}^{m} \frac{(L_k^{\mathrm{loc}}, s)_*}{\lambda_{k\xi_k}}\, L_k^{\mathrm{loc}}, \tag{7.14}$$

where the local Lagrange functions are the same as in the BFGP method defined in the subsection before last for the sets $\mathcal{L}_k$, except that for all $k > m - q$ we [...] functions. The semi-inner product $(\cdot, \cdot)_*$ is still the same as we have used before. This suits for instance the inverse multiquadric radial basis function.

The positive definiteness is a particularly simple case. In fact, the conditions (a)–(c) which the operator $\eta$ has to meet follow immediately. Indeed, there is no polynomial reproduction to prove in this case. Condition (b) follows from the fact that the local Lagrange functions $L_k^{\mathrm{loc}}$ are linearly independent, as a consequence of the positivity of their 'leading (first nonzero) coefficient' $\lambda_{k\xi_k}$, and because the translates of the radial basis function are linearly independent due to the (stronger) fact of nonsingularity of the interpolation matrix for distinct centres. Property (c) is true for reasons of symmetry of the inner product and due to the definition (7.14).
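The operator (7.14) can be sketched in coefficient form for a small one-dimensional example with the inverse multiquadric. The sketch rests on assumptions of ours that the text above does not fix: each set $\mathcal{L}_k$ is taken to be $\xi_k$ together with its $q - 1$ nearest centres of higher index (fewer when not enough remain), the centres and coefficient vectors are arbitrary, and all names (`phi`, `solve`, `local_lagrange`, `eta_coeffs`) are hypothetical. Inner products are evaluated as $(u, v)_* = \sum_i \lambda(u)_i\, v(\xi_i)$, so only function values at the centres are needed.

```python
import math

def phi(r, c=0.5):
    """Inverse multiquadric, a positive definite radial basis function."""
    return 1.0 / math.sqrt(r * r + c * c)

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        p = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[p] = M[p], M[col]
        for row in range(col + 1, n):
            f = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= f * M[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(M[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (M[row][n] - tail) / M[row][row]
    return x

def local_lagrange(xs, k, q):
    """Index set L_k and coefficients of L_k^loc: 1 at xi_k, 0 on the rest of L_k."""
    later = sorted(range(k + 1, len(xs)), key=lambda i: abs(xs[i] - xs[k]))[:q - 1]
    idx = [k] + later
    A = [[phi(abs(xs[i] - xs[j])) for j in idx] for i in idx]
    return idx, solve(A, [1.0] + [0.0] * (len(idx) - 1))

def eta_coeffs(sets, values):
    """Coefficients of eta(s) in (7.14), given the values s(xi_i) at the centres."""
    out = [0.0] * len(values)
    for idx, lam in sets:
        # (L_k^loc, s)_* = sum of lam * s(xi) over L_k; lam[0] is the leading coefficient
        w = sum(l * values[i] for l, i in zip(lam, idx)) / lam[0]
        for l, i in zip(lam, idx):
            out[i] += w * l
    return out

xs = [0.0, 3.1, 1.2, 4.5, 2.3, 5.9, 0.7, 3.8]          # hypothetical centres
Phi = [[phi(abs(a - b)) for b in xs] for a in xs]
sets = [local_lagrange(xs, k, q=3) for k in range(len(xs))]

def values(coeffs):
    return [sum(Phi[i][j] * coeffs[j] for j in range(len(xs))) for i in range(len(xs))]

s_vals = values([1.0, -0.5, 0.3, 0.0, 2.0, -1.0, 0.7, 0.2])
t_vals = values([0.2, 0.4, -1.1, 0.9, 0.0, 0.5, -0.3, 1.0])

lhs = sum(c * v for c, v in zip(eta_coeffs(sets, s_vals), t_vals))   # (eta(s), t)_*
rhs = sum(c * v for c, v in zip(eta_coeffs(sets, t_vals), s_vals))   # (s, eta(t))_*
pos = sum(c * v for c, v in zip(eta_coeffs(sets, s_vals), s_vals))   # (s, eta(s))_*
leading = [lam[0] for _, lam in sets]
```

Here `lhs` and `rhs` should agree up to rounding, which is condition (c), and `pos` together with the entries of `leading` should be positive, in line with condition (b) and the remark on the leading coefficients.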

Taking the approximate Lagrange functions for this scheme is justified by the following observations. Firstly, the theoretical choice of the identity operator as $\eta$ would clearly lead to the sought solution in one iteration (with $s_2 = \alpha_1 s^*$ and taking $\alpha_1 = 1$), so it is reasonable to take some approximation of that. Secondly, we claim that this theoretical choice is equivalent to taking mutually orthogonal basis functions in (7.14). The orthogonality is, of course, in all cases with respect to $(\cdot, \cdot)_*$. We establish this claim as follows.

In fact, it is straightforward to see that $\eta$ is the identity operator if and only if the Lagrange functions therein are the full Lagrange functions, i.e. $q = |\Xi|$ (see also the subsection before last), and therefore the approximate Lagrange functions are useful. Further, we can prove the equivalence of orthogonality and fulfilment of the standard Lagrange conditions. Recalling the reproducing kernel property of the radial basis function with respect to the (semi-)inner product and Lemma 7.1, we observe that the orthogonality is a consequence of the assumption that the basis functions satisfy the global Lagrange conditions

$$L_k^{\mathrm{loc}}(\xi_\ell) = 0, \qquad \ell > k, \tag{7.15}$$

if $L_k^{\mathrm{loc}}$ have the form (7.1), because condition (7.15) and the reproduction property in Lemma 7.1 imply the orthogonality

$$(L_k^{\mathrm{loc}}, L_j^{\mathrm{loc}})_* = \sum_{\xi \in \mathcal{L}_j} \lambda_{j\xi}\, L_k^{\mathrm{loc}}(\xi) = 0 \tag{7.16}$$

whenever $j > k$. This is indeed the aforementioned orthogonality. The converse, orthogonality implying Lagrange conditions, is also true as a consequence of (7.16) and the positivity of the coefficients $\lambda_{k\xi_k}$.
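Both claims, orthogonality under the global Lagrange conditions and $\eta$ reducing to the identity, can be checked numerically on a small example. The sketch below makes the same assumptions as any toy model would: a hypothetical set of six one-dimensional centres, the inverse multiquadric, a trivial kernel $K$, and full sets $\mathcal{L}_k = \{\xi_k, \ldots, \xi_m\}$, so that each function is 1 at $\xi_k$ and vanishes at all later centres. The helper names are ours.

```python
import math

def phi(r, c=0.5):
    """Inverse multiquadric, a positive definite radial basis function."""
    return 1.0 / math.sqrt(r * r + c * c)

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        p = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[p] = M[p], M[col]
        for row in range(col + 1, n):
            f = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= f * M[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(M[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (M[row][n] - tail) / M[row][row]
    return x

xs = [0.0, 2.1, 0.9, 3.2, 1.6, 4.0]                     # hypothetical centres
m = len(xs)
Phi = [[phi(abs(a - b)) for b in xs] for a in xs]

# Full Lagrange functions: L_k is 1 at xi_k and 0 at xi_{k+1}, ..., xi_m, cf. (7.15).
Z = []
for k in range(m):
    idx = list(range(k, m))
    A = [[Phi[i][j] for j in idx] for i in idx]
    lam = solve(A, [1.0] + [0.0] * (len(idx) - 1))
    Z.append([0.0] * k + lam)                           # coefficient vector of L_k

def inner(u, v):
    """(u, v)_* for functions given by their coefficient vectors."""
    return sum(u[i] * Phi[i][j] * v[j] for i in range(m) for j in range(m))

# Orthogonality (7.16): off-diagonal Gram entries vanish, diagonal ones equal lambda_{k xi_k}.
off_diag = max(abs(inner(Z[k], Z[j])) for k in range(m) for j in range(m) if j != k)
diag_gap = max(abs(inner(Z[k], Z[k]) - Z[k][k]) for k in range(m))

# eta from (7.14) built on these functions reproduces the coefficients of any s.
a = [1.0, -0.5, 0.3, 2.0, -1.0, 0.7]
recovered = [0.0] * m
for k in range(m):
    w = inner(Z[k], a) / Z[k][k]
    recovered = [r + w * z for r, z in zip(recovered, Z[k])]
identity_error = max(abs(r - x) for r, x in zip(recovered, a))
```

One expects `off_diag`, `diag_gap` and `identity_error` all to be at rounding level. The middle quantity reflects that $(L_k, L_k)_* = \lambda_{k\xi_k}$ when $L_k(\xi_k) = 1$, which is why the division by the leading coefficient in (7.14) normalises the expansion.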

Property (7.2) is a good approximation to (7.15). For the same reasons as we have stated in the context of the BFGP method, fulfilling the complete Lagrange conditions is not suitable if we want to have an efficient iterative method, because this would amount to solving the full linear system in advance, and therefore we choose to employ those approximate Lagrange functions. Computing them requires at most $O(mq^3)$ operations.

When the kernel $K$ is nontrivial, the following choice of $\eta$ is fitting. We let $\eta$ be the operator

$$\eta\colon s \mapsto \sum_{k=1}^{m-q} \frac{(L_k^{\mathrm{loc}}, s)_*}{\lambda_{k\xi_k}}\, L_k^{\mathrm{loc}} + \bar\eta(s), \tag{7.17}$$

where the local Lagrange functions are the same as before. Here $\bar\eta(s)$ is the interpolant from T∗ which satisfies the interpolation conditions

$$\bar\eta(s)(\xi) = s(\xi), \qquad \xi \in \mathcal{L}_{m-q+1}.$$

The fact that $\bar\eta(s)$ agrees with $s$ for all $s$ which are a linear combination of radial basis function translates $\phi(\|x - \xi\|)$, $\xi \in \mathcal{L}_{m-q+1}$, and possibly an element from $K$, by the uniqueness of the interpolants (it is a projection), immediately leads to the fulfilment of conditions (a)–(c).

As far as implementation is concerned, it is much easier, and an essential ingredient of the algorithm, to work as much as possible on the coefficients of the various linear combinations of radial basis functions and polynomials involved and not to work with the functions themselves (see Faul and Powell, 1999b). At the start of each iteration, the coefficient vectors $\lambda(s_j) = (\lambda(s_j)_i)_{i=1}^{m}$ and $\gamma(s_j) = (\gamma(s_j)_i)_{i=1}^{\dim K}$ are available for the current approximation, by which we mean the real coefficients of the translates of the radial basis functions $\phi(\|\cdot - \xi_i\|)$ and of the monomials that span $K$, respectively. We shall also use in the same vein the notations $\lambda(d_j)$, $\lambda(d_{j-1})$ etc. for the appropriate coefficients of $d_j$, $d_{j-1}$ and other expressions which have expansions, as always, in the translates of the radial basis functions $\phi(\|\cdot - \xi_i\|)$ plus a polynomial. Further, we know the residuals

$$r_\xi^j = f_\xi - s_j(\xi), \qquad \xi \in \Xi,$$

and the values of the search direction at the $\xi$s. Further, $\lambda(d_{j-1})$ and $\gamma(d_{j-1})$ are also stored, as are all $d_{j-1}(\xi_i)$, $i = 1, 2, \ldots, m$. The first step now is the computation of the corresponding coefficients of $\eta(s^* - s_j)$ in the new search direction, which we do by putting $s = s^* - s_j$ in the definition of $\eta$ and evaluating $(L_j^{\mathrm{loc}}, s^* - s_j)_*$ as $\sum_{\xi \in \mathcal{L}_j} \lambda_{j\xi}\, r_\xi^j$ according to Lemma 7.1 and the above display.

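When $j = 1$ at the beginning of the process, the coefficients $\lambda(d_j)$ and $\gamma(d_j)$ are precisely the coefficients of $\eta(s^* - s_j)$, otherwise we set

$$\tilde\beta_j = -\frac{\sum_{i=1}^{m} \lambda\bigl(\eta(s^* - s_j)\bigr)_i\, d_{j-1}(\xi_i)}{\sum_{i=1}^{m} \lambda(d_{j-1})_i\, d_{j-1}(\xi_i)}.$$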
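The bookkeeping just described can be assembled into a complete loop for the case of a trivial kernel $K$, where no polynomial coefficients $\gamma$ arise. The following is a minimal sketch under our own assumptions, not the implementation of Faul and Powell: the centres are one-dimensional, the basis function is the inverse multiquadric, each set $\mathcal{L}_k$ is $\xi_k$ plus its $q - 1$ nearest centres of higher index, the values $d_j(\xi_i)$ are obtained by a direct $O(m^2)$ sum, and every function name is hypothetical. The step length uses $(s^* - s_j, d_j)_* = \sum_i \lambda(d_j)_i\, r^j_{\xi_i}$ and $\|d_j\|_*^2 = \sum_i \lambda(d_j)_i\, d_j(\xi_i)$.

```python
import math

def phi(r, c=0.5):
    """Inverse multiquadric, a positive definite radial basis function."""
    return 1.0 / math.sqrt(r * r + c * c)

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        p = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[p] = M[p], M[col]
        for row in range(col + 1, n):
            f = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= f * M[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(M[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (M[row][n] - tail) / M[row][row]
    return x

def local_lagrange(xs, k, q):
    """Index set L_k and coefficients of L_k^loc: 1 at xi_k, 0 on the rest of L_k."""
    later = sorted(range(k + 1, len(xs)), key=lambda i: abs(xs[i] - xs[k]))[:q - 1]
    idx = [k] + later
    A = [[phi(abs(xs[i] - xs[j])) for j in idx] for i in idx]
    return idx, solve(A, [1.0] + [0.0] * (len(idx) - 1))

def eta_coeffs(sets, r):
    """Coefficients of eta(s* - s_j), using (L_k^loc, s* - s_j)_* = sum lam * r."""
    out = [0.0] * len(r)
    for idx, lam in sets:
        w = sum(l * r[i] for l, i in zip(lam, idx)) / lam[0]
        for l, i in zip(lam, idx):
            out[i] += w * l
    return out

def krylov_interpolate(xs, f, q, tol=1e-10, max_iter=50):
    m = len(xs)
    Phi = [[phi(abs(a - b)) for b in xs] for a in xs]
    sets = [local_lagrange(xs, k, q) for k in range(m)]
    lam_s = [0.0] * m                 # coefficients of s_1 = 0
    r = list(f)                       # residuals f_xi - s_1(xi)
    lam_d = vals_d = None             # d_0 = 0
    for it in range(max_iter):
        if max(abs(v) for v in r) < tol:
            return lam_s, it
        g = eta_coeffs(sets, r)
        if lam_d is not None:         # beta-tilde from the stored d_{j-1} data
            beta = -sum(gi * dv for gi, dv in zip(g, vals_d)) / sum(
                li * dv for li, dv in zip(lam_d, vals_d))
            g = [gi + beta * li for gi, li in zip(g, lam_d)]
        lam_d = g
        vals_d = [sum(Phi[i][j] * lam_d[j] for j in range(m)) for i in range(m)]
        alpha = sum(li * ri for li, ri in zip(lam_d, r)) / sum(
            li * dv for li, dv in zip(lam_d, vals_d))
        lam_s = [a + alpha * b for a, b in zip(lam_s, lam_d)]
        r = [ri - alpha * dv for ri, dv in zip(r, vals_d)]
    return lam_s, max_iter

xs = [0.0, 3.1, 1.2, 4.5, 2.3, 5.9, 0.7, 3.8]          # hypothetical centres
f = [math.sin(x) for x in xs]                           # hypothetical data
coeffs, iterations = krylov_interpolate(xs, f, q=4)
max_residual = max(
    abs(fi - sum(c * phi(abs(x - y)) for c, y in zip(coeffs, xs)))
    for fi, x in zip(f, xs))
```

After the call, `max_residual` measures how well the computed expansion interpolates the data at the centres. The residuals are updated recursively rather than recomputed from the coefficients, which mirrors the storage scheme in the text but means that in floating point a final direct evaluation, as in the last lines, is a sensible check.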