7.2 The BFGP algorithm and the new Krylov method

7.2.4 The Krylov subspace method

A more recent alternative to the implementation of the above BFGP method makes use of a Krylov subspace method, employing the semi-norm $\|\cdot\|_*$ used in the previous subsection and in Section 5.2. It uses basically the same local Lagrange functions as the original BFGP method, but the salient idea of the implementation has a close relationship to conjugate gradient (it is the same as conjugate gradients except for the stopping criterion and the time of the updates of the elements of $K$ in the interpolant) and optimisation methods. There are practical gains in efficiency in this method. Also, it enjoys guaranteed convergence, in contrast with the algorithm according to (7.5), which is faster than the one shown above to converge but may itself sometimes fail to converge.

We continue to let $U$ be the space of approximants spanned by the translates of the radial basis functions $\phi(\|\cdot - \xi\|)$, $\xi \in \Xi$, plus the aforementioned polynomial kernel $K$ of the semi-inner product. Additionally, $U_j$ denotes the space spanned by

$$\eta(s^*),\ \eta(\eta(s^*)),\ \ldots,\ \eta^j(s^*), \tag{7.9}$$

where $s^* \in U$ is still the required interpolant and $\eta\colon U \to U$ is a prescribed operator whose choice is fundamental to the definition and the functioning of the method. We shall define it below. (It is closely related to the $\eta$ of the subsection before last.) The main objective of any Krylov subspace method is to compute in the $j$th iteration $s_{j+1} \in U_j$ such that $s_{j+1}$ minimises $\|s - s^*\|_*$ among all $s \in U_j$, where we always begin with $s_1 = 0$. Here, $\|\cdot\|_*$ is a norm or semi-norm which corresponds to the posed problem, and it is our semi-norm from above in the radial basis function context.

We have the following three assumptions on the operator $\eta$, which must only depend on function evaluations on $\Xi$:

(a) $s \in K \implies \eta(s) = s$,

(b) $s \in U \setminus K \implies (s, \eta(s))_* > 0$, and

(c) $s, t \in U \implies (\eta(s), t)_* = (s, \eta(t))_*$, that is, $\eta$ is self-adjoint with respect to the semi-inner product.

Subject to these conditions, the above strategy leads to the sought interpolant $s^*$ in finitely many steps when we use exact arithmetic, and in particular the familiar sequence

$$\|s^* - s_j\|_*, \qquad j = 1, 2, 3, \ldots, \tag{7.10}$$

$s_j$ denoting the approximation after $j - 1$ sweeps, decreases strictly monotonically as $j$ increases until we reach $s^*$. We note immediately that conditions (a) and (b) imply the important nonsingularity statement that $\eta(s) = 0$ for some $s \in U$ only if $s$ vanishes. Indeed, if $s$ is an element from $K$, our claim is trivially true. Otherwise, if $s$ is not in the kernel $K$, then $s \neq 0$, and if $\eta(s)$ vanishes this contradicts (b).


Further, this nonsingularity statement is essential to the dimension of the space generated by (7.9). In fact, in the opposing case of singularity, the sequence (7.9) might not generate a whole $j$-dimensional subspace $U_j$ and in particular may exclude the required solution $s^*$ modulo an element of $K$, which would, of course, be a disaster. Condition (a) guarantees that the polynomial part of $s^*$ from the kernel $K$ will be recovered exactly by the method, which is a natural and minimal requirement.

The coefficients of the iterates $s_j$ with respect to the given canonical basis of each $U_j$ are, however, never computed explicitly, because those bases are ill-conditioned. Instead, we begin an optimisation procedure based on the familiar conjugate gradients (e.g. Golub and Van Loan, 1989, see also the last section of this chapter) and compute for each iteration index $j = 1, 2, \ldots$

$$s_{j+1} = s_j + \alpha_j d_j, \tag{7.11}$$

where $d_j$ is a search direction

$$d_j = \eta(s^* - s_j) + \beta_j d_{j-1}, \tag{7.12}$$

which we wish to make orthogonal to $d_{j-1}$ with respect to the native space semi-inner product. Moreover, $d_0 := 0$, $s_1 := 0$, and in particular $d_1 = \eta(s^*)$.

No further directions have to be used in (7.12) on the right-hand side in order to obtain the required conjugacy. This is an important fact and it is a consequence of the self-adjointness condition (c).

We shall see that condition (b) is important for generating linearly independent search directions. The aforementioned orthogonality condition defines the $\beta_j$ in (7.12), and the $\alpha_j$ is chosen so as to guarantee the monotonic decrease. The calculation ends if the residuals $|s_{j+1}(\xi) - f_\xi|$ are small enough uniformly in $\xi \in \Xi$, e.g. close to machine accuracy. In fact, as we shall note in the theorem below, full orthogonality

$$(d_j, d_k)_* = 0, \qquad 1 \le j < k < k^*, \tag{7.13}$$

is automatically obtained with the aid of condition (c), $k^*$ being the index of the final iteration. This fact makes this algorithm a conjugate gradient method, the search directions being conjugate with respect to our semi-inner product. However, the polynomial terms which belong to the interpolant are computed on every iteration, while a genuine conjugate gradient method only works on the preconditioned positive definite matrix and always modulo the kernel $K$ of the semi-inner product. The necessary correction term from $K$ is then added at the end.


The parameter $\beta_j$ from (7.12) can be specified. It is

$$\beta_j = -\frac{(\eta(s^* - s_j),\, d_{j-1})_*}{\|d_{j-1}\|_*^2}.$$

The minimising parameter in (7.11) is

$$\alpha_j = \frac{(s^* - s_j,\, d_j)_*}{\|d_j\|_*^2}.$$

It guarantees that $\|s^* - s_j\|_*$ is minimal amongst all $s_j \in U_{j-1}$ through the orthogonality property

$$(s^* - s_j,\, g)_* = 0, \qquad \forall g \in U_{j-1}.$$

We note that if the required interpolant is already in the kernel of the semi-inner product, then (a) implies that $d_1 = \eta(s^*) = s^*$ (recalling that $s_1$ is zero) and the choice $\alpha_1 = 1$ gives $s_2 = s^*$ as required. Otherwise, $s_1$ being zero, $s_2$ is computed as $\alpha_1 d_1$, where $d_1 = \eta(s^*)$ and $\alpha_1$ minimises $\|s^* - \alpha_1 d_1\|_*$. Since $s^* \notin K$, (a) implies $d_1 \notin K$. Hence $(d_1, d_1)_*$ is positive, and $\alpha_1 = (s^*, d_1)_*/(d_1, d_1)_*$ is positive because of (b). We get, as required, $\|s^* - s_2\|_* < \|s^* - s_1\|_*$.

Thus, in Faul and Powell (1999b), the following theorem is proved by induction on the iteration index.

Theorem 7.3. For $j > 1$, the Krylov subspace method with an operator $\eta$ that fulfils conditions (a)–(c) leads to iterates $s_j$ with uniquely defined search directions (7.12) that fulfil (7.13) and lead to positive $\alpha_j$ and strictly monotonically decreasing (7.10) until termination. The method stops in exact arithmetic in $k^* - 1$ steps (which is at most $m - \dim K$), where in the case that $\|s_j - s^*\|_*$ vanishes during the computation, the choice $\alpha_j = 1$ gives immediate termination.
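The iteration (7.11)–(7.12) with the above formulas for $\beta_j$ and $\alpha_j$ can be tried out in a finite-dimensional analogue. The following is only a sketch under stated assumptions, not the radial basis function implementation: functions are replaced by vectors, the semi-inner product by $(u, v)_* = u^T A v$ with a symmetric positive definite matrix `A` (so the kernel $K$ is trivial), and $\eta$ by $s \mapsto BAs$ with a positive diagonal matrix `B`, which satisfies (b) and (c) in this setting. The names `krylov_solve`, `matvec`, `dot` and `B_diag` are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))


def krylov_solve(A, b, B_diag, tol=1e-12):
    """Iterate (7.11)-(7.12) towards the solution s* of A s* = b.

    Assumed analogue: (u, v)_* = u^T A v and eta(s) = B A s, B diagonal.
    s* is never formed: (s* - s_j, g)_* = g^T (b - A s_j) and
    eta(s* - s_j) = B (b - A s_j) only need the residual.
    Returns the final iterate and the list of search directions.
    """
    n = len(b)
    s = [0.0] * n                  # s_1 = 0
    d_prev, Ad_prev = None, None   # d_0 = 0
    directions = []
    for _ in range(n):             # at most n steps in exact arithmetic
        r = [bi - ai for bi, ai in zip(b, matvec(A, s))]  # residual
        if max(abs(x) for x in r) <= tol:
            break
        g = [Bi * ri for Bi, ri in zip(B_diag, r)]        # eta(s* - s_j)
        if d_prev is None:
            d = g                                         # d_1 = eta(s*)
        else:
            beta = -dot(g, Ad_prev) / dot(d_prev, Ad_prev)  # beta_j
            d = [gi + beta * di for gi, di in zip(g, d_prev)]
        Ad = matvec(A, d)
        alpha = dot(d, r) / dot(d, Ad)                    # alpha_j
        s = [si + alpha * di for si, di in zip(s, d)]     # (7.11)
        directions.append(d)
        d_prev, Ad_prev = d, Ad
    return s, directions
```

In this analogue the scheme coincides with a diagonally preconditioned conjugate gradient iteration. The returned directions can be used to check the conjugacy (7.13) numerically.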

After the general description of the proposed Krylov subspace method we have to outline its practical implementation for the required radial basis function interpolants. To this end, the definition of the operator $\eta$ is central. If the kernel $K$ is trivial, i.e. the radial basis function is conditionally positive definite of order zero (subject to a sign-change if necessary), we use the operator

$$\eta\colon s \mapsto \sum_{k=1}^{m} \frac{(L_k^{loc}, s)_*}{\lambda_{k\xi_k}}\, L_k^{loc}, \tag{7.14}$$

where the local Lagrange functions are the same as in the BFGP method defined in the subsection before last for the sets $L_k$ except that for all $k > m - q^*$ we


functions. The semi-inner product $(\cdot, \cdot)_*$ is still the same as we have used before. This suits, for instance, the inverse multiquadric radial basis function. Positive definiteness is a particularly simple case. In fact, the conditions (a)–(c) which the operator $\eta$ has to meet follow immediately. Indeed, there is no polynomial reproduction to prove in this case. Condition (b) follows from the fact that the local Lagrange functions $L_k^{loc}$ are linearly independent, as a consequence of the positivity of their 'leading (first nonzero) coefficient' $\lambda_{k\xi_k}$, and because the translates of the radial basis function are linearly independent due to the (stronger) fact of nonsingularity of the interpolation matrix for distinct centres. Property (c) is true for reasons of symmetry of the inner product and due to the definition (7.14).

Taking the approximate Lagrange functions for this scheme is justified by the following observations. Firstly, the theoretical choice of the identity operator as $\eta$ would clearly lead to the sought solution in one iteration (with $s_2 = \alpha_1 s^*$ and taking $\alpha_1 = 1$), so it is reasonable to take some approximation of that. Secondly, we claim that this theoretical choice is equivalent to taking mutually orthogonal basis functions in (7.14). The orthogonality is, of course, in all cases with respect to $(\cdot, \cdot)_*$. This claim we establish as follows.

In fact, it is straightforward to see that $\eta$ is the identity operator if and only if the Lagrange functions therein are the full Lagrange functions, i.e. $q = |\Xi|$ (see also the subsection before last), and therefore the approximate Lagrange functions are useful. Further, we can prove the equivalence of orthogonality and fulfilment of the standard Lagrange conditions. Recalling the reproducing kernel property of the radial basis function with respect to the (semi-)inner product and Lemma 7.1, we observe that the orthogonality is a consequence of the assumption that the basis functions satisfy the global Lagrange conditions

$$L_k^{loc}(\xi_\ell) = 0, \qquad \ell > k, \tag{7.15}$$

if the $L_k^{loc}$ have the form (7.1), because condition (7.15) and the reproduction property in Lemma 7.1 imply the orthogonality

$$(L_k^{loc}, L_j^{loc})_* = \sum_{\xi \in L_j} \lambda_{j\xi}\, L_k^{loc}(\xi) = 0 \tag{7.16}$$

whenever $j > k$. This is indeed the aforementioned orthogonality. The converse, orthogonality implying Lagrange conditions, is also true as a consequence of (7.16) and the positivity of the coefficients $\lambda_{k\xi_k}$.
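The claim that mutually orthogonal basis functions turn an operator of the form (7.14) into the identity can be checked numerically in a small vector analogue. This is a sketch under assumptions, not the book's construction: the semi-inner product is modelled by $(u, v)_* = u^T A v$ with `A` symmetric positive definite, the basis is orthogonalised by Gram-Schmidt, and the denominator $\lambda_{k\xi_k}$ is replaced by $(L_k, L_k)_*$. The function names are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def inner(A, u, v):
    """Assumed analogue of the semi-inner product: (u, v)_* = u^T A v."""
    return sum(ui * wi for ui, wi in zip(u, matvec(A, v)))


def a_orthogonalise(A, basis):
    """Gram-Schmidt with respect to (., .)_*; returns mutually orthogonal vectors."""
    out = []
    for v in basis:
        w = list(v)
        for u in out:
            c = inner(A, u, w) / inner(A, u, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        out.append(w)
    return out


def eta(A, basis, s):
    """Analogue of (7.14): sum over k of (L_k, s)_* / (L_k, L_k)_* times L_k."""
    result = [0.0] * len(s)
    for L in basis:
        c = inner(A, L, s) / inner(A, L, L)
        result = [ri + c * Li for ri, Li in zip(result, L)]
    return result
```

With an orthogonalised full basis, `eta` reproduces its argument up to rounding. With a basis that is not orthogonal in this inner product, such as the unit vectors, it is only an approximation of the identity, which mirrors the role of the approximate Lagrange functions.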

Property (7.2) is a good approximation to (7.15). For the same reasons as we have stated in the context of the BFGP method, fulfilling the complete Lagrange conditions is not suitable if we want to have an efficient iterative method, because this would amount to solving the full linear system in advance, and therefore we choose to employ those approximate Lagrange functions. This requires at most $O(mq^3)$ operations.

When the kernel $K$ is nontrivial, the following choice of $\eta$ is fitting. We let $\eta$ be the operator

$$\eta\colon s \mapsto \sum_{k=1}^{m-q^*} \frac{(L_k^{loc}, s)_*}{\lambda_{k\xi_k}}\, L_k^{loc} + \bar\eta(s), \tag{7.17}$$

where the local Lagrange functions are the same as before. Here $\bar\eta(s)$ is the interpolant from $T^*$ which satisfies the interpolation conditions

$$\bar\eta(s)(\xi) = s(\xi), \qquad \xi \in L_{m-q^*+1}.$$

The fact that $\bar\eta(s)$ agrees with $s$ for all $s$ which are a linear combination of radial basis function translates $\phi(\|x - \xi\|)$, $\xi \in L_{m-q^*+1}$, and possibly an element from $K$, by the uniqueness of the interpolants (it is a projection), immediately leads to the fulfilment of conditions (a)–(c).

As far as implementation is concerned, it is much easier, and an essential ingredient of the algorithm, to work as much as possible on the coefficients of the various linear combinations of radial basis functions and polynomials involved and not to work with the functions themselves (see Faul and Powell, 1999b). At the start of each iteration, the coefficient vectors $\lambda(s_j) = (\lambda(s_j)_i)_{i=1}^{m}$ and $\gamma(s_j) = (\gamma(s_j)_i)_{i=1}^{\dim K}$ are available for the current approximation, by which we mean the real coefficients of the translates of the radial basis functions $\phi(\|\cdot - \xi_i\|)$ and of the monomials that span $K$, respectively. We shall also use in the same vein the notations $\lambda(d_j)$, $\lambda(d_{j-1})$ etc. for the appropriate coefficients of $d_j$, $d_{j-1}$ and other expressions which have expansions, as always, in the translates of the radial basis functions $\phi(\|\cdot - \xi_i\|)$ plus a polynomial. Further, we know the residuals

$$r_\xi^j = f_\xi - s_j(\xi), \qquad \xi \in \Xi,$$

and the values of the search direction at the $\xi$s. Further, $\lambda(d_{j-1})$ and $\gamma(d_{j-1})$ are also stored, as are all $d_{j-1}(\xi_i)$, $i = 1, 2, \ldots, m$. The first step now is the computation of the corresponding coefficients of $\eta(s^* - s_j)$ in the new search direction, which we do by putting $s = s^* - s_j$ in the definition of $\eta$ and evaluating $(L_k^{loc}, s^* - s_j)_*$ as

$$\sum_{\xi \in L_k} \lambda_{k\xi}\, r_\xi^j$$

according to Lemma 7.1 and the above display.

When $j = 1$ at the beginning of the process, the coefficients $\lambda(d_j)$ and $\gamma(d_j)$ are precisely the coefficients of $\eta(s^* - s_j)$, otherwise we set

$$\tilde\beta_j = -\frac{\sum_{i=1}^{m} \lambda(\eta(s^* - s_j))_i\, d_{j-1}(\xi_i)}{\sum_{i=1}^{m} \lambda(d_{j-1})_i\, d_{j-1}(\xi_i)}.$$
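The coefficient-based evaluation described above amounts to a few weighted sums over stored arrays. The following sketch assumes that coefficients and function values are held in plain Python lists indexed by centre, and that the coefficients of a local Lagrange function are held in a dictionary keyed by centre index. These storage choices and the names `semi_inner`, `local_inner` and `beta_tilde` are hypothetical and not taken from the source.

```python
def semi_inner(lam_u, v_values):
    """(u, v)_* evaluated as the sum over i of lambda(u)_i * v(xi_i)."""
    return sum(l * v for l, v in zip(lam_u, v_values))


def local_inner(lam_local, residuals):
    """(L_k^loc, s* - s_j)_* as the sum over xi in L_k of lambda_{k xi} * r_xi.

    lam_local maps the index of each centre in L_k to its coefficient;
    residuals[i] holds r_{xi_i} = f_{xi_i} - s_j(xi_i).
    """
    return sum(lam * residuals[i] for i, lam in lam_local.items())


def beta_tilde(lam_eta, lam_d_prev, d_prev_values):
    """The parameter beta~_j from the coefficients of eta(s* - s_j) and of
    d_{j-1}, together with the stored values d_{j-1}(xi_i)."""
    numerator = semi_inner(lam_eta, d_prev_values)
    denominator = semi_inner(lam_d_prev, d_prev_values)
    return -numerator / denominator
```

Only the radial basis function coefficients and the stored point values enter these sums, so no function has to be evaluated anew in order to obtain $\tilde\beta_j$.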

In document Buhmann M D - Radial Basis Functions, Theory and Implementations (CUP 2004)(271s) (Page 190-196)