Radial Basis Functions with Compact Support
7.2 The BFGP algorithm and the new Krylov method
7.2.4 The Krylov subspace method
A more recent alternative to the implementation of the above BFGP method ma[…] the semi-norm $\|\cdot\|_*$ used in the previous subsection and in Section 5.2. It uses basically the same local Lagrange functions as the original BFGP method, but the salient idea of the implementation has a close relationship to conjugate gradient (it is the same as conjugate gradients except for the stopping criterion and the time of the updates of the elements of $K$ in the interpolant) and optimisation methods. There are practical gains in efficiency in this method. Also, it enjoys guaranteed convergence, in contrast with the algorithm according to (7.5), which converges faster than the one shown above but may itself sometimes fail to converge.
We continue to let $U$ be the space of approximants spanned by the translates of the radial basis functions $\phi(\|\cdot - \xi\|)$, $\xi \in \Xi$, plus the aforementioned polynomial kernel $K$ of the semi-inner product. Additionally, $U_j$ denotes the space spanned by

(7.9)    $\eta(s^*),\ \eta(\eta(s^*)),\ \ldots,\ \eta^j(s^*),$

where $s^* \in U$ is still the required interpolant and $\eta: U \to U$ is a prescribed operator whose choice is fundamental to the definition and the functioning of the method. We shall define it below. (It is closely related to the $\eta$ of the subsection before last.) The main objective of any Krylov subspace method is to compute in the $j$th iteration $s_{j+1} \in U_j$ such that $s_{j+1}$ minimises $\|s - s^*\|_*$ among all $s \in U_j$, where we always begin with $s_1 = 0$. Here, $\|\cdot\|_*$ is a norm or semi-norm which corresponds to the posed problem, and it is our semi-norm from above in the radial basis function context.
We have the following three assumptions on the operator $\eta$, which must only depend on function evaluations on $\Xi$:
(a) $s \in K \implies \eta(s) = s$,
(b) $s \in U \setminus K \implies (s, \eta(s))_* > 0$, and
(c) $s, t \in U \implies (\eta(s), t)_* = (s, \eta(t))_*$.
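For a trivial kernel $K$ and finitely many centres, conditions (b) and (c) can be checked numerically. The sketch below rests on assumptions that are not part of the text: every function in $U$ is represented by its coefficient vector, the semi-inner product is $(s, t)_* = \lambda^T A \mu$ for a symmetric positive definite interpolation matrix $A$, and $\eta$ acts on coefficient vectors as a matrix `M`. Under these assumptions (c) says that $AM$ is symmetric and (b) says that it is positive definite. The helper name `check_eta_conditions` is hypothetical.

```python
import numpy as np

def check_eta_conditions(A, M, tol=1e-10):
    """Check conditions (c) and (b) for eta given as a matrix M on coefficients.

    Assumes a trivial kernel K, so that (s, t)_* = lam^T A mu for coefficient
    vectors lam, mu and a symmetric positive definite matrix A.
    Returns the pair (self_adjoint, positive).
    """
    G = A @ M                                   # (s, eta(t))_* = lam^T (A M) mu
    scale = max(1.0, float(np.max(np.abs(G))))
    # Condition (c): (eta(s), t)_* = (s, eta(t))_*  <=>  A M is symmetric.
    self_adjoint = bool(np.allclose(G, G.T, atol=tol * scale, rtol=0.0))
    # Condition (b): (s, eta(s))_* > 0 for s != 0  <=>  sym. part of A M is pos. def.
    sym = 0.5 * (G + G.T)
    positive = bool(np.min(np.linalg.eigvalsh(sym)) > 0.0)
    return self_adjoint, positive
```

Any operator of the form `M = P @ A` with a symmetric positive definite `P` passes both checks, which is the shape that the operators defined later in this subsection take in coefficient form.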
Subject to these conditions, the above strategy leads to the sought interpolant $s^*$ in finitely many steps when we use exact arithmetic, and in particular the familiar sequence

(7.10)    $\|s^* - s_j\|_*, \quad j = 1, 2, 3, \ldots,$

$s_j$ denoting the approximation after $j - 1$ sweeps, decreases strictly monotonically as $j$ increases until we reach $s^*$. We note immediately that conditions (a) and (b) imply the important nonsingularity statement that $\eta(s) = 0$ for some $s \in U$ only if $s$ vanishes. Indeed, if $s$ is an element from $K$, our claim is trivially true. Otherwise, if $s$ is not in the kernel $K$, then $s \neq 0$, and if $\eta(s)$ vanishes this contradicts (b).
Further, this nonsingularity statement is essential to the dimension of the space generated by (7.9). In fact, in the opposing case of singularity, the sequence (7.9) might not generate a whole $j$-dimensional subspace $U_j$ and in particular may exclude the required solution $s^*$ modulo an element of $K$, which would, of course, be a disaster. Condition (a) guarantees that the polynomial part of $s^*$ from the kernel $K$ will be recovered exactly by the method, which is a natural and minimal requirement.

The coefficients of the iterates $s_j$ with respect to the given canonical basis of each $U_j$ are, however, never computed explicitly, because those bases are ill-conditioned. Instead, we begin an optimisation procedure based on the familiar conjugate gradients (e.g. Golub and Van Loan, 1989, see also the last section of this chapter) and compute for each iteration index $j = 1, 2, \ldots$

(7.11)    $s_{j+1} = s_j + \alpha_j d_j,$

where $d_j$ is a search direction

(7.12)    $d_j = \eta(s^* - s_j) + \beta_j d_{j-1},$

which we wish to make orthogonal to $d_{j-1}$ with respect to the native space semi-inner product. Moreover, $d_0 := 0$, $s_1 := 0$, and in particular $d_1 = \eta(s^*)$. No further directions have to be used in (7.12) on the right-hand side in order to obtain the required conjugacy. This is an important fact and it is a consequence of the self-adjointness condition (c).
We shall see that condition (b) is important for generating linearly independent search directions. The aforementioned orthogonality condition defines the $\beta_j$ in (7.12), and the $\alpha_j$ is chosen so as to guarantee the monotonic decrease. The calculation ends if the residuals $|s_{j+1}(\xi) - f_\xi|$ are small enough uniformly in $\xi \in \Xi$, e.g. close to machine accuracy. In fact, as we shall note in the theorem below, full orthogonality

(7.13)    $(d_j, d_k)_* = 0, \quad 1 \le j < k < k^*,$

is automatically obtained with the aid of condition (c), $k^*$ being the index of the final iteration. This fact makes this algorithm a conjugate gradient method, the search directions being conjugate with respect to our semi-inner product. However, the polynomial terms which belong to the interpolant are computed on every iteration, while a genuine conjugate gradient method only works on the preconditioned positive definite matrix and always modulo the kernel $K$ of the semi-inner product. The necessary correction term from $K$ is then added at the end.
The parameter $\beta_j$ from (7.12) can be specified. It is

$\beta_j = -\dfrac{(\eta(s^* - s_j), d_{j-1})_*}{\|d_{j-1}\|_*^2}.$

The minimising parameter in (7.11) is

$\alpha_j = \dfrac{(s^* - s_j, d_j)_*}{\|d_j\|_*^2}.$

It guarantees that $\|s^* - s_j\|_*$ is minimal amongst all $s_j \in U_{j-1}$ through the orthogonality property

$(s^* - s_j, g)_* = 0, \quad \forall g \in U_{j-1}.$
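The update rules (7.11) and (7.12) with these two parameters can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the implementation of Faul and Powell: it assumes a trivial kernel $K$, represents every function by its coefficient vector with respect to the translates, and uses $(s, t)_* = \lambda^T A \mu$ with the interpolation matrix $A$, so that $(s^* - s_j, d_j)_*$ reduces to the residual vector times the coefficients of $d_j$. The names `krylov_rbf_solve` and `apply_eta` are hypothetical.

```python
import numpy as np

def krylov_rbf_solve(A, f, apply_eta, tol=1e-10, max_iter=None):
    """Iterate (7.11)-(7.12) on coefficient vectors (trivial kernel K assumed).

    A         : symmetric positive definite interpolation matrix
    f         : data values f_xi at the centres
    apply_eta : maps the vector of values of s* - s_j at the centres (the
                residuals) to the coefficient vector of eta(s* - s_j)
    Returns the final coefficients and the list of all coefficient iterates.
    """
    f = np.asarray(f, dtype=float)
    m = f.size
    max_iter = 10 * m if max_iter is None else max_iter
    lam = np.zeros(m)                  # coefficients of s_1 = 0
    d_prev = np.zeros(m)               # coefficients of d_0 = 0
    Ad_prev = np.zeros(m)              # values of d_0 at the centres
    iterates = [lam.copy()]
    for _ in range(max_iter):
        r = f - A @ lam                # residuals f_xi - s_j(xi)
        if np.max(np.abs(r)) <= tol:   # uniform stopping test on the residuals
            break
        z = apply_eta(r)               # coefficients of eta(s* - s_j)
        norm_prev = d_prev @ Ad_prev   # ||d_{j-1}||_*^2
        beta = -(z @ Ad_prev) / norm_prev if norm_prev > 0.0 else 0.0
        d = z + beta * d_prev          # search direction (7.12)
        Ad = A @ d                     # values of d_j at the centres
        alpha = (r @ d) / (d @ Ad)     # (s* - s_j, d_j)_* / ||d_j||_*^2
        lam = lam + alpha * d          # update (7.11)
        iterates.append(lam.copy())
        d_prev, Ad_prev = d, Ad
    return lam, iterates
```

Passing `apply_eta = lambda r: r` corresponds to an operator whose coefficient matrix is $A$ itself, in which case the loop is the ordinary conjugate gradient method applied to $A\lambda = f$.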
We note that if the required interpolant is already in the kernel of the semi-inner product, then (a) implies that $d_1 = \eta(s^*) = s^*$ (recalling that $s_1$ is zero), and the choice $\alpha_1 = 1$ gives $s_2 = s^*$ as required. Otherwise, $s_1$ being zero, $s_2$ is computed as $\alpha_1 d_1$, where $d_1 = \eta(s^*)$ and $\alpha_1$ minimises $\|s^* - \alpha_1 d_1\|_*$. Since $s^* \notin K$, (a) implies $d_1 \notin K$. Hence $(d_1, d_1)_*$ is positive, and so is $\alpha_1 = (s^*, d_1)_* / (d_1, d_1)_*$ because of (b). We get, as required, $\|s^* - s_2\|_* < \|s^* - s_1\|_*$.
Thus, in Faul and Powell (1999b), the following theorem is proved by induction on the iteration index.
Theorem 7.3. For $j > 1$, the Krylov subspace method with an operator $\eta$ that fulfils conditions (a)–(c) leads to iterates $s_j$ with uniquely defined search directions (7.12) that fulfil (7.13) and lead to positive $\alpha_j$ and strictly monotonically decreasing (7.10) until termination. The method stops in exact arithmetic in $k^* - 1$ steps (which is at most $m - \dim K$), where in the case that $\|s_j - s^*\|_*$ vanishes during the computation, the choice $\alpha_j = 1$ gives immediate termination.
After the general description of the proposed Krylov subspace method we have to outline its practical implementation for the required radial basis function interpolants. To this end, the definition of the operator $\eta$ is central. If the kernel $K$ is trivial, i.e. the radial basis function is conditionally positive definite of order zero (subject to a sign-change if necessary), we use the operator

(7.14)    $\eta: s \mapsto \sum_{k=1}^{m} \dfrac{(L_k^{loc}, s)_*}{\lambda_{k\xi_k}} L_k^{loc},$

where the local Lagrange functions are the same as in the BFGP method defined in the subsection before last for the sets $L_k$, except that for all $k > m - q^*$ we […] functions. The semi-inner product $(\cdot, \cdot)_*$ is still the same as we have used before. This suits for instance the inverse multiquadric radial basis function.

The positive definiteness is a particularly simple case. In fact, the conditions (a)–(c) which the operator $\eta$ has to meet follow immediately. Indeed, there is no polynomial reproduction to prove in this case. Condition (b) follows from the fact that the local Lagrange functions $L_k^{loc}$ are linearly independent, as a consequence of the positivity of their 'leading (first nonzero) coefficient' $\lambda_{k\xi_k}$, and because the translates of the radial basis function are linearly independent due to the (stronger) fact of nonsingularity of the interpolation matrix for distinct centres. Property (c) is true for reasons of symmetry of the inner product and due to the definition (7.14).
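The operator (7.14) can be assembled numerically for a positive definite radial basis function. The sketch below is an illustration only and makes assumptions that the text above does not spell out: the set $L_k$ is taken to be $\xi_k$ together with its nearest neighbours among $\xi_k, \ldots, \xi_m$ (at most $q$ points in total), the local Lagrange function takes the value 1 at $\xi_k$ and 0 at the other points of $L_k$, and $(L_k^{loc}, s)_*$ is evaluated as $\sum_{\xi \in L_k} \lambda_{k\xi} s(\xi)$. The function names and the shape parameter `c` are hypothetical.

```python
import numpy as np

def imq_matrix(centres, c=0.05):
    """Inverse multiquadric interpolation matrix, phi(r) = 1/sqrt(r^2 + c^2).

    centres is an array of shape (m, d).
    """
    diff = centres[:, None, :] - centres[None, :, :]
    return 1.0 / np.sqrt(np.sum(diff**2, axis=-1) + c**2)

def local_lagrange(A, centres, q):
    """Index sets L_k and coefficients of L_k^loc (assumed nearest-neighbour sets)."""
    m = len(centres)
    sets, coeffs = [], []
    for k in range(m):
        later = np.arange(k, m)                          # indices of xi_k, ..., xi_m
        dist = np.linalg.norm(centres[later] - centres[k], axis=1)
        Lk = later[np.argsort(dist, kind="stable")[:q]]  # Lk[0] == k (distance zero)
        rhs = np.zeros(len(Lk))
        rhs[0] = 1.0                                     # value 1 at xi_k, 0 elsewhere on L_k
        sets.append(Lk)
        coeffs.append(np.linalg.solve(A[np.ix_(Lk, Lk)], rhs))
    return sets, coeffs

def make_eta(sets, coeffs, m):
    """Return a map: values of s at the centres -> coefficients of eta(s) in (7.14)."""
    def apply_eta(values):
        out = np.zeros(m)
        for Lk, lam_k in zip(sets, coeffs):
            inner = lam_k @ values[Lk]                   # (L_k^loc, s)_*
            out[Lk] += (inner / lam_k[0]) * lam_k        # divide by leading coefficient
        return out
    return apply_eta
```

With `q` equal to the number of centres the resulting map inverts the interpolation matrix, so that a single step suffices; with a small `q` it is only an approximation to that inverse, and each local solve costs $O(q^3)$ operations.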
Taking the approximate Lagrange functions for this scheme is justified by the following observations. Firstly, the theoretical choice of the identity operator as $\eta$ would clearly lead to the sought solution in one iteration (with $s_2 = \alpha_1 s^*$ and taking $\alpha_1 = 1$), so it is reasonable to take some approximation of that. Secondly, we claim that this theoretical choice is equivalent to taking mutually orthogonal basis functions in (7.14). The orthogonality is, of course, in all cases with respect to $(\cdot, \cdot)_*$. This claim we establish as follows.
In fact, it is straightforward to see that $\eta$ is the identity operator if and only if the Lagrange functions therein are the full Lagrange functions, i.e. $q = |\Xi|$ (see also the subsection before last), and therefore the approximate Lagrange functions are useful. Further, we can prove the equivalence of orthogonality and fulfilment of the standard Lagrange conditions. Recalling the reproducing kernel property of the radial basis function with respect to the (semi-)inner product and Lemma 7.1, we observe that the orthogonality is a consequence of the assumption that the basis functions satisfy the global Lagrange conditions

(7.15)    $L_k^{loc}(\xi_\ell) = 0, \quad \ell > k,$

if the $L_k^{loc}$ have the form (7.1), because condition (7.15) and the reproduction property in Lemma 7.1 imply the orthogonality

(7.16)    $(L_k^{loc}, L_j^{loc})_* = \sum_{\xi \in L_j} \lambda_{j\xi} L_k^{loc}(\xi) = 0$

whenever $j > k$. This is indeed the aforementioned orthogonality. The converse (orthogonality implying Lagrange conditions) is also true as a consequence of (7.16) and the positivity of the coefficients $\lambda_{k\xi_k}$.
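The orthogonality (7.16) can be observed numerically. The following sketch assumes a trivial kernel $K$, a symmetric positive definite interpolation matrix `A`, full sets $L_k = \{\xi_k, \ldots, \xi_m\}$ and the normalisation $L_k(\xi_k) = 1$; these choices and the function names are illustrative assumptions. It builds the coefficient vectors of the Lagrange functions and forms their Gram matrix with respect to $(\cdot, \cdot)_*$, which should then be diagonal with the leading coefficients on the diagonal.

```python
import numpy as np

def full_lagrange_coefficient_matrix(A):
    """Column k holds the coefficients of the Lagrange function on {xi_k, ..., xi_m}."""
    m = A.shape[0]
    L = np.zeros((m, m))
    for k in range(m):
        rhs = np.zeros(m - k)
        rhs[0] = 1.0                                 # value 1 at xi_k, 0 at later centres
        L[k:, k] = np.linalg.solve(A[k:, k:], rhs)   # coefficients vanish before xi_k
    return L

def gram_matrix(A, L):
    """Entries (L_k, L_j)_* for functions with coefficient columns L (trivial K assumed)."""
    return L.T @ A @ L
```

Because the Gram matrix is diagonal, summing the rank-one terms of (7.14) over these functions reproduces the inverse of `A`, which is the coefficient form of the statement that $\eta$ is then the identity.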
Property (7.2) is a good approximation to (7.15). For the same reasons as we have stated in the context of the BFGP method, fulfilling the complete Lagrange conditions is not suitable if we want to have an efficient iterative method, because this would amount to solving the full linear system in advance, and therefore we choose to employ those approximate Lagrange functions. This requires at most $O(mq^3)$ operations.
When the kernel $K$ is nontrivial, the following choice of $\eta$ is fitting. We let $\eta$ be the operator

(7.17)    $\eta: s \mapsto \sum_{k=1}^{m-q^*} \dfrac{(L_k^{loc}, s)_*}{\lambda_{k\xi_k}} L_k^{loc} + \bar\eta(s),$

where the local Lagrange functions are the same as before. Here $\bar\eta(s)$ is the interpolant from $T^*$ which satisfies the interpolation conditions

$\bar\eta(s)(\xi) = s(\xi), \quad \xi \in L_{m-q^*+1}.$

The fact that $\bar\eta(s)$ agrees with $s$ for all $s$ which are a linear combination of radial basis function translates $\phi(\|x - \xi\|)$, $\xi \in L_{m-q^*+1}$, and possibly an element from $K$, by the uniqueness of the interpolants (it is a projection), immediately leads to the fulfilment of conditions (a)–(c).
As far as implementation is concerned, it is much easier, and an essential ingredient of the algorithm, to work as much as possible on the coefficients of the various linear combinations of radial basis functions and polynomials involved and not to work with the functions themselves (see Faul and Powell, 1999b). At the start of each iteration, the coefficient vectors $\lambda(s_j) = (\lambda(s_j)_i)_{i=1}^{m}$ and $\gamma(s_j) = (\gamma(s_j)_i)_{i=1}^{\dim K}$ are available for the current approximation, by which we mean the real coefficients of the translates of the radial basis functions $\phi(\|\cdot - \xi_i\|)$ and of the monomials that span $K$, respectively. We shall also use in the same vein the notations $\lambda(d_j)$, $\lambda(d_{j-1})$ etc. for the appropriate coefficients of $d_j$, $d_{j-1}$ and other expressions which have expansions, as always, in the translates of the radial basis functions $\phi(\|\cdot - \xi_i\|)$ plus a polynomial. Further, we know the residuals

$r_\xi^j = f_\xi - s_j(\xi), \quad \xi \in \Xi,$

and the values of the search direction at the centres $\xi$. Further, $\lambda(d_{j-1})$ and $\gamma(d_{j-1})$ are also stored, as are all $d_{j-1}(\xi_i)$, $i = 1, 2, \ldots, m$. The first step now is the computation of the corresponding coefficients of $\eta(s^* - s_j)$ in the new search direction, which we do by putting $s = s^* - s_j$ in the definition of $\eta$ and evaluating $(L_k^{loc}, s^* - s_j)_*$ as $\sum_{\xi \in L_k} \lambda_{k\xi} r_\xi^j$ according to Lemma 7.1 and the above display.
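The point of this evaluation is that the inner product with the unknown $s^*$ needs only the stored residuals. The sketch below contrasts the two ways of computing the same number, assuming a trivial kernel $K$ and the coefficient form $(s, t)_* = \lambda^T A \mu$ of the semi-inner product; the function names are hypothetical, and the second function exists only for comparison because it requires the coefficients of $s^*$.

```python
import numpy as np

def inner_product_via_residuals(local_set, local_coeffs, residuals):
    """(L^loc, s* - s_j)_* as the sum of lambda_xi * r_xi over xi in the local set."""
    return float(local_coeffs @ residuals[local_set])

def inner_product_via_coefficients(A, local_set, local_coeffs, lam_star, lam_j):
    """Same quantity from lam(L^loc)^T A (lam* - lam_j); needs s* explicitly."""
    full = np.zeros(A.shape[0])
    full[local_set] = local_coeffs       # pad the local coefficients with zeros
    return float(full @ A @ (lam_star - lam_j))
```

Only the first variant is usable in the iteration, since the residuals $f_\xi - s_j(\xi)$ are available while the coefficients of $s^*$ are what the method is trying to find.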
When $j = 1$ at the beginning of the process, the coefficients $\lambda(d_j)$ and $\gamma(d_j)$ are precisely the coefficients of $\eta(s^* - s_j)$; otherwise we set
˜ β j = −