• No results found

Newton Krylov Type Algorithm for Solving Nonlinear Least Squares Problems

N/A
N/A
Protected

Academic year: 2020

Share "Newton Krylov Type Algorithm for Solving Nonlinear Least Squares Problems"

Copied!
17
0
0

Loading.... (view fulltext now)

Full text

(1)

Volume 2009, Article ID 435851,17pages doi:10.1155/2009/435851

Research Article

Newton-Krylov Type Algorithm for Solving

Nonlinear Least Squares Problems

Mohammedi R. Abdel-Aziz

1

and Mahmoud M. El-Alem

1, 2

1Department of Mathematics and Computer Science, Faculty of Science, Kuwait University, P.O. 5969, Safat 13060, Kuwait City, Kuwait

2Department of Mathematics, Faculty of Science, Alexandria University, Alexandria, Egypt

Correspondence should be addressed to Mohammedi R. Abdel-Aziz,mr [email protected]

Received 15 December 2008; Accepted 2 February 2009

Recommended by Irena Lasiecka

The minimization of a quadratic function within an ellipsoidal trust region is an important subproblem for many nonlinear programming algorithms. When the number of variables is large, one of the most widely used strategies is to project the original problem into a small dimensional subspace. In this paper, we introduce an algorithm for solving nonlinear least squares problems. This algorithm is based on constructing a basis for the Krylov subspace in conjunction with a model trust region technique to choose the step. The computational step on the small dimensional subspace lies inside the trust region. The Krylov subspace is terminated such that the termination condition allows the gradient to be decreased on it. A convergence theory of this algorithm is presented. It is shown that this algorithm is globally convergent.

Copyrightq2009 M. R. Abdel-Aziz and M. M. El-Alem. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Nonlinear least squares NLS problems are unconstrained optimization problems with special structures. These problems arise in many aspects such as the solution of overde-termined systems of nonlinear equations, some scientific experiments, pattern recognition, and maximum likelihood estimation. For more details about these problems1. The general formulation of the NLS problem is to determine the solution x Rn that minimizes the function

fx Rx22

m

i1

Rix

2

, 1.1

whereRx R1x, R2x, . . . , Rmxt, Ri:Rn → R,1≤imand m≥n.

(2)

Gauss-Newton and Levenberg-Marquart methods3. The second type is closely related to unconstrained minimization techniques4. Of course the two types of algorithms are closely related to each other5.

The presented algorithm is a Newton-Krylov type algorithm. It requires a fixed-size limited storage proportional to the size of the problem and relies only upon matrix vector product. It is based on the implicitly restarted Arnoldi methodIRAMto construct a basis for the Krylov subspace and to reduce the Jacobian into a Hessenberg matrix in conjunction with a trust region strategy to control the step on that subspace6.

Trust region methods for unconstrained minimization are blessed with both strong theoretical convergence properties and a good accurate results in practice. The trial computational step in these methods is to find an approximate minimizer of some model of the true objective function within a trust region for which a suitable norm of the correction lies inside a given bound. This restriction is known as the trust region constraint, and the bound on the norm is its radius. The radius is adjusted so that successive model problems minimized the true objective function within the trust region7.

The trust region subproblem is the problem of findingsΔso that

φsΔ min{φs:s ≤Δ}, 1.2

whereΔis some positive constant, · is the Euclidean norm inRn, and

φs gts1 2s

tHs, 1.3

whereg ∈RnandH∈Rn×nare, respectively, the gradient vector and the Hessian matrix or their approximations.

There are two different approaches to solve1.2. These approaches are based on either solving several linear systems 8 or approximating the curve sΔby a piecewise linear approximation “dogleg strategies”9. In large-scale optimization, solving linear systems is computationally expensive. Moreover, it is not clear how to define the dogleg curves when the matrixHis singular10.

Several authors have studied inexact Newton’s methods for solving NLS problems

11. Xiaofang et al. have introduced stable factorized quassi-Newton methods for solving large-scale NLS12. Dennis et al. proposed a convergence theory for structured BFGS secant method with an application for NLS 13. The Newton-Krylov method is an example of inexact Newton methods. Krylov techniques inside a Newton iteration in the context of system of equations have been proposed in 14. The recent work of Sorensen, provides an algorithm which is based on recasting the trust region subproblem into a parameterized eigenvalue problem. This algorithm provides a super linearly convergent scheme to adjust the parameter and find the optimal solution from the eigenvector of the parameterized problem, as long as the hard case does not occur15.

This contribution is organized as follows. InSection 2, we introduce the structure of the NLS problem and a general view about the theory of the trust region strategies and their convergence. The statement of the algorithm and its properties is developed in Section 3.

(3)

2. Structure of the Problem

The subspace technique plays an important role in solving the NLS problem1.1. We assume that the current iteratexkand a Krylov subspaceSk. Denote the dimension ofSkto bekjand {v1k, v2k, . . . , vkk

j }be a set of linearly independent vectors inSk. The next iteratexk1is such

that the incrementxk1−xkSk. Thus, we haveRjxkVky0, forj 1,2, . . . , mand

y ∈ Rkj, where the matrixV

k v1k, v

k

2 , . . . , v

k

kj . The second-order Taylor expansion of Rx2

2atxkis

R

xk

Jks22stQks, 2.1

whereQk ∈ Rn×n is defined byQk nj1Rjxk∇2Rjxkand the jacobian matrix isJk

JxkR1xk,R2xk, . . . ,Rmxkt. If we considersSk, we get the quadratic model

qky R

xk

JkVky22ytAky, 2.2

whereAk∈Rkj,kjapproximates the reduced matrixVktQkVkmj1RjxkVkt∇2RjxkVk. The first order necessary conditions of2.1is

JktJkQk

sRtkJk0. 2.3

Thus, a solution of the quadratic model is a solution to an equation of the form

JktJkQk

sRtkJk. 2.4

The model trust region algorithm generates a sequence of pointsxk, and at thekth stage of the iteration the quadratic model of problem1.2has the form

φks JktRks 1 2s

tJt kJkQk

s. 2.5

At this stage, an initial value for the trust region radius Δk is also available. An inner iteration is then performed which consists of using the current trust region radius,Δk, and the information contained in the quadratic model to compute a step,sΔk. Then a comparison of the actual reduction of the objective function

aredkΔk

Rxk22R

xksk

Δk22, 2.6

and the reduction predicted by the quadratic model

predk

Δk

(4)

is performed. If there is a satisfactory reduction, then the step can be taken, or a possibly larger trust region used. If not, then the trust region is reduced and the inner iteration is repeated. For now, we leave unspecified what algorithm is used to form the stepsΔ, and how the trust region radiusΔis changed. We also leave unspecified the selection ofJktJkQkexcept to restrict it to be symmetric. Details on these points will be addressed in our forth coming paper.

The solutionsof the quadratic model2.2is a solution to an equation of the form

JtJQμIsRtJ, 2.8

withμ≥0,μstsΔ2 0 andJtJQ μIis positive semidefinite. This represents the first-order necessary conditions concerning the pairμands, wheresis a solution to model

2.2andμis the Lagrange multiplier associated with the constraintstsΔ2. The sucient

conditions that will ensuresto be a solution to2.2can be established in the following lemma which has a proof in16.

Lemma 2.1. Letμ∈Rands∈RnsatisfyJtJQ μIsRtJwithJtJQ μIpositive

semidefinite, if

μ0, s ≤Δthenssolves2.2,

s Δ, thenssolves min{φs:s Δ},

μ0 withs Δ, thenssolves2.2.

2.9

Of course, ifJtJQ μIis positive definite thensis a unique solution.

The main result which is used to prove that the sequence of gradients tends to zero for modified Newton methods is

fφs≥ 1

2R

tJmin Δ, RtJ

JtJQ

, 2.10

where s is a solution to 2.2. The geometric interpretation of this inequality is that for a quadratic function, any solution s to2.2 produces a decrease fφsthat is at least as much as the decrease along the steepest descent direction−RtJ. A proof of this result may be found in17.

3. Algorithmic Framework

(5)

The desired properties of this algorithm are:

1the algorithm should be well defined for a sufficiently general class of functions and it is globally convergent;

2the algorithm should be invariant under linear affine scalings of the variables, that is, if we replace fxby fy fsV y, where V ∈ Rn×k is an orthonormal matrix,x ∈ Rnandy ∈Rk, then applying the iteration tofwith initial guessy0

satisfyingx0 sV y0should produce a sequence{yl}related to the sequence{xl} byxl V yls, wherexl is produced by applying the algorithm tof with initial guessx0;

3the algorithm should provide a decrease that is at least as large as a given multiple of the minimum decrease that would be provided by a quadratic search along the steepest descent direction;

4the algorithm should give as good a decrease of the quadratic model as a direction of the negative gradient when the Hessian,JtJQ, is indefinite and should force the direction to be equal to the Newton direction when JtJ Q is symmetric positive definite.

The following describes a full iteration of a truncated Newton type method. Some of the previous characteristics will be obvious and the other ones will be proved in the next section.

Algorithm 3.1.

1Step 0initialization.

Givenx0∈Rn, computeRtJx0,JtJQx0.

Chooseξ1< ξ2,η1∈0,1, η2 >1,Δ1>0, k1.

2Step 1construction a basis for the Krylov subspace.

aChoose ∈ 0,1, an initial guess s0 which can be chosen to be zero, form

r0 −RtJJtJQs0, whereRtJ RtJxn,JtJQ JtJQxn, and computeτr0andv1 r0.

bForj 1,2, . . .until convergence, do.

iFormJtJQv

jand orthogonalize it against the previousv1, v2, . . . , vjthen

hi, j JtJQv

j, vi, i1,2, . . . , j,

vj1 JtJQvj

j

i1hi, jvi,

hi1, j vj1.

vj1vj1/hi1, j.

iiCompute the residual norm βj RtJ JtJ Qsjof the solution sj that would be obtained if we stopped.

iiiIfβjjRtJ, wherej ≤1/Dj ∈0, , andDjis an approximation to the condition number ofJtJQ

(6)

3Step 2compute the trial step.

iConstruct the local model,

ψky fkhtky 1 2y

tH

ky, 3.1

wherehkVktRtJk∈RkjandHkVktJtJQkVk∈Rkj×kj.

iiCompute the solutionyto the problem min{ψky :y ≤ Δk}and set sk

Vky.

4Step 3test the step and updateΔk.

iEvaluatefk1fxkVky, aredkfkfxkVkyand predkfkψVky.

iiIfaredk/predk< ξ1then

beginΔkgo to 3i

Comment: Do not accept the step “y”, reduce the trust region radius “Δk” and compute a new step

Else ifξ1≤aredk/predk≤ξ2then

xk1xkVky

Comment: Accept the step and keep the previous trust region radius the same Elsearedk/predk> ξ2then

Δk

Comment: Accept the step and increase the trust region radius end if

5Step 4.

Setk1←kand go to Step 1.

4. The Restarting Mechanism

In this section, we discuss the important role of the restarting mechanism to control the possibility of the failure of nonsingularity of the Hessian matrix, and introduce the assumptions under which we prove the global convergence.

Let the sequence of the iterates generated byAlgorithm 3.1be{xk}; for such sequence we assume

G1for allk, xkandxksk∈ΩwhereΩ⊂Rnis a convex set;

G2fC2ΩandRtJ fx/0 for allxΩ;

G3fxis bounded in norm inΩandJtJQis nonsingular for allxΩ;

G4for allsksuch thatxksk∈Ω, the termination condition,RtJk JTJQsk

jRtJk,k∈0,1is satisfied;

An immediate consequence of the global assumptions is that the Hessian matrix

JtJQxis bounded, that is, there exists a constantβ

1 >0 such thatJtJQx< β1.

(7)

Assumption G3 thatJtJQis nonsingular is necessary since the IRA iteration is only guaranteed to converge for nonsingular systems. We first discuss the two ways in which the IRA method can breakdown. This can happen if eithervk1 0 so thatvk1cannot be formed,

or ifHkmax is singular which means that the maximum number of IRA steps has been taken,

but the final iterate cannot be formed. The first situation has often been referred to as a soft breakdown, sincevk1 0 impliesJtJQkis nonsingular andxkis the solution18. The second case is more serious in that it causes a convergence failure of the Arnoldi process. One possible recourse is to hope thatJtJQkis nonsingular for somekamong 1,2, . . . , kmax−1.

If suchkexists, we can computexjand then restart the algorithm usingxkas the new initial guessx0. It may not be possible to do this.

To handle the problem of singularHk, we can use a technique similar to that done in the full dimensional space. IfHkis singular, or its condition number is greater than 1/

ν, whereνis the machine epsilon, then we perturb the quadratic modelψyto

ψy ψy 1

2γy

tyfhty1 2y

tH kγI

y, 4.1

whereγnνHk1. The condition number ofHkγIis roughly 1/ν. For more details about this technique see reference4.

We have discussed the possibility of stagnation in the linear IRA method which results in a break down in the nonlinear iteration. Sufficient conditions under which stagnation of the linear iteration never occurs are

1we have to ensure that the steepest descent direction belongs to the subspace,Sk. Indeed in Algorithm 3.1, the Krylov subspace will contain the steepest descent direction becausev1−RtJ/RtJand the Hessian is nonsingular;

2there is no difficulty, if one required the Hessian matrix,JtJQxto be positive definite for all x ∈ Ω, then the termination condition can be satisfied, and so convergence of the sequence of iterates will follow. This will be the case for some problems, but, requiring JtJ Qx to be positive definite every where is very restrictive.

One of the main restrictions of most of the Newton-Krylov schemes is that the subspace onto which a given Newton step is projected must solve the Newton equations with a certain accuracy which is monitored by the termination conditionassumption G4. This condition is enough to essentially guarantee convergence of the trust region algorithm. Practically, the main difficulty is that one does not know in advance if the subspace chosen for projection will be good enough to guarantee this condition. Thus, kj can be chosen as large as the termination condition will eventually be satisfied, but, whenkj is too large the computational cost and the storage become too high. An alternative to that is to restart the algorithm keepingJtJQnonsingular andk

j< n. Moreover, preconditioning and scaling are essential for the successful application of these schemes19.

5. Global Convergence Analysis

(8)

ones arise from the fact that a lower dimensional quadratic model is used, rather than the full dimension model that is used in the preexisting results10.

Lemma 5.1. Let the global assumptions hold. Then for anys∈Ω, one has

RtJt ks s2 ≥

DkkJtJQ

k

1k

Dk

RtJk>0. 5.1

Proof. Suppose rk be the residual associated with s so that rk RtJk JtJQks and

RtJ

k/0. ThenrkkRtJk, andsJtJQ

1

k RtJkrk. So,

RtJt ks

RtJt k

JtJQ−1 k

RtJ krk

RtJt k

JtJQ−1 k

RtJ k

RtJt krk.

5.2

Hence,

RtJt ks

s

RtJt k

JtJQ−1 k

RtJ k

RtJt krk

JtJQ−1 k

RtJ

krk

RtJtkJtJQk1RtJkRtJtkrk

JtJQ−1 k

RtJ

krk

.

5.3

Next,rkRtJkimplies|RtJtkrkj| ≤kRtJk2which gives

RtJt k

JtJQ−1 k

RtJ

kRtJ

t

krkJtJQ

−1

kkRtJ

k

2

,

JtJQ−1 k

RtJkrkJtJQ

−1

k RtJ

kJtJQ

−1

k rk ≤1kJtJQ

−1

k R tJ

k.

5.4

Introduce5.4into5.3, we obtain the following:

RtJt ks s

JtJQ−1

kkRtJ

k

2

1kJtJQ

1

k RtJ

k

JtJQ

1

kk

1kJtJQ

−1

k

RtJ k

DkkJtJQ

k

1k

Dk

RtJk.

(9)

Inequality5.1can be written as

RtJt ks sRtJ

k

≥ DkkJtJQ

k

1k

Dk

. 5.6

This condition5.6tells us at each iteration ofAlgorithm 3.1, we require that the termination condition holds. Since the condition numbers,Dk, are bounded from above, condition5.6

gives

cosθ≥ D −J

tJQ k

1D , 5.7

where θ is the angle between RtJ

k and s and D are the upper bound of the condition numbersDk. Inequality5.7shows that the acute angleθis bounded away fromπ/2.

The following lemma shows that the termination norm assumption G4 implies that the cosine of the angle between the gradient and the Krylov subspace is bounded below.

Lemma 5.2. Let the global assumptions hold. Then

Vt k

RtJ k

DkkJtJQ

k

1k

Dk

RtJ

k. 5.8

Proof. Suppose s Vky be a vector of Sk satisfying the termination condition, RtJk

JtJQ

kskRtJk, k ∈0,1. FromLemma 5.1and the fact thats V y y, we obtain

RtJt kVky

y

DkkJtJQ

k

1k

Dk

RtJ

k. 5.9

From Cauchy-Schwartz inequality, we obtain

RtJtkVkyVkt

RtJky. 5.10

Combining formulas5.9and5.10, we obtain the result.

(10)

Lemma 5.3. Let the global assumptions hold and lety be a solution of the minimization problem,

miny≤Δ ψy. Then

hkmin Δk,

hk Hk

αkRtJ

kmin Δk,

αkRtJ

k

JtJQ k

, 5.11

whereαk DkkJtJQk/1kDk.

Proof. Since,ψy φVky fk RtJtkVky 1/2VkytJtJ QVky. Thus ∇ψ0

Vt

kRtJk. From Step 3 ofAlgorithm 3.1, we obtain

fkfk1≥ξ1

fkφky

5.12

ξ1

2V t k

RtJkmin Δk,

Vt k

RtJk

Vt k

JtJQ kVk

5.13

ξ1

2hkmin Δk,

hk

Hk

. 5.14

We have to convert the lower bound of this inequality to one involvingRtJ

krather than Vt

kRtJk. UsingLemma 5.2, we obtainhkαkRtJk, but,HkVktJtJQkVkJtJQ

k. Substituting in5.14, we obtain

hk

2 min Δk,

hk

Hk

αk 2 R

tJ

kmin Δk,

αkRtJ

k

JtJQ k

, 5.15

which completes the proof.

The following two facts will be used in the remainder of the proof.

Fact 1.

By Taylor’s theorem for anykandΔ>0, we have

aredkΔpredkΔ

fkfk1

htkykΔ−1 2y

t kΔVkt

JtJQkVkykΔ

1

2y t

kΔHkykΔ−

1

0

ytkΔHkηykΔ1−ηdη

ykΔ2

1

0

HkHkη1−ηdη.

(11)

where,Hkη VktJtJQxkηskVkwithskxk1−xkV yk. Thus,

aredkΔ predkΔ−1

ykΔ2

predkΔ

1

0

HkHkη1ηdη. 5.17

Fact 2.

For any sequence {xk} generated by an algorithm satisfying the global assumptions, the related sequence{fk}is monotonically decreasing and bounded from below, that is,fkf ask → ∞.

The next result establishes that every limit point of the sequence{xk}satisfies the first-order necessary conditions for a minimum.

Lemma 5.4. Let the global assumptions hold andAlgorithm 3.1 is applied to fx, generating a sequence{xk}, xk∈Ω, andk1,2, . . . ,thenhk0.

Proof. Sincek <1 for allk, we have

αk

D −JtJQ k

1D α >0 ∀k. 5.18

Consider any xl with RtJl/0, RtJxRtJlβ1xxl. Thus, if xxk

RtJ

k/2β1, thenRtJxRtJkRtJlRtJxRtJkRtJk/2. Let,

ρRtJ

k/2 andBρ{x:xxk< ρ}. There are two cases, either for allkl, xl∈ Bρor eventually{xl}leaves the ballBρ. Supposexl∈ Bρfor allkl, thenglgk/2σfor all

l > k. Thus byLemma 5.3, we have

predkΔ≥ ξ1ασ

2 min Δk,

ασ

Hk

kl. 5.19

Putασ σ, introduce inequality5.19into5.17and sinceHkJtJQkβ1, we

obtain

aredkΔ predkΔ−1

yk2

1

0HkHkη1−ηdη

ξ1σmin

Δk, σ/β1

≤ Δ2k

2β1

ξ1σmin

Δk, σ/β1

≤ 2Δ1

ξ1σ

, Δk

σ β1 ∀

kl.

5.20

This gives forΔksufficiently small andklthat

aredkΔ predkΔ

(12)

In addition, we have

JtJQ−1 k hk

hk

Hk

αRtJ

k

β1 ≥

σ β1

. 5.22

Inequalities5.21and5.22tell us that, forΔk sufficiently small, we cannot get a decrease inΔkin Step 3iiofAlgorithm 3.1. It follows thatΔkis bounded away from 0, but, since

fkfk1aredkΔ≥ξ1predkΔ≥ξ1σmin

Δk, σ

β1

, 5.23

and since f is bounded from below, we must have Δk → 0 which is a contradiction. Therefore,{xk}must be outsideBρfor somek > l.

Letxj1be the first term afterxlthat does not belong toBρ. From inequality5.23, we get

fxl

fxj1

j

kl

fkfk1

l j1

ξ1predkΔ

j

kl

ξ1σmin

Δk, σ

β1

.

5.24

IfΔkσ/β1for alllkj, we have

fxl

fxj1

ξ1σ

j

kl

Δkξ1σρ. 5.25

If there exists at least onekwithΔk> σ/β1, we get

fxm

fxm1

ξ1σ 2

β1

. 5.26

In either case, we have

fxm

fxj1

ξ1σmin

ρ, σ β1

ξ1α

RtJ k 2 min

αRtJ k 2β1

,αR

tJ k 2β1

ξα2 4β1

RtJk2.

(13)

Since, f is bounded below, {fk} is a monotonically decreasing sequence i.e., fkf. Hence,

RtJ k

2 4β1

ξ1α2

fkf

, 5.28

i.e., RtJ

k → 0 ask → ∞. The proof is completed by usingLemma 5.2.

The following lemma proves that under the global assumptions, if each member of the sequence of iterates generated byAlgorithm 3.1satisfies the termination condition of the algorithm, then this sequence converges to one of its limit points.

Lemma 5.5. Let the global assumptions hold,Hk VktJtJQxkVkfor allk,JtJQxis

Lipschitz continuous withL, andxis a limit point of the sequence{xk}withJtJQxbe positive

definite. Then{xk}converges q-Superlinearly tox.

Proof. Let{xkm}be a subsequence of the sequence{xk}that converges tox, where,x is a

limit of{xk}. FromLemma 5.4,RtJx 0. Since JtJQxis positive definite and

JtJQxis continuous there existsδ

1 >0 such that ifxx< δ1, thenJtJQxis

positive definite, and ifx /xthenRtJx/0. Let,B1{x:xx< δ1}.

SinceRtJx

0, we can findδ2>0 withδ2< δ1/4 andJtJQ−1RtJδ1/2

for allxB2{x:xx< δ2}.

Findlosuch thatfxlo <inf{fx :xB1−B2}andxloB2. Consider anyxiwith ikl, xiB2.

We claim that xi1 ∈ B2, which implies that the entire sequence beyond xkloB2.

Supposexi1does not belong toB2. Sincefi1< fklo,xi1does not belong toB1either. So

Δixi1−xi

xi1−xxix

δ1−

δ1

4

3δ1

4

> δ1

2

JtJQi−1RtJi

H−1

i hi.

5.29

So , the truncated-Newton step is within the trust region, we obtain

yi

Δi

VitJtJQiVi

−1

(14)

SinceyiΔ < δ1/2, it follows thatxi1 ∈ B1, which is a contradiction. Hence, for allk

klo, xkB2, and so sincefxkis strictly decreasing sequence andxis the unique minimizer

offinB2, we have that{xk} → x.

The following lemma establishes the rate of convergence of the sequence of iterates generated byAlgorithm 3.1to beq-super linear.

Lemma 5.6. Let the assumptions ofLemma 5.5 hold. Then the rate of convergence of the sequence {xk}isq-Super linear.

Proof. To show that the convergence rate is super linear, we will show eventually that H−1

k hkwill always be less than Δj and hence the truncated-Newton step will always be taken. SinceJtJQx

is positive definite, it follows that the convergence rate of{xk}to

xis super linear.

To show that eventually the truncated-Newton step is always shorter than the trust region radius, we need a particular lower bound on predkΔ. By assumptions, for allklarge enough,Hkis positive definite. Hence, either the truncated-Newton step is longer than the trust region radius, orykΔis the truncated-Newton step. In either case

ykΔ≤Vkt

JtJQ jVj

−1

Vt k

RtJ

kVkt

JtJQ kVk

−1

Vt k

RtJ

k, 5.31

and so it follows thatVt

kRtJkskΔ/VktJtJQkVk−1. ByLemma 5.3, for allk large enough, we have

predkΔ≥

ξ1α

2

ykΔ JtJQ−1

k

min Δk,

αRtJ k

JtJQ k

. 5.32

Thus,

predkΔ≥

ξ1α

2

ykΔ JtJQ−1

k

min ykΔ,

αykΔ

JtJQ

kJtJQ

1

k

ξ1α2

2

ykΔ2 DJtJQ−1

k

.

5.33

So, by the continuity ofJtJQx, for allklarge enough, we obtain

predkΔ≥ ξ1α

2

4

ykΔ2

JtJQ−1x

. 5.34

Finally, by the argument leading up to Fact 1 and the Lipschitz continuity assumption, we have

aredkΔ−predkΔ≤ykΔ3

L

(15)

Thus, for anyΔk>0 andklarge enough, we obtain

aredkΔ predkΔ−

1skΔ3

L

2.

4JtJQ−1x

ξ1α2ykΔ2

2L

ξ1α2

JtJQ−1xykΔ

≤ 2LJtJQ

−1

x

ξ1α2 Δk

.

5.36

Thus by Step 3iiofAlgorithm 3.1, there is aΔsuch that ifΔk−1 < Δ, thenΔkwill be less thanΔk−1only ifΔkVk−t 1JtJQk−1Vk−1−1Vk−t 1RtJk−1. It follows from the superlinear

convergence of the truncated-Newton method that forxk−1 close enough tox andk large enough,

VktJtJQkVk

−1

VktRtJk<Vk−t 1JtJQk−1Vk−1

−1

Vk−t 1RtJk−1. 5.37

Now, ifΔk is bounded away from 0 for all largek, then we are done. Otherwise, if for an arbitrarily largekkis reduced, that is,Δk<Δk−1, then we have

ΔkVk−t 1

JtJQk−1Vk−1

−1

Vk−t 1RtJk−1>VktJtJQkVk

−1

VktRtJk, 5.38

and so the full truncated-Newton step is taken. Inductively, this occurs for all subsequence iterates and super linear convergence follows.

The next result shows that under the global assumptions every limit point of the sequence{xk}satisfies the second-order necessary conditions for a minimum.

Lemma 5.7. Let the global assumptions hold, and assume predkΔ ≥ α2−λ1HΔ2 for all the

symmetric matrixH ∈ Rkj×kj,h ∈ Rkj,Δ > 0 and someα

2 > 0. If {xk} → x, then Hxis

positive semidefinite, whereλ1His the smallest eigenvalue of the matrixH.

Proof. Suppose to the contrary thatλ1Hx < 0. There existsK such that ifkK, then

λ1Hk<1/2λ1Hx. For allkKand for allΔk>0, we obtain

predkΔ≥α2

λ1

Hk

Δ2

k

α2

2

λ1

Hx

Δ2

k. 5.39

Using inequality5.17, we obtain

aredkΔ predkΔ−

1ykΔ

2

predkΔ 1

0

HkηHk1−ηdη

≤ 2ykΔ

2

Δ2

k

1

0

HkηHk1η

α2

λ1

Hx

.

(16)

Since the last quantity goes to zero asΔk → 0 and since a Newton step is never taken for

k > K, it follows from Step 3iiofAlgorithm 3.1that for kK Δk sufficiently small.Δk cannot be decreased further. Thus,Δis bounded below, but

aredkΔ≥ξpredkΔ≥ ξα2 2

λ1

Hx

Δ2

k. 5.41

Since f is bounded below, aredkΔ → 0, so Δk → 0 which is a contradiction. Hence

λ1Hx≥0. This completes the proof of the lemma.

6. Concluding Remarks

In this paper, we have shown that the implicitly restarted Arnoldi method can be combined with the Newton iteration and trust region strategy to obtain a globally convergent algorithm for solving large-scale NLS problems.

The main restriction of this scheme is that the subspace onto which a given Newton step is projected must solve the Newton equations with a certain accuracy which is monitored by the termination condition. This is enough to guarantee convergence.

This theory is sufficiently general that is hold for any algorithm that projects the problem on a lower dimensional subspace. The convergence results indicate promise for this research to solve large-scale NLS problems. Our next step is to investigate the performance of this algorithm on some NLS problems. The results will be reported in our forthcoming paper.

Acknowledgment

The authors would like to thank the editor-in-chief and the anonymous referees for their comments and useful suggestions in the improvement of the final form of this paper.

References

1 G. Golub and V. Pereyra, “Separable nonlinear least squares: the variable projection method and its applications,” Inverse Problems, vol. 19, no. 2, pp. R1–R26, 2003.

2 J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, NY, USA, 1970.

3 J. J. Mor´e, “The Levenberg-Marquardt algorithm: implementation and theory,” in Numerical Analysis (Proc. 7th Biennial Conf., Univ. Dundee, 1977), G. A. Watson, Ed., vol. 630 of Lecture Notes in Mathematics, pp. 105–116, Springer, Berlin, Germany, 1978.

4 J. E. Dennis Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall Series in Computational Mathematics, Prentice-Hall, Englewood Cliffs, NJ, USA, 1983.

5 M. Gulliksson, I. S ¨oderkvist, and P.- ˚A. Wedin, “Algorithms for constrained and weighted nonlinear least squares,” SIAM Journal on Optimization, vol. 7, no. 1, pp. 208–224, 1997.

6 M. R. Abdel-Aziz, “Globally convergent algorithm for solving large nonlinear systems of equations,” Numerical Functional Analysis and Optimization, vol. 25, no. 3-4, pp. 199–219, 2004.

7 M. J. D. Powell, “Convergence properties of a class of minimization algorithms,” in Nonlinear Programming, 2 (Proc. Sympos. Special Interest Group on Math. Programming, Univ. Wisconsin, Madison, Wis., 1974), O. Mangasarian, R. Meyer, and S. Robinson, Eds., pp. 1–27, Academic Press, New York, NY, USA, 1975.

(17)

9 J. J. Mor´e, “Recent developments in algorithms and software for trust region methods,” in Mathematical Programming: The State of the Art (Bonn, 1982), A. Bacham, M. Grotschel, and B. Korte, Eds., pp. 258–287, Springer, Berlin, Germany, 1983.

10 G. A. Shultz, R. B. Schnabel, and R. H. Byrd, “A family of trust-region-based algorithms for unconstrained minimization with strong global convergence properties,” SIAM Journal on Numerical Analysis, vol. 22, no. 1, pp. 47–67, 1985.

11 M. R. Osborne, “Some special nonlinear least squares problems,” SIAM Journal on Numerical Analysis, vol. 12, no. 4, pp. 571–592, 1975.

12 M. Xiaofang, F. R. Ying Kit, and X. Chengxian, “Stable factorized quasi-Newton methods for nonlinear least-squares problems,” Journal of Computational and Applied Mathematics, vol. 129, no. 1-2, pp. 1–14, 2001.

13 J. E. Dennis Jr., H. J. Mart´ınez, and R. A. Tapia, “Convergence theory for the structured BFGS secant method with an application to nonlinear least squares,” Journal of Optimization Theory and Applications, vol. 61, no. 2, pp. 161–178, 1989.

14 P. N. Brown and Y. Saad, “Hybrid Krylov methods for nonlinear systems of equations,” SIAM Journal on Scientific and Statistical Computing, vol. 11, no. 3, pp. 450–481, 1990.

15 D. C. Sorensen, “Minimization of a large scale quadratic function subject to an ellipsoidal constraint,” Tech. Rep. TR94-27, Department of Computational and Applied Mathematics, Rice University, Houston, Tex, USA, 1994.

16 D. C. Sorensen, “Newton’s method with a model trust region modification,” SIAM Journal on Numerical Analysis, vol. 19, no. 2, pp. 409–426, 1982.

17 H. Mukai and E. Polak, “A second-order method for unconstrained optimization,” Journal of Optimization Theory and Applications, vol. 26, no. 4, pp. 501–513, 1978.

18 Y. Saad, “Krylov subspace methods for solving large unsymmetric linear systems,” Mathematics of Computation, vol. 37, no. 155, pp. 105–126, 1981.

References

Related documents

And the objective of the research is to prove that reciprocal teaching technique can improve reading comprehension of the second year students of SMA Negeri 2

Background: A cross-sectional study to ascertain what the Singapore population would regard as material risk in the anaesthesia consent-taking process and identify demographic

The aim of this study was to investigate the degradation rate for samples of polypropylene (PP) modified with an organic pro-degradant submitted to ageing in a natural environment

Figure 32. a) Total average funding share per researcher in the top 10 Canadian universities, b) Total average share of article production per researcher in the top 10

that, in the absence of correlation between the losses borne by the company and the stock market, the profit loaded into insurance rates should be the precise amount which, less the

In terms of the optimal size of the pool, simulations show that it is convenient to have either a pool size equal to n; ^ so that individuals can bene…t from the optimal

SHOW CAUSE before this court in its courtroom at 300 South Spring Street, Los Angeles, California, on February 10, 2021, at 9:00 a.m., why a preemptory writ of mandate should

The second algorithm that has been proposed is for MOIP problems, and it adapts the existing B&amp;B idea in a systematic way to branch on Pareto candidates on