Newton-Krylov Type Algorithm for Solving Nonlinear Least Squares Problems


Volume 2009, Article ID 435851, 17 pages, doi:10.1155/2009/435851

Research Article


Mohammedi R. Abdel-Aziz$^{1}$ and Mahmoud M. El-Alem$^{1,2}$

$^{1}$ Department of Mathematics and Computer Science, Faculty of Science, Kuwait University, P.O. 5969, Safat 13060, Kuwait City, Kuwait

$^{2}$ Department of Mathematics, Faculty of Science, Alexandria University, Alexandria, Egypt

Correspondence should be addressed to Mohammedi R. Abdel-Aziz, mr [email protected]

Received 15 December 2008; Accepted 2 February 2009

Recommended by Irena Lasiecka

The minimization of a quadratic function within an ellipsoidal trust region is an important subproblem for many nonlinear programming algorithms. When the number of variables is large, one of the most widely used strategies is to project the original problem into a small dimensional subspace. In this paper, we introduce an algorithm for solving nonlinear least squares problems. This algorithm is based on constructing a basis for the Krylov subspace in conjunction with a model trust region technique to choose the step. The computational step on the small dimensional subspace lies inside the trust region. The Krylov subspace is terminated such that the termination condition allows the gradient to be decreased on it. A convergence theory of this algorithm is presented. It is shown that this algorithm is globally convergent.

Copyright © 2009 M. R. Abdel-Aziz and M. M. El-Alem. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Nonlinear least squares (NLS) problems are unconstrained optimization problems with special structure. These problems arise in many areas, such as the solution of overdetermined systems of nonlinear equations, some scientific experiments, pattern recognition, and maximum likelihood estimation. For more details about these problems, see [1]. The general formulation of the NLS problem is to determine the solution $x^* \in \mathbb{R}^n$ that minimizes the function

$$f(x) = \|R(x)\|_2^2 = \sum_{i=1}^{m} \bigl(R_i(x)\bigr)^2, \tag{1.1}$$

where $R(x) = (R_1(x), R_2(x), \ldots, R_m(x))^t$, $R_i : \mathbb{R}^n \to \mathbb{R}$, $1 \le i \le m$, and $m \ge n$.
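As a concrete illustration of (1.1), the following minimal sketch evaluates the objective for a small exponential fit. The model $R_i(x) = x_1 e^{x_2 t_i} - y_i$, the data, and the function names `residuals` and `objective` are assumptions made for this example only; they do not come from the paper.

```python
import math

def residuals(x, t, y):
    """R_i(x) = x[0] * exp(x[1] * t_i) - y_i for a hypothetical exponential fit."""
    return [x[0] * math.exp(x[1] * ti) - yi for ti, yi in zip(t, y)]

def objective(x, t, y):
    """f(x) = ||R(x)||_2^2 = sum_i R_i(x)^2, as in (1.1)."""
    return sum(r * r for r in residuals(x, t, y))

# m = 4 observations and n = 2 unknowns, so m >= n holds.
t = [0.0, 1.0, 2.0, 3.0]
y = [2.0 * math.exp(-0.5 * ti) for ti in t]   # synthetic, noise-free data

f_at_generator = objective([2.0, -0.5], t, y)  # parameters that generated the data
f_elsewhere = objective([1.0, 0.0], t, y)      # some other point
```

Because the synthetic data are noise-free, the objective vanishes at the generating parameters and is positive at the second point, which is the zero-residual situation of an overdetermined but consistent system.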


Gauss-Newton and Levenberg-Marquardt methods [3]. The second type is closely related to unconstrained minimization techniques [4]. Of course, the two types of algorithms are closely related to each other [5].

The presented algorithm is a Newton-Krylov type algorithm. It requires a fixed-size limited storage proportional to the size of the problem and relies only upon matrix-vector products. It is based on the implicitly restarted Arnoldi method (IRAM) to construct a basis for the Krylov subspace and to reduce the Jacobian to a Hessenberg matrix, in conjunction with a trust region strategy to control the step on that subspace [6].

Trust region methods for unconstrained minimization are blessed with both strong theoretical convergence properties and good accuracy in practice. The trial computational step in these methods is an approximate minimizer of some model of the true objective function within a trust region, that is, a region in which a suitable norm of the correction lies inside a given bound. This restriction is known as the trust region constraint, and the bound on the norm is its radius. The radius is adjusted so that successive model problems minimize the true objective function within the trust region [7].

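The trust region subproblem is the problem of finding $s(\Delta)$ so that

$$\varphi(s(\Delta)) = \min\{\varphi(s) : \|s\| \le \Delta\}, \tag{1.2}$$

where $\Delta$ is some positive constant, $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^n$, and

$$\varphi(s) = g^t s + \frac{1}{2} s^t H s, \tag{1.3}$$

where $g \in \mathbb{R}^n$ and $H \in \mathbb{R}^{n \times n}$ are, respectively, the gradient vector and the Hessian matrix or their approximations.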

There are two different approaches to solving (1.2). These approaches are based on either solving several linear systems [8] or approximating the curve $s(\Delta)$ by a piecewise linear approximation ("dogleg strategies") [9]. In large-scale optimization, solving linear systems is computationally expensive. Moreover, it is not clear how to define the dogleg curves when the matrix $H$ is singular [10].

Several authors have studied inexact Newton methods for solving NLS problems [11]. Xiaofang et al. have introduced stable factorized quasi-Newton methods for solving large-scale NLS problems [12]. Dennis et al. proposed a convergence theory for the structured BFGS secant method with an application to NLS [13]. The Newton-Krylov method is an example of an inexact Newton method. Krylov techniques inside a Newton iteration in the context of systems of equations have been proposed in [14]. The recent work of Sorensen provides an algorithm which is based on recasting the trust region subproblem into a parameterized eigenvalue problem. This algorithm provides a superlinearly convergent scheme to adjust the parameter and find the optimal solution from the eigenvector of the parameterized problem, as long as the hard case does not occur [15].

This contribution is organized as follows. In Section 2, we introduce the structure of the NLS problem and a general view of the theory of trust region strategies and their convergence. The statement of the algorithm and its properties is developed in Section 3.


2. Structure of the Problem

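The subspace technique plays an important role in solving the NLS problem (1.1). We assume that the current iterate $x_k$ and a Krylov subspace $S_k$ are given. Denote the dimension of $S_k$ by $k_j$ and let $\{v_1^{(k)}, v_2^{(k)}, \ldots, v_{k_j}^{(k)}\}$ be a set of linearly independent vectors in $S_k$. The next iterate $x_{k+1}$ is such that the increment $x_{k+1} - x_k \in S_k$. Thus, we have $R_j(x_k + V_k y) = 0$, for $j = 1, 2, \ldots, m$ and $y \in \mathbb{R}^{k_j}$, where the matrix $V_k = [v_1^{(k)}, v_2^{(k)}, \ldots, v_{k_j}^{(k)}]$. The second-order Taylor expansion of $\|R(x)\|_2^2$ at $x_k$ is

$$\|R(x_k) + J_k s\|_2^2 + s^t Q_k s, \tag{2.1}$$

where $Q_k \in \mathbb{R}^{n \times n}$ is defined by $Q_k = \sum_{j=1}^{m} R_j(x_k) \nabla^2 R_j(x_k)$ and the Jacobian matrix is $J_k = J(x_k) = [\nabla R_1(x_k), \nabla R_2(x_k), \ldots, \nabla R_m(x_k)]^t$. If we consider $s \in S_k$, we get the quadratic model

$$q_k(y) = \|R(x_k) + J_k V_k y\|_2^2 + y^t A_k y, \tag{2.2}$$

where $A_k \in \mathbb{R}^{k_j \times k_j}$ approximates the reduced matrix $V_k^t Q_k V_k = \sum_{j=1}^{m} R_j(x_k) V_k^t \nabla^2 R_j(x_k) V_k$. The first-order necessary condition for (2.1) is

$$(J_k^t J_k + Q_k)\, s + R_k^t J_k = 0. \tag{2.3}$$

Thus, a solution of the quadratic model is a solution to an equation of the form

$$(J_k^t J_k + Q_k)\, s = -R_k^t J_k. \tag{2.4}$$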
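The quantities in (2.1) to (2.4) can be assembled explicitly for a small problem. The sketch below is illustrative only: the exponential model $R_j(x) = x_1 e^{x_2 t_j} - y_j$, the data, and the name `model_parts` are assumptions, and the gradient is stored as the column vector $J^t R$. Note that $J^t R$ and $J^t J + Q$ are the gradient and Hessian of $\frac{1}{2}\|R(x)\|_2^2$; relative to (1.1) this differs only by a constant factor of 2, which does not change the solution of (2.4).

```python
import numpy as np

def model_parts(x, t, y):
    """Return R, J and Q = sum_j R_j * Hess(R_j) for R_j(x) = x0*exp(x1*t_j) - y_j."""
    e = np.exp(x[1] * t)
    R = x[0] * e - y                              # residual vector, shape (m,)
    J = np.column_stack([e, x[0] * t * e])        # Jacobian, shape (m, 2)
    Q = np.zeros((2, 2))
    for Rj, tj, ej in zip(R, t, e):
        hess_Rj = np.array([[0.0, tj * ej],
                            [tj * ej, x[0] * tj**2 * ej]])
        Q += Rj * hess_Rj                         # second-order term of the model
    return R, J, Q

t = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * np.exp(-0.5 * t)                        # synthetic data (assumed)
x = np.array([1.5, -0.3])                         # current iterate (assumed)

R, J, Q = model_parts(x, t, y)
g = J.T @ R                                       # gradient term, as a column vector
H = J.T @ J + Q                                   # matrix appearing in (2.3) and (2.4)
s = np.linalg.solve(H, -g)                        # step from (2.4); H is nonsingular here
```

Dropping `Q` from `H` gives the Gauss-Newton matrix $J^t J$, which is why the second-order term matters mainly when the residuals at the solution are not small.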

The model trust region algorithm generates a sequence of points $x_k$, and at the $k$th stage of the iteration the quadratic model of problem (1.2) has the form

$$\varphi_k(s) = (J_k^t R_k)^t s + \frac{1}{2} s^t (J_k^t J_k + Q_k)\, s. \tag{2.5}$$

At this stage, an initial value for the trust region radius $\Delta_k$ is also available. An inner iteration is then performed which consists of using the current trust region radius, $\Delta_k$, and the information contained in the quadratic model to compute a step, $s(\Delta_k)$. Then a comparison of the actual reduction of the objective function

$$\mathrm{ared}_k(\Delta_k) = \|R(x_k)\|_2^2 - \|R(x_k + s_k(\Delta_k))\|_2^2, \tag{2.6}$$

and the reduction predicted by the quadratic model, $\mathrm{pred}_k(\Delta_k)$, is performed. If there is a satisfactory reduction, then the step can be taken, or a possibly larger trust region used. If not, then the trust region is reduced and the inner iteration is repeated. For now, we leave unspecified what algorithm is used to form the step $s(\Delta)$ and how the trust region radius $\Delta$ is changed. We also leave unspecified the selection of $J_k^t J_k + Q_k$, except to restrict it to be symmetric. Details on these points will be addressed in our forthcoming paper.
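The accept, shrink, or enlarge logic described above can be sketched as a small function. Since the text leaves the update rule unspecified at this point, everything concrete here is an assumption for illustration: the function name `update_radius`, the thresholds `low` and `high`, and the factors `shrink` and `grow`.

```python
def update_radius(ared, pred, delta, low=0.25, high=0.75, shrink=0.5, grow=2.0):
    """Return (accept_step, new_delta) from the ratio ared / pred.

    low, high, shrink and grow are illustrative defaults, not values from the text.
    """
    if pred <= 0.0:
        raise ValueError("the predicted reduction must be positive")
    ratio = ared / pred
    if ratio < low:
        return False, shrink * delta   # poor agreement: reject the step, shrink
    if ratio <= high:
        return True, delta             # acceptable agreement: accept, keep radius
    return True, grow * delta          # very good agreement: accept, enlarge
```

For instance, `update_radius(0.1, 1.0, 1.0)` rejects the step and halves the radius, while `update_radius(0.9, 1.0, 1.0)` accepts the step and doubles it.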

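The solution $s$ of the quadratic model (2.2) is a solution to an equation of the form

$$(J^t J + Q + \mu I)\, s = -R^t J, \tag{2.8}$$

with $\mu \ge 0$, $\mu (s^t s - \Delta^2) = 0$, and $J^t J + Q + \mu I$ positive semidefinite. This represents the first-order necessary conditions concerning the pair $\mu$ and $s$, where $s$ is a solution to model (2.2) and $\mu$ is the Lagrange multiplier associated with the constraint $s^t s \le \Delta^2$. The sufficient conditions that will ensure $s$ to be a solution to (2.2) can be established in the following lemma, which has a proof in [16].

Lemma 2.1. Let $\mu \in \mathbb{R}$ and $s \in \mathbb{R}^n$ satisfy $(J^t J + Q + \mu I)\, s = -R^t J$ with $J^t J + Q + \mu I$ positive semidefinite. If

$$\begin{aligned}
&\mu = 0,\ \|s\| \le \Delta, &&\text{then } s \text{ solves (2.2)},\\
&\|s\| = \Delta, &&\text{then } s \text{ solves } \min\{\varphi(s) : \|s\| = \Delta\},\\
&\mu \ge 0 \text{ with } \|s\| = \Delta, &&\text{then } s \text{ solves (2.2)}.
\end{aligned} \tag{2.9}$$

Of course, if $J^t J + Q + \mu I$ is positive definite, then $s$ is a unique solution.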
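For a small dense matrix, the conditions of Lemma 2.1 suggest a direct way to compute the pair $(s, \mu)$: diagonalize the matrix and search for the multiplier by bisection. The sketch below is a generic illustration under stated assumptions, not the large-scale procedure of this paper. The names `solve_trust_region`, `H` (standing for $J^tJ + Q$), and `g` (standing for the gradient term) are hypothetical, and the degenerate hard case is deliberately not handled.

```python
import numpy as np

def solve_trust_region(H, g, delta, tol=1e-10, max_iter=200):
    """Return (s, mu) with (H + mu*I) s = -g, mu >= 0 and ||s|| <= delta.

    Sketch only: H is symmetric and the degenerate 'hard case' is not handled.
    """
    lam, U = np.linalg.eigh(H)          # H = U diag(lam) U^t, lam in ascending order
    c = U.T @ g

    def step_norm(mu):
        return np.linalg.norm(c / (lam + mu))

    # Interior case: H is positive definite and the Newton step fits in the region.
    if lam[0] > 0 and step_norm(0.0) <= delta:
        return -U @ (c / lam), 0.0

    # Boundary case: find mu > max(0, -lam_min) with ||s(mu)|| = delta by bisection.
    lo = max(0.0, -lam[0])
    hi = 2.0 * lo + 1.0
    while step_norm(hi) > delta:        # step_norm decreases as mu grows
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if step_norm(mid) > delta:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * max(1.0, hi):
            break
    mu = hi                             # guarantees ||s|| <= delta
    return -U @ (c / (lam + mu)), mu
```

With the indefinite matrix `H = diag(2, -1)`, `g = (1, 1)` and `delta = 1`, the routine returns a step on the boundary with a multiplier larger than 1, so that `H + mu*I` is positive definite, in line with the third case of (2.9).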

The main result which is used to prove that the sequence of gradients tends to zero for modified Newton methods is

$$f - \varphi(s) \ge \frac{1}{2} \|R^t J\| \min\left\{\Delta, \frac{\|R^t J\|}{\|J^t J + Q\|}\right\}, \tag{2.10}$$

where $s$ is a solution to (2.2). The geometric interpretation of this inequality is that, for a quadratic function, any solution $s$ to (2.2) produces a decrease $f - \varphi(s)$ that is at least as much as the decrease along the steepest descent direction $-R^t J$. A proof of this result may be found in [17].
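The bound (2.10) can be checked numerically for the step that minimizes the quadratic model along the steepest descent direction inside the trust region (often called the Cauchy step). The following sketch assumes generic names: `H` stands for $J^tJ + Q$, `g` for the gradient term, and `cauchy_step` and `model_decrease` are hypothetical helpers written for this illustration.

```python
import numpy as np

def cauchy_step(H, g, delta):
    """Minimizer of g^t s + 0.5 s^t H s along -g subject to ||s|| <= delta."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    curvature = g @ H @ g
    t_max = delta / gnorm                    # largest step length allowed by the region
    if curvature > 0.0:
        t = min(gnorm**2 / curvature, t_max)
    else:
        t = t_max                            # nonpositive curvature: go to the boundary
    return -t * g

def model_decrease(H, g, s):
    """Decrease f - phi(s) of the quadratic model for the step s."""
    return -(g @ s + 0.5 * s @ H @ s)

H = np.array([[2.0, 0.0], [0.0, -1.0]])
g = np.array([1.0, 1.0])
delta = 1.0
s = cauchy_step(H, g, delta)
gnorm = np.linalg.norm(g)
lower_bound = 0.5 * gnorm * min(delta, gnorm / np.linalg.norm(H, 2))
decrease = model_decrease(H, g, s)           # at least lower_bound
```

Any step that does at least as well as this one on the model, in particular an exact solution of the subproblem, inherits the same lower bound.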

3. Algorithmic Framework


The desired properties of this algorithm are:

(1) the algorithm should be well defined for a sufficiently general class of functions and should be globally convergent;

(2) the algorithm should be invariant under linear affine scalings of the variables, that is, if we replace $f(x)$ by $\hat f(y) = f(s + V y)$, where $V \in \mathbb{R}^{n \times k}$ is an orthonormal matrix, $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^k$, then applying the iteration to $\hat f$ with initial guess $y_0$ satisfying $x_0 = s + V y_0$ should produce a sequence $\{y_l\}$ related to the sequence $\{x_l\}$ by $x_l = V y_l + s$, where $x_l$ is produced by applying the algorithm to $f$ with initial guess $x_0$;

(3) the algorithm should provide a decrease that is at least as large as a given multiple of the minimum decrease that would be provided by a quadratic search along the steepest descent direction;

(4) the algorithm should give as good a decrease of the quadratic model as a direction of the negative gradient when the Hessian, $J^t J + Q$, is indefinite, and should force the direction to be equal to the Newton direction when $J^t J + Q$ is symmetric positive definite.

The following describes a full iteration of a truncated Newton type method. Some of the previous characteristics will be obvious and the other ones will be proved in the next section.

Algorithm 3.1.

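(1) Step 0 (initialization).

Given $x_0 \in \mathbb{R}^n$, compute $(R^t J)(x_0)$ and $(J^t J + Q)(x_0)$.

Choose $\xi_1 < \xi_2$, $\eta_1 \in (0, 1)$, $\eta_2 > 1$, $\Delta_1 > 0$, $k = 1$.

(2) Step 1 (construction of a basis for the Krylov subspace).

(a) Choose $\epsilon \in (0, 1)$ and an initial guess $s_0$, which can be chosen to be zero; form $r_0 = -R^t J - (J^t J + Q) s_0$, where $R^t J = (R^t J)(x_k)$ and $J^t J + Q = (J^t J + Q)(x_k)$, and compute $\tau = \|r_0\|$ and $v_1 = r_0 / \tau$.

(b) For $j = 1, 2, \ldots$ until convergence, do.

(i) Form $(J^t J + Q) v_j$ and orthogonalize it against the previous $v_1, v_2, \ldots, v_j$:

$$\begin{aligned}
h(i, j) &= \bigl((J^t J + Q) v_j, v_i\bigr), \quad i = 1, 2, \ldots, j,\\
\hat v_{j+1} &= (J^t J + Q) v_j - \sum_{i=1}^{j} h(i, j)\, v_i,\\
h(j+1, j) &= \|\hat v_{j+1}\|,\\
v_{j+1} &= \hat v_{j+1} / h(j+1, j).
\end{aligned}$$

(ii) Compute the residual norm $\beta_j = \|R^t J + (J^t J + Q) s_j\|$ of the solution $s_j$ that would be obtained if we stopped.

(iii) If $\beta_j \le \epsilon_j \|R^t J\|$, where $\epsilon_j \le 1/D_j \in (0, \epsilon]$ and $D_j$ is an approximation to the condition number of $J^t J + Q$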


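(3) Step 2 (compute the trial step).

(i) Construct the local model

$$\psi_k(y) = f_k + h_k^t y + \frac{1}{2} y^t H_k y, \tag{3.1}$$

where $h_k = V_k^t (R^t J)_k \in \mathbb{R}^{k_j}$ and $H_k = V_k^t (J^t J + Q)_k V_k \in \mathbb{R}^{k_j \times k_j}$.

(ii) Compute the solution $y$ to the problem $\min\{\psi_k(y) : \|y\| \le \Delta_k\}$ and set $s_k = V_k y$.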

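(4) Step 3 (test the step and update $\Delta_k$).

(i) Evaluate $f_{k+1} = f(x_k + V_k y)$, $\mathrm{ared}_k = f_k - f(x_k + V_k y)$, and $\mathrm{pred}_k = f_k - \psi_k(y)$.

(ii) If $\mathrm{ared}_k / \mathrm{pred}_k < \xi_1$ then

begin $\Delta_k = \eta_1 \Delta_k$, go to 3(i)

Comment: Do not accept the step $y$; reduce the trust region radius $\Delta_k$ and compute a new step.

Else if $\xi_1 \le \mathrm{ared}_k / \mathrm{pred}_k \le \xi_2$ then

$x_{k+1} = x_k + V_k y$

Comment: Accept the step and keep the previous trust region radius the same.

Else ($\mathrm{ared}_k / \mathrm{pred}_k > \xi_2$) then

$\Delta_k = \eta_2 \Delta_k$

Comment: Accept the step and increase the trust region radius.

end if

(5) Step 4.

Set $k \leftarrow k + 1$ and go to Step 1.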
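The basis construction of Step 1 and the reduced quantities $h_k$ and $H_k$ of Step 2 can be sketched as follows. This is a plain Arnoldi process with modified Gram-Schmidt and a fixed number of steps; it does not implement implicit restarting or the termination test of Step 1(b)(iii). The names `arnoldi_basis`, `matvec`, `k_max`, and `breakdown_tol`, as well as the random test matrix, are assumptions made for the illustration.

```python
import numpy as np

def arnoldi_basis(matvec, r0, k_max, breakdown_tol=1e-12):
    """Orthonormal basis V of span{r0, A r0, ..., A^(k-1) r0} and V^t A V.

    A is available only through matvec(v) = A @ v. Plain Arnoldi with modified
    Gram-Schmidt; no implicit restarting is performed in this sketch.
    """
    n = r0.shape[0]
    V = np.zeros((n, k_max + 1))
    Hbar = np.zeros((k_max + 1, k_max))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(k_max):
        w = matvec(V[:, j])
        for i in range(j + 1):                  # orthogonalize against v_1, ..., v_j
            Hbar[i, j] = w @ V[:, i]
            w = w - Hbar[i, j] * V[:, i]
        Hbar[j + 1, j] = np.linalg.norm(w)
        if Hbar[j + 1, j] <= breakdown_tol:     # the Krylov subspace is invariant
            return V[:, :j + 1], Hbar[:j + 1, :j + 1]
        V[:, j + 1] = w / Hbar[j + 1, j]
    return V[:, :k_max], Hbar[:k_max, :k_max]

# Hypothetical data: A plays the role of J^t J + Q and g the role of R^t J.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B + B.T
g = rng.standard_normal(6)

V, H_reduced = arnoldi_basis(lambda v: A @ v, -g, k_max=3)  # s_0 = 0, so r_0 = -g
h_reduced = V.T @ g                                         # reduced gradient h_k
```

Only products with `A` are needed, which matches the matrix-vector-product requirement stated in the Introduction; the reduced model (3.1) is then a problem in three variables instead of six.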

4. The Restarting Mechanism

In this section, we discuss the important role of the restarting mechanism to control the possibility of the failure of nonsingularity of the Hessian matrix, and introduce the assumptions under which we prove the global convergence.

Let the sequence of the iterates generated by Algorithm 3.1 be $\{x_k\}$; for such a sequence we assume

(G1) for all $k$, $x_k$ and $x_k + s_k \in \Omega$, where $\Omega \subset \mathbb{R}^n$ is a convex set;

(G2) $f \in C^2(\Omega)$ and $R^t J = \nabla f(x) \ne 0$ for all $x \in \Omega$;

(G3) $f(x)$ is bounded in norm in $\Omega$ and $J^t J + Q$ is nonsingular for all $x \in \Omega$;

(G4) for all $s_k$ such that $x_k + s_k \in \Omega$, the termination condition $\|(R^t J)_k + (J^t J + Q)_k s_k\| \le \epsilon_k \|(R^t J)_k\|$, $\epsilon_k \in (0, 1)$, is satisfied.

An immediate consequence of the global assumptions is that the Hessian matrix $(J^t J + Q)(x)$ is bounded, that is, there exists a constant $\beta_1 > 0$ such that $\|(J^t J + Q)(x)\| < \beta_1$.


Assumption G3 that $J^t J + Q$ is nonsingular is necessary since the IRA iteration is only guaranteed to converge for nonsingular systems. We first discuss the two ways in which the IRA method can break down. This can happen if either $\hat v_{k+1} = 0$ (the new Arnoldi vector before normalization), so that $v_{k+1}$ cannot be formed, or if $H_{k_{\max}}$ is singular, which means that the maximum number of IRA steps has been taken but the final iterate cannot be formed. The first situation has often been referred to as a soft breakdown, since $\hat v_{k+1} = 0$ implies that $(J^t J + Q)_k$ is nonsingular and $x_k$ is the solution [18]. The second case is more serious in that it causes a convergence failure of the Arnoldi process. One possible recourse is to hope that $(J^t J + Q)_k$ is nonsingular for some $k$ among $1, 2, \ldots, k_{\max} - 1$. If such a $k$ exists, we can compute $x_k$ and then restart the algorithm using $x_k$ as the new initial guess $x_0$. It may not be possible to do this.

To handle the problem of singular $H_k$, we can use a technique similar to that used in the full dimensional space. If $H_k$ is singular, or its condition number is greater than $1/\sqrt{\nu}$, where $\nu$ is the machine epsilon, then we perturb the quadratic model $\psi(y)$ to

$$\hat\psi(y) = \psi(y) + \frac{1}{2} \gamma\, y^t y = f + h^t y + \frac{1}{2} y^t (H_k + \gamma I)\, y, \tag{4.1}$$

where $\gamma = \sqrt{n \nu}\, \|H_k\|_1$. The condition number of $H_k + \gamma I$ is roughly $1/\sqrt{\nu}$. For more details about this technique, see reference [4].
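The perturbation (4.1) is simple to state in code. The sketch below mirrors the formula only; it assumes a symmetric matrix whose eigenvalues are nonnegative (a shift of this size does not repair a matrix with sizeable negative eigenvalues), takes $n$ to be the dimension of the matrix passed in, and uses the hypothetical name `regularize_reduced_hessian`.

```python
import numpy as np

def regularize_reduced_hessian(Hk):
    """Return (Hk + gamma*I, gamma) following (4.1); gamma = 0 if no shift is needed."""
    nu = np.finfo(float).eps                       # machine epsilon
    n = Hk.shape[0]
    if np.linalg.cond(Hk) > 1.0 / np.sqrt(nu):     # cond is inf for a singular matrix
        gamma = np.sqrt(n * nu) * np.linalg.norm(Hk, 1)
        return Hk + gamma * np.eye(n), gamma
    return Hk, 0.0

H_singular = np.diag([1.0, 0.0])
H_shifted, gamma = regularize_reduced_hessian(H_singular)
cond_after = np.linalg.cond(H_shifted)             # of the order of 1/sqrt(eps)
```

For this singular example the shifted matrix has a condition number of the order of $1/\sqrt{\nu}$, in agreement with the remark following (4.1), while a well-conditioned matrix is returned unchanged.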

We have discussed the possibility of stagnation in the linear IRA method, which results in a breakdown of the nonlinear iteration. Sufficient conditions under which stagnation of the linear iteration never occurs are as follows:

(1) we have to ensure that the steepest descent direction belongs to the subspace $S_k$. Indeed, in Algorithm 3.1 the Krylov subspace will contain the steepest descent direction because $v_1 = -R^t J / \|R^t J\|$ and the Hessian is nonsingular;

(2) there is no difficulty if one requires the Hessian matrix $(J^t J + Q)(x)$ to be positive definite for all $x \in \Omega$: then the termination condition can be satisfied, and so convergence of the sequence of iterates will follow. This will be the case for some problems, but requiring $(J^t J + Q)(x)$ to be positive definite everywhere is very restrictive.
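Item (1) can be made concrete with a one-line calculation. It is a sketch under two assumptions that the surrounding text suggests: the initial guess is $s_0 = 0$, so that $v_1 = -R^tJ/\|R^tJ\|$, and the columns of $V_k$ are orthonormal with first column $v_1$. The symbol $e_1$, the first unit vector of $\mathbb{R}^{k_j}$, is introduced only for this illustration.

```latex
\[
V_k^t (R^tJ)_k \;=\; -\|(R^tJ)_k\|\, V_k^t v_1 \;=\; -\|(R^tJ)_k\|\, e_1,
\qquad\text{so that}\qquad
\|V_k^t (R^tJ)_k\| \;=\; \|(R^tJ)_k\| .
\]
```

Under these assumptions no part of the gradient is lost when it is projected onto $S_k$, which is the sense in which the subspace contains the steepest descent direction.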

One of the main restrictions of most of the Newton-Krylov schemes is that the subspace onto which a given Newton step is projected must solve the Newton equations with a certain accuracy, which is monitored by the termination condition (assumption G4). This condition is enough to essentially guarantee convergence of the trust region algorithm. Practically, the main difficulty is that one does not know in advance whether the subspace chosen for projection will be good enough to guarantee this condition. Thus, $k_j$ can be chosen large enough that the termination condition will eventually be satisfied, but when $k_j$ is too large the computational cost and the storage become too high. An alternative is to restart the algorithm, keeping $J^t J + Q$ nonsingular and $k_j < n$. Moreover, preconditioning and scaling are essential for the successful application of these schemes [19].

5. Global Convergence Analysis


ones arise from the fact that a lower dimensional quadratic model is used, rather than the full dimensional model that is used in the preexisting results [10].

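Lemma 5.1. Let the global assumptions hold. Then for any $s \in \Omega$, one has

$$\frac{|(R^t J)_k^t s|}{\|s\|_2} \ge \frac{D_k - \epsilon_k \|(J^t J + Q)_k\|}{(1 + \epsilon_k) D_k}\, \|(R^t J)_k\| > 0. \tag{5.1}$$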

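Proof. Let $r_k$ be the residual associated with $s$, so that $r_k = (R^t J)_k + (J^t J + Q)_k s$ and $(R^t J)_k \ne 0$. Then $\|r_k\| \le \epsilon_k \|(R^t J)_k\|$ and $s = -(J^t J + Q)_k^{-1} \bigl((R^t J)_k - r_k\bigr)$. So,

$$(R^t J)_k^t s = (R^t J)_k^t \bigl[-(J^t J + Q)_k^{-1} \bigl((R^t J)_k - r_k\bigr)\bigr] = -(R^t J)_k^t (J^t J + Q)_k^{-1} (R^t J)_k + (R^t J)_k^t r_k. \tag{5.2}$$

Hence,

$$\frac{|(R^t J)_k^t s|}{\|s\|} = \frac{\bigl|(R^t J)_k^t (J^t J + Q)_k^{-1} (R^t J)_k - (R^t J)_k^t r_k\bigr|}{\bigl\|(J^t J + Q)_k^{-1} \bigl((R^t J)_k - r_k\bigr)\bigr\|} \ge \frac{(R^t J)_k^t (J^t J + Q)_k^{-1} (R^t J)_k - \bigl|(R^t J)_k^t r_k\bigr|}{\bigl\|(J^t J + Q)_k^{-1} \bigl((R^t J)_k - r_k\bigr)\bigr\|}. \tag{5.3}$$

Next, $\|r_k\| \le \epsilon_k \|(R^t J)_k\|$ implies $|(R^t J)_k^t r_k| \le \epsilon_k \|(R^t J)_k\|^2$, which gives

$$\begin{aligned}
(R^t J)_k^t (J^t J + Q)_k^{-1} (R^t J)_k - \bigl|(R^t J)_k^t r_k\bigr| &\ge \bigl(\|(J^t J + Q)_k^{-1}\| - \epsilon_k\bigr) \|(R^t J)_k\|^2,\\
\bigl\|(J^t J + Q)_k^{-1} \bigl((R^t J)_k - r_k\bigr)\bigr\| &\le \|(J^t J + Q)_k^{-1}\| \|(R^t J)_k\| + \|(J^t J + Q)_k^{-1}\| \|r_k\| \le (1 + \epsilon_k) \|(J^t J + Q)_k^{-1}\| \|(R^t J)_k\|.
\end{aligned} \tag{5.4}$$

Introducing (5.4) into (5.3), we obtain the following:

$$\frac{|(R^t J)_k^t s|}{\|s\|} \ge \frac{\bigl(\|(J^t J + Q)_k^{-1}\| - \epsilon_k\bigr) \|(R^t J)_k\|^2}{(1 + \epsilon_k) \|(J^t J + Q)_k^{-1}\| \|(R^t J)_k\|} \ge \frac{\|(J^t J + Q)_k^{-1}\| - \epsilon_k}{(1 + \epsilon_k) \|(J^t J + Q)_k^{-1}\|}\, \|(R^t J)_k\| = \frac{D_k - \epsilon_k \|(J^t J + Q)_k\|}{(1 + \epsilon_k) D_k}\, \|(R^t J)_k\|. \tag{5.5}$$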


Inequality (5.1) can be written as

$$\frac{|(R^t J)_k^t s|}{\|s\| \|(R^t J)_k\|} \ge \frac{D_k - \epsilon_k \|(J^t J + Q)_k\|}{(1 + \epsilon_k) D_k}. \tag{5.6}$$

Condition (5.6) tells us that, at each iteration of Algorithm 3.1, we require that the termination condition holds. Since the condition numbers $D_k$ are bounded from above, condition (5.6) gives

$$\cos\theta \ge \frac{D - \epsilon \|(J^t J + Q)_k\|}{(1 + \epsilon) D}, \tag{5.7}$$

where $\theta$ is the angle between $(R^t J)_k$ and $s$, and $D$ is the upper bound of the condition numbers $D_k$. Inequality (5.7) shows that the acute angle $\theta$ is bounded away from $\pi/2$.
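The passage from a bound in terms of $\|(J^tJ+Q)_k^{-1}\|$ to one in terms of $D_k$ is a single algebraic step, sketched here for convenience. The shorthand $A_k = (J^tJ+Q)_k$ is introduced only for this illustration, and the sketch assumes that $D_k$ is exactly the condition number $\|A_k\| \|A_k^{-1}\|$ (the algorithm only asks for an approximation to it).

```latex
\[
\frac{\|A_k^{-1}\| - \epsilon_k}{(1+\epsilon_k)\,\|A_k^{-1}\|}
= \frac{\|A_k\|\,\|A_k^{-1}\| - \epsilon_k \|A_k\|}{(1+\epsilon_k)\,\|A_k\|\,\|A_k^{-1}\|}
= \frac{D_k - \epsilon_k \|A_k\|}{(1+\epsilon_k)\, D_k},
\qquad D_k = \|A_k\|\,\|A_k^{-1}\| .
\]
```

In particular, the right-hand side is positive exactly when $\epsilon_k < \|A_k^{-1}\|$, which is the regime in which the lower bound on $\cos\theta$ is informative.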

The following lemma shows that the termination norm assumption G4 implies that the cosine of the angle between the gradient and the Krylov subspace is bounded below.

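Lemma 5.2. Let the global assumptions hold. Then

$$\|V_k^t (R^t J)_k\| \ge \frac{D_k - \epsilon_k \|(J^t J + Q)_k\|}{(1 + \epsilon_k) D_k}\, \|(R^t J)_k\|. \tag{5.8}$$

Proof. Suppose $s = V_k y$ is a vector of $S_k$ satisfying the termination condition $\|(R^t J)_k + (J^t J + Q)_k s\| \le \epsilon_k \|(R^t J)_k\|$, $\epsilon_k \in (0, 1)$. From Lemma 5.1 and the fact that $\|s\| = \|V y\| = \|y\|$, we obtain

$$\frac{|(R^t J)_k^t V_k y|}{\|y\|} \ge \frac{D_k - \epsilon_k \|(J^t J + Q)_k\|}{(1 + \epsilon_k) D_k}\, \|(R^t J)_k\|. \tag{5.9}$$

From the Cauchy-Schwarz inequality, we obtain

$$|(R^t J)_k^t V_k y| \le \|V_k^t (R^t J)_k\| \|y\|. \tag{5.10}$$

Combining formulas (5.9) and (5.10), we obtain the result.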


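Lemma 5.3. Let the global assumptions hold and let $y$ be a solution of the minimization problem $\min_{\|y\| \le \Delta} \psi(y)$. Then

$$\|h_k\| \min\left\{\Delta_k, \frac{\|h_k\|}{\|H_k\|}\right\} \ge \alpha_k \|(R^t J)_k\| \min\left\{\Delta_k, \frac{\alpha_k \|(R^t J)_k\|}{\|(J^t J + Q)_k\|}\right\}, \tag{5.11}$$

where $\alpha_k = \bigl(D_k - \epsilon_k \|(J^t J + Q)_k\|\bigr) / \bigl((1 + \epsilon_k) D_k\bigr)$.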

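Proof. Since $\psi(y) = \varphi(V_k y) = f_k + (R^t J)_k^t V_k y + \frac{1}{2} (V_k y)^t (J^t J + Q) V_k y$, we have $\nabla\psi(0) = V_k^t (R^t J)_k$. From Step 3 of Algorithm 3.1, we obtain

$$f_k - f_{k+1} \ge \xi_1 \bigl(f_k - \varphi_k(y)\bigr) \tag{5.12}$$

$$\ge \frac{\xi_1}{2} \|V_k^t (R^t J)_k\| \min\left\{\Delta_k, \frac{\|V_k^t (R^t J)_k\|}{\|V_k^t (J^t J + Q)_k V_k\|}\right\} \tag{5.13}$$

$$= \frac{\xi_1}{2} \|h_k\| \min\left\{\Delta_k, \frac{\|h_k\|}{\|H_k\|}\right\}. \tag{5.14}$$

We have to convert the lower bound of this inequality to one involving $\|(R^t J)_k\|$ rather than $\|V_k^t (R^t J)_k\|$. Using Lemma 5.2, we obtain $\|h_k\| \ge \alpha_k \|(R^t J)_k\|$, and $\|H_k\| = \|V_k^t (J^t J + Q)_k V_k\| \le \|(J^t J + Q)_k\|$. Substituting in (5.14), we obtain

$$\frac{\|h_k\|}{2} \min\left\{\Delta_k, \frac{\|h_k\|}{\|H_k\|}\right\} \ge \frac{\alpha_k}{2} \|(R^t J)_k\| \min\left\{\Delta_k, \frac{\alpha_k \|(R^t J)_k\|}{\|(J^t J + Q)_k\|}\right\}, \tag{5.15}$$

which completes the proof.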

The following two facts will be used in the remainder of the proof.

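Fact 1.

By Taylor's theorem, for any $k$ and $\Delta > 0$, we have

$$\begin{aligned}
|\mathrm{ared}_k(\Delta) - \mathrm{pred}_k(\Delta)| &= \left| (f_k - f_{k+1}) - \left(-h_k^t y_k(\Delta) - \frac{1}{2} y_k(\Delta)^t V_k^t (J^t J + Q)_k V_k\, y_k(\Delta)\right) \right|\\
&= \left| \frac{1}{2} y_k(\Delta)^t H_k\, y_k(\Delta) - \int_0^1 y_k(\Delta)^t H_k(\eta)\, y_k(\Delta) (1 - \eta)\, d\eta \right|\\
&\le \|y_k(\Delta)\|^2 \int_0^1 \|H_k - H_k(\eta)\| (1 - \eta)\, d\eta,
\end{aligned} \tag{5.16}$$

where $H_k(\eta) = V_k^t (J^t J + Q)(x_k + \eta s_k) V_k$ with $s_k = x_{k+1} - x_k = V y_k$. Thus,

$$\left| \frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} - 1 \right| \le \frac{\|y_k(\Delta)\|^2}{|\mathrm{pred}_k(\Delta)|} \int_0^1 \|H_k - H_k(\eta)\| (1 - \eta)\, d\eta. \tag{5.17}$$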
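The two ingredients behind Fact 1 are the integral form of the Taylor remainder and the identity $\int_0^1 (1-\eta)\,d\eta = \tfrac{1}{2}$. The following worked step is a sketch in generic notation: it assumes $f$ is twice continuously differentiable on the segment between $x_k$ and $x_k + s_k$, and it writes $\nabla^2 f$ for whatever Hessian the text associates with $f$.

```latex
\[
f(x_k + s_k) = f_k + \nabla f(x_k)^t s_k
  + \int_0^1 s_k^t\, \nabla^2 f(x_k + \eta s_k)\, s_k\, (1-\eta)\, d\eta ,
\qquad
\frac{1}{2}\, y^t H_k\, y = \int_0^1 y^t H_k\, y\, (1-\eta)\, d\eta .
\]
```

Subtracting the quadratic model from the expansion therefore leaves a single integral of the difference between the Hessian at $x_k$ and along the step, which is the quantity bounded in the last line of Fact 1.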

Fact 2.

For any sequence $\{x_k\}$ generated by an algorithm satisfying the global assumptions, the related sequence $\{f_k\}$ is monotonically decreasing and bounded from below, that is, $f_k \to f^*$ as $k \to \infty$.

The next result establishes that every limit point of the sequence $\{x_k\}$ satisfies the first-order necessary conditions for a minimum.

Lemma 5.4. Let the global assumptions hold and let Algorithm 3.1 be applied to $f(x)$, generating a sequence $\{x_k\}$, $x_k \in \Omega$, $k = 1, 2, \ldots$; then $h_k \to 0$.

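Proof. Since $\epsilon_k \le \epsilon < 1$ for all $k$, we have

$$\alpha_k \ge \frac{D - \epsilon \|(J^t J + Q)_k\|}{(1 + \epsilon) D} = \alpha > 0 \quad \forall k. \tag{5.18}$$

Consider any $x_l$ with $\|(R^t J)_l\| \ne 0$ and $\|(R^t J)(x) - (R^t J)_l\| \le \beta_1 \|x - x_l\|$. Thus, if $\|x - x_k\| \le \|(R^t J)_k\| / (2\beta_1)$, then $\|(R^t J)(x)\| \ge \|(R^t J)_k\| - \|(R^t J)(x) - (R^t J)_k\| \ge \|(R^t J)_k\| / 2$. Let $\rho = \|(R^t J)_k\| / 2$ and $B_\rho = \{x : \|x - x_k\| < \rho\}$. There are two cases: either $x_l \in B_\rho$ for all $k \ge l$, or eventually $\{x_l\}$ leaves the ball $B_\rho$. Suppose $x_l \in B_\rho$ for all $k \ge l$; then $\|g_l\| \ge \|g_k\| / 2 = \sigma$ for all $l > k$. Thus, by Lemma 5.3, we have

$$\mathrm{pred}_k(\Delta) \ge \frac{\xi_1 \alpha \sigma}{2} \min\left\{\Delta_k, \frac{\alpha \sigma}{\|H_k\|}\right\} \quad \forall k \ge l. \tag{5.19}$$

Put $\alpha\sigma = \sigma$, introduce inequality (5.19) into (5.17), and since $\|H_k\| \le \|(J^t J + Q)_k\| \le \beta_1$, we obtain

$$\left| \frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} - 1 \right| \le \frac{\|y_k\|^2 \int_0^1 \|H_k - H_k(\eta)\| (1 - \eta)\, d\eta}{\xi_1 \sigma \min\{\Delta_k, \sigma/\beta_1\}} \le \frac{\Delta_k^2\, (2\beta_1)}{\xi_1 \sigma \min\{\Delta_k, \sigma/\beta_1\}} \le \frac{2 \Delta_k \beta_1}{\xi_1 \sigma}, \quad \Delta_k \le \frac{\sigma}{\beta_1}, \quad \forall k \ge l. \tag{5.20}$$

This gives, for $\Delta_k$ sufficiently small and $k \ge l$, that $\mathrm{ared}_k(\Delta) / \mathrm{pred}_k(\Delta)$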

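In addition, we have

$$\|(J^t J + Q)_k^{-1} h_k\| \ge \frac{\|h_k\|}{\|H_k\|} \ge \frac{\alpha \|(R^t J)_k\|}{\beta_1} \ge \frac{\sigma}{\beta_1}. \tag{5.22}$$

Inequalities (5.21) and (5.22) tell us that, for $\Delta_k$ sufficiently small, we cannot get a decrease in $\Delta_k$ in Step 3(ii) of Algorithm 3.1. It follows that $\Delta_k$ is bounded away from 0, but, since

$$f_k - f_{k+1} = \mathrm{ared}_k(\Delta) \ge \xi_1\, \mathrm{pred}_k(\Delta) \ge \xi_1 \sigma \min\left\{\Delta_k, \frac{\sigma}{\beta_1}\right\}, \tag{5.23}$$

and since $f$ is bounded from below, we must have $\Delta_k \to 0$, which is a contradiction. Therefore, $\{x_k\}$ must be outside $B_\rho$ for some $k > l$.

Let $x_{j+1}$ be the first term after $x_l$ that does not belong to $B_\rho$. From inequality (5.23), we get

$$f(x_l) - f(x_{j+1}) = \sum_{k=l}^{j} (f_k - f_{k+1}) \ge \sum_{k=l}^{j} \xi_1\, \mathrm{pred}_k(\Delta) \ge \sum_{k=l}^{j} \xi_1 \sigma \min\left\{\Delta_k, \frac{\sigma}{\beta_1}\right\}. \tag{5.24}$$

If $\Delta_k \le \sigma/\beta_1$ for all $l \le k \le j$, we have

$$f(x_l) - f(x_{j+1}) \ge \xi_1 \sigma \sum_{k=l}^{j} \Delta_k \ge \xi_1 \sigma \rho. \tag{5.25}$$

If there exists at least one $k$ with $\Delta_k > \sigma/\beta_1$, we get

$$f(x_m) - f(x_{m+1}) \ge \frac{\xi_1 \sigma^2}{\beta_1}. \tag{5.26}$$

In either case, we have

$$f(x_m) - f(x_{j+1}) \ge \xi_1 \sigma \min\left\{\rho, \frac{\sigma}{\beta_1}\right\} = \xi_1 \alpha\, \frac{\|(R^t J)_k\|}{2} \min\left\{\frac{\alpha \|(R^t J)_k\|}{2\beta_1}, \frac{\alpha \|(R^t J)_k\|}{2\beta_1}\right\} \ge \frac{\xi_1 \alpha^2}{4\beta_1}\, \|(R^t J)_k\|^2. \tag{5.27}$$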

Since $f$ is bounded below, $\{f_k\}$ is a monotonically decreasing sequence (i.e., $f_k \to f^*$). Hence,

$$\|(R^t J)_k\|^2 \le \frac{4\beta_1}{\xi_1 \alpha^2}\, (f_k - f^*), \tag{5.28}$$

i.e., $\|(R^t J)_k\| \to 0$ as $k \to \infty$. The proof is completed by using Lemma 5.2.

The following lemma proves that, under the global assumptions, if each member of the sequence of iterates generated by Algorithm 3.1 satisfies the termination condition of the algorithm, then this sequence converges to one of its limit points.

Lemma 5.5. Let the global assumptions hold, let $H_k = V_k^t (J^t J + Q)(x_k) V_k$ for all $k$, let $(J^t J + Q)(x)$ be Lipschitz continuous with constant $L$, and let $x^*$ be a limit point of the sequence $\{x_k\}$ with $(J^t J + Q)(x^*)$ positive definite. Then $\{x_k\}$ converges q-superlinearly to $x^*$.

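Proof. Let $\{x_{k_m}\}$ be a subsequence of the sequence $\{x_k\}$ that converges to $x^*$, where $x^*$ is a limit point of $\{x_k\}$. From Lemma 5.4, $(R^t J)(x^*) = 0$. Since $(J^t J + Q)(x^*)$ is positive definite and $(J^t J + Q)(x)$ is continuous, there exists $\delta_1 > 0$ such that if $\|x - x^*\| < \delta_1$, then $(J^t J + Q)(x)$ is positive definite, and if $x \ne x^*$ then $(R^t J)(x) \ne 0$. Let $B_1 = \{x : \|x - x^*\| < \delta_1\}$.

Since $(R^t J)(x^*) = 0$, we can find $\delta_2 > 0$ with $\delta_2 < \delta_1/4$ and $\|(J^t J + Q)^{-1} R^t J\| \le \delta_1/2$ for all $x \in B_2 = \{x : \|x - x^*\| < \delta_2\}$.

Find $l_0$ such that $f(x_{k_{l_0}}) < \inf\{f(x) : x \in B_1 - B_2\}$ and $x_{k_{l_0}} \in B_2$. Consider any $x_i$ with $i \ge k_{l_0}$, $x_i \in B_2$.

We claim that $x_{i+1} \in B_2$, which implies that the entire sequence beyond $x_{k_{l_0}}$ lies in $B_2$.

Suppose $x_{i+1}$ does not belong to $B_2$. Since $f_{i+1} < f_{k_{l_0}}$, $x_{i+1}$ does not belong to $B_1$ either. So

$$\Delta_i \ge \|x_{i+1} - x_i\| \ge \|x_{i+1} - x^*\| - \|x_i - x^*\| \ge \delta_1 - \frac{\delta_1}{4} = \frac{3\delta_1}{4} > \frac{\delta_1}{2} \ge \|(J^t J + Q)_i^{-1} (R^t J)_i\| \ge \|H_i^{-1} h_i\|. \tag{5.29}$$

So the truncated-Newton step is within the trust region, and we obtain

$$y_i(\Delta_i) = -\bigl(V_i^t (J^t J + Q)_i V_i\bigr)^{-1} V_i^t (R^t J)_i. \tag{5.30}$$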

Since $\|y_i(\Delta)\| < \delta_1/2$, it follows that $x_{i+1} \in B_1$, which is a contradiction. Hence, for all $k \ge k_{l_0}$, $x_k \in B_2$, and so, since $f(x_k)$ is a strictly decreasing sequence and $x^*$ is the unique minimizer of $f$ in $B_2$, we have that $\{x_k\} \to x^*$.

The following lemma establishes the rate of convergence of the sequence of iterates generated by Algorithm 3.1 to be q-superlinear.

Lemma 5.6. Let the assumptions of Lemma 5.5 hold. Then the rate of convergence of the sequence $\{x_k\}$ is q-superlinear.

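Proof. To show that the convergence rate is superlinear, we will show that eventually $\|H_k^{-1} h_k\|$ will always be less than $\Delta_k$, and hence the truncated-Newton step will always be taken. Since $(J^t J + Q)(x^*)$ is positive definite, it follows that the convergence rate of $\{x_k\}$ to $x^*$ is superlinear.

To show that eventually the truncated-Newton step is always shorter than the trust region radius, we need a particular lower bound on $\mathrm{pred}_k(\Delta)$. By assumption, for all $k$ large enough, $H_k$ is positive definite. Hence, either the truncated-Newton step is longer than the trust region radius, or $y_k(\Delta)$ is the truncated-Newton step. In either case

$$\|y_k(\Delta)\| \le \bigl\|\bigl(V_k^t (J^t J + Q)_k V_k\bigr)^{-1} V_k^t (R^t J)_k\bigr\| \le \bigl\|\bigl(V_k^t (J^t J + Q)_k V_k\bigr)^{-1}\bigr\| \|V_k^t (R^t J)_k\|, \tag{5.31}$$

and so it follows that $\|V_k^t (R^t J)_k\| \ge \|s_k(\Delta)\| / \|(V_k^t (J^t J + Q)_k V_k)^{-1}\|$. By Lemma 5.3, for all $k$ large enough, we have

$$\mathrm{pred}_k(\Delta) \ge \frac{\xi_1 \alpha}{2}\, \frac{\|y_k(\Delta)\|}{\|(J^t J + Q)_k^{-1}\|} \min\left\{\Delta_k, \frac{\alpha \|(R^t J)_k\|}{\|(J^t J + Q)_k\|}\right\}. \tag{5.32}$$

Thus,

$$\mathrm{pred}_k(\Delta) \ge \frac{\xi_1 \alpha}{2}\, \frac{\|y_k(\Delta)\|}{\|(J^t J + Q)_k^{-1}\|} \min\left\{\|y_k(\Delta)\|, \frac{\alpha \|y_k(\Delta)\|}{\|(J^t J + Q)_k\| \|(J^t J + Q)_k^{-1}\|}\right\} \ge \frac{\xi_1 \alpha^2}{2}\, \frac{\|y_k(\Delta)\|^2}{D\, \|(J^t J + Q)_k^{-1}\|}. \tag{5.33}$$

So, by the continuity of $(J^t J + Q)(x)$, for all $k$ large enough, we obtain

$$\mathrm{pred}_k(\Delta) \ge \frac{\xi_1 \alpha^2}{4}\, \frac{\|y_k(\Delta)\|^2}{\|(J^t J + Q)^{-1}(x^*)\|}. \tag{5.34}$$

Finally, by the argument leading up to Fact 1 and the Lipschitz continuity assumption, we have

$$|\mathrm{ared}_k(\Delta) - \mathrm{pred}_k(\Delta)| \le \|y_k(\Delta)\|^3\, \frac{L}{2}. \tag{5.35}$$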

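Thus, for any $\Delta_k > 0$ and $k$ large enough, we obtain

$$\left| \frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} - 1 \right| \le \|s_k(\Delta)\|^3\, \frac{L}{2} \cdot \frac{4 \|(J^t J + Q)^{-1}(x^*)\|}{\xi_1 \alpha^2 \|y_k(\Delta)\|^2} = \frac{2L}{\xi_1 \alpha^2}\, \|(J^t J + Q)^{-1}(x^*)\| \|y_k(\Delta)\| \le \frac{2L\, \|(J^t J + Q)^{-1}(x^*)\|}{\xi_1 \alpha^2}\, \Delta_k. \tag{5.36}$$

Thus, by Step 3(ii) of Algorithm 3.1, there is a $\bar\Delta$ such that if $\Delta_{k-1} < \bar\Delta$, then $\Delta_k$ will be less than $\Delta_{k-1}$ only if $\Delta_k \ge \|(V_{k-1}^t (J^t J + Q)_{k-1} V_{k-1})^{-1} V_{k-1}^t (R^t J)_{k-1}\|$. It follows from the superlinear convergence of the truncated-Newton method that, for $x_{k-1}$ close enough to $x^*$ and $k$ large enough,

$$\bigl\|\bigl(V_k^t (J^t J + Q)_k V_k\bigr)^{-1} V_k^t (R^t J)_k\bigr\| < \bigl\|\bigl(V_{k-1}^t (J^t J + Q)_{k-1} V_{k-1}\bigr)^{-1} V_{k-1}^t (R^t J)_{k-1}\bigr\|. \tag{5.37}$$

Now, if $\Delta_k$ is bounded away from 0 for all large $k$, then we are done. Otherwise, if for an arbitrarily large $k$, $\Delta_k$ is reduced, that is, $\Delta_k < \Delta_{k-1}$, then we have

$$\Delta_k \ge \bigl\|\bigl(V_{k-1}^t (J^t J + Q)_{k-1} V_{k-1}\bigr)^{-1} V_{k-1}^t (R^t J)_{k-1}\bigr\| > \bigl\|\bigl(V_k^t (J^t J + Q)_k V_k\bigr)^{-1} V_k^t (R^t J)_k\bigr\|, \tag{5.38}$$

and so the full truncated-Newton step is taken. Inductively, this occurs for all subsequent iterates, and superlinear convergence follows.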

The next result shows that, under the global assumptions, every limit point of the sequence $\{x_k\}$ satisfies the second-order necessary conditions for a minimum.

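Lemma 5.7. Let the global assumptions hold, and assume $\mathrm{pred}_k(\Delta) \ge \alpha_2 (-\lambda_1(H)) \Delta^2$ for all symmetric matrices $H \in \mathbb{R}^{k_j \times k_j}$, $h \in \mathbb{R}^{k_j}$, $\Delta > 0$, and some $\alpha_2 > 0$. If $\{x_k\} \to x^*$, then $H(x^*)$ is positive semidefinite, where $\lambda_1(H)$ is the smallest eigenvalue of the matrix $H$.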

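Proof. Suppose to the contrary that $\lambda_1(H(x^*)) < 0$. There exists $K$ such that if $k \ge K$, then $\lambda_1(H_k) < \frac{1}{2} \lambda_1(H(x^*))$. For all $k \ge K$ and for all $\Delta_k > 0$, we obtain

$$\mathrm{pred}_k(\Delta) \ge \alpha_2 \bigl(-\lambda_1(H_k)\bigr) \Delta_k^2 \ge \frac{\alpha_2}{2} \bigl(-\lambda_1(H(x^*))\bigr) \Delta_k^2. \tag{5.39}$$

Using inequality (5.17), we obtain

$$\left| \frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} - 1 \right| \le \frac{\|y_k(\Delta)\|^2}{|\mathrm{pred}_k(\Delta)|} \int_0^1 \|H_k(\eta) - H_k\| (1 - \eta)\, d\eta \le \frac{2 \|y_k(\Delta)\|^2 \int_0^1 \|H_k(\eta) - H_k\| (1 - \eta)\, d\eta}{\Delta_k^2\, \alpha_2 \bigl(-\lambda_1(H(x^*))\bigr)}. \tag{5.40}$$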

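Since the last quantity goes to zero as $\Delta_k \to 0$ and since a Newton step is never taken for $k > K$, it follows from Step 3(ii) of Algorithm 3.1 that, for $k \ge K$ and $\Delta_k$ sufficiently small, $\Delta_k$ cannot be decreased further. Thus, $\Delta_k$ is bounded below, but

$$\mathrm{ared}_k(\Delta) \ge \xi_1\, \mathrm{pred}_k(\Delta) \ge \frac{\xi_1 \alpha_2}{2} \bigl(-\lambda_1(H(x^*))\bigr) \Delta_k^2. \tag{5.41}$$

Since $f$ is bounded below, $\mathrm{ared}_k(\Delta) \to 0$, so $\Delta_k \to 0$, which is a contradiction. Hence $\lambda_1(H(x^*)) \ge 0$. This completes the proof of the lemma.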

6. Concluding Remarks

In this paper, we have shown that the implicitly restarted Arnoldi method can be combined with the Newton iteration and trust region strategy to obtain a globally convergent algorithm for solving large-scale NLS problems.

The main restriction of this scheme is that the subspace onto which a given Newton step is projected must solve the Newton equations with a certain accuracy which is monitored by the termination condition. This is enough to guarantee convergence.

This theory is sufficiently general that it holds for any algorithm that projects the problem onto a lower dimensional subspace. The convergence results indicate promise for this research to solve large-scale NLS problems. Our next step is to investigate the performance of this algorithm on some NLS problems. The results will be reported in our forthcoming paper.

Acknowledgment

The authors would like to thank the editor-in-chief and the anonymous referees for their comments and useful suggestions in the improvement of the final form of this paper.

References

[1] G. Golub and V. Pereyra, “Separable nonlinear least squares: the variable projection method and its applications,” Inverse Problems, vol. 19, no. 2, pp. R1–R26, 2003.

[2] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, NY, USA, 1970.

[3] J. J. Moré, “The Levenberg-Marquardt algorithm: implementation and theory,” in Numerical Analysis (Proc. 7th Biennial Conf., Univ. Dundee, 1977), G. A. Watson, Ed., vol. 630 of Lecture Notes in Mathematics, pp. 105–116, Springer, Berlin, Germany, 1978.

[4] J. E. Dennis Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall Series in Computational Mathematics, Prentice-Hall, Englewood Cliffs, NJ, USA, 1983.

[5] M. Gulliksson, I. Söderkvist, and P.-Å. Wedin, “Algorithms for constrained and weighted nonlinear least squares,” SIAM Journal on Optimization, vol. 7, no. 1, pp. 208–224, 1997.

[6] M. R. Abdel-Aziz, “Globally convergent algorithm for solving large nonlinear systems of equations,” Numerical Functional Analysis and Optimization, vol. 25, no. 3-4, pp. 199–219, 2004.

[7] M. J. D. Powell, “Convergence properties of a class of minimization algorithms,” in Nonlinear Programming, 2 (Proc. Sympos. Special Interest Group on Math. Programming, Univ. Wisconsin, Madison, Wis., 1974), O. Mangasarian, R. Meyer, and S. Robinson, Eds., pp. 1–27, Academic Press, New York, NY, USA, 1975.


[9] J. J. Moré, “Recent developments in algorithms and software for trust region methods,” in Mathematical Programming: The State of the Art (Bonn, 1982), A. Bachem, M. Grötschel, and B. Korte, Eds., pp. 258–287, Springer, Berlin, Germany, 1983.

[10] G. A. Shultz, R. B. Schnabel, and R. H. Byrd, “A family of trust-region-based algorithms for unconstrained minimization with strong global convergence properties,” SIAM Journal on Numerical Analysis, vol. 22, no. 1, pp. 47–67, 1985.

[11] M. R. Osborne, “Some special nonlinear least squares problems,” SIAM Journal on Numerical Analysis, vol. 12, no. 4, pp. 571–592, 1975.

[12] M. Xiaofang, F. R. Ying Kit, and X. Chengxian, “Stable factorized quasi-Newton methods for nonlinear least-squares problems,” Journal of Computational and Applied Mathematics, vol. 129, no. 1-2, pp. 1–14, 2001.

[13] J. E. Dennis Jr., H. J. Martínez, and R. A. Tapia, “Convergence theory for the structured BFGS secant method with an application to nonlinear least squares,” Journal of Optimization Theory and Applications, vol. 61, no. 2, pp. 161–178, 1989.

[14] P. N. Brown and Y. Saad, “Hybrid Krylov methods for nonlinear systems of equations,” SIAM Journal on Scientific and Statistical Computing, vol. 11, no. 3, pp. 450–481, 1990.

[15] D. C. Sorensen, “Minimization of a large scale quadratic function subject to an ellipsoidal constraint,” Tech. Rep. TR94-27, Department of Computational and Applied Mathematics, Rice University, Houston, Tex, USA, 1994.

[16] D. C. Sorensen, “Newton's method with a model trust region modification,” SIAM Journal on Numerical Analysis, vol. 19, no. 2, pp. 409–426, 1982.

[17] H. Mukai and E. Polak, “A second-order method for unconstrained optimization,” Journal of Optimization Theory and Applications, vol. 26, no. 4, pp. 501–513, 1978.

[18] Y. Saad, “Krylov subspace methods for solving large unsymmetric linear systems,” Mathematics of Computation, vol. 37, no. 155, pp. 105–126, 1981.