Volume 2009, Article ID 435851,17pages doi:10.1155/2009/435851
Research Article
Newton-Krylov Type Algorithm for Solving
Nonlinear Least Squares Problems
Mohammedi R. Abdel-Aziz
1and Mahmoud M. El-Alem
1, 21Department of Mathematics and Computer Science, Faculty of Science, Kuwait University, P.O. 5969, Safat 13060, Kuwait City, Kuwait
2Department of Mathematics, Faculty of Science, Alexandria University, Alexandria, Egypt
Correspondence should be addressed to Mohammedi R. Abdel-Aziz,mr [email protected]
Received 15 December 2008; Accepted 2 February 2009
Recommended by Irena Lasiecka
The minimization of a quadratic function within an ellipsoidal trust region is an important subproblem for many nonlinear programming algorithms. When the number of variables is large, one of the most widely used strategies is to project the original problem into a small dimensional subspace. In this paper, we introduce an algorithm for solving nonlinear least squares problems. This algorithm is based on constructing a basis for the Krylov subspace in conjunction with a model trust region technique to choose the step. The computational step on the small dimensional subspace lies inside the trust region. The Krylov subspace is terminated such that the termination condition allows the gradient to be decreased on it. A convergence theory of this algorithm is presented. It is shown that this algorithm is globally convergent.
Copyrightq2009 M. R. Abdel-Aziz and M. M. El-Alem. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Nonlinear least squares NLS problems are unconstrained optimization problems with special structures. These problems arise in many aspects such as the solution of overde-termined systems of nonlinear equations, some scientific experiments, pattern recognition, and maximum likelihood estimation. For more details about these problems1. The general formulation of the NLS problem is to determine the solution x ∈ Rn that minimizes the function
fx Rx22
m
i1
Rix
2
, 1.1
whereRx R1x, R2x, . . . , Rmxt, Ri:Rn → R,1≤i≤mand m≥n.
Gauss-Newton and Levenberg-Marquart methods3. The second type is closely related to unconstrained minimization techniques4. Of course the two types of algorithms are closely related to each other5.
The presented algorithm is a Newton-Krylov type algorithm. It requires a fixed-size limited storage proportional to the size of the problem and relies only upon matrix vector product. It is based on the implicitly restarted Arnoldi methodIRAMto construct a basis for the Krylov subspace and to reduce the Jacobian into a Hessenberg matrix in conjunction with a trust region strategy to control the step on that subspace6.
Trust region methods for unconstrained minimization are blessed with both strong theoretical convergence properties and a good accurate results in practice. The trial computational step in these methods is to find an approximate minimizer of some model of the true objective function within a trust region for which a suitable norm of the correction lies inside a given bound. This restriction is known as the trust region constraint, and the bound on the norm is its radius. The radius is adjusted so that successive model problems minimized the true objective function within the trust region7.
The trust region subproblem is the problem of findingsΔso that
φsΔ min{φs:s ≤Δ}, 1.2
whereΔis some positive constant, · is the Euclidean norm inRn, and
φs gts1 2s
tHs, 1.3
whereg ∈RnandH∈Rn×nare, respectively, the gradient vector and the Hessian matrix or their approximations.
There are two different approaches to solve1.2. These approaches are based on either solving several linear systems 8 or approximating the curve sΔby a piecewise linear approximation “dogleg strategies”9. In large-scale optimization, solving linear systems is computationally expensive. Moreover, it is not clear how to define the dogleg curves when the matrixHis singular10.
Several authors have studied inexact Newton’s methods for solving NLS problems
11. Xiaofang et al. have introduced stable factorized quassi-Newton methods for solving large-scale NLS12. Dennis et al. proposed a convergence theory for structured BFGS secant method with an application for NLS 13. The Newton-Krylov method is an example of inexact Newton methods. Krylov techniques inside a Newton iteration in the context of system of equations have been proposed in 14. The recent work of Sorensen, provides an algorithm which is based on recasting the trust region subproblem into a parameterized eigenvalue problem. This algorithm provides a super linearly convergent scheme to adjust the parameter and find the optimal solution from the eigenvector of the parameterized problem, as long as the hard case does not occur15.
This contribution is organized as follows. InSection 2, we introduce the structure of the NLS problem and a general view about the theory of the trust region strategies and their convergence. The statement of the algorithm and its properties is developed in Section 3.
2. Structure of the Problem
The subspace technique plays an important role in solving the NLS problem1.1. We assume that the current iteratexkand a Krylov subspaceSk. Denote the dimension ofSkto bekjand {v1k, v2k, . . . , vkk
j }be a set of linearly independent vectors inSk. The next iteratexk1is such
that the incrementxk1−xk ∈Sk. Thus, we haveRjxkVky0, forj 1,2, . . . , mand
y ∈ Rkj, where the matrixV
k v1k, v
k
2 , . . . , v
k
kj . The second-order Taylor expansion of Rx2
2atxkis
R
xk
Jks22stQks, 2.1
whereQk ∈ Rn×n is defined byQk nj1Rjxk∇2Rjxkand the jacobian matrix isJk
Jxk ∇R1xk,∇R2xk, . . . ,∇Rmxkt. If we considers∈Sk, we get the quadratic model
qky R
xk
JkVky22ytAky, 2.2
whereAk∈Rkj,kjapproximates the reduced matrixVktQkVkmj1RjxkVkt∇2RjxkVk. The first order necessary conditions of2.1is
JktJkQk
sRtkJk0. 2.3
Thus, a solution of the quadratic model is a solution to an equation of the form
JktJkQk
s−RtkJk. 2.4
The model trust region algorithm generates a sequence of pointsxk, and at thekth stage of the iteration the quadratic model of problem1.2has the form
φks JktRks 1 2s
tJt kJkQk
s. 2.5
At this stage, an initial value for the trust region radius Δk is also available. An inner iteration is then performed which consists of using the current trust region radius,Δk, and the information contained in the quadratic model to compute a step,sΔk. Then a comparison of the actual reduction of the objective function
aredkΔk
Rxk22−R
xksk
Δk22, 2.6
and the reduction predicted by the quadratic model
predk
Δk
is performed. If there is a satisfactory reduction, then the step can be taken, or a possibly larger trust region used. If not, then the trust region is reduced and the inner iteration is repeated. For now, we leave unspecified what algorithm is used to form the stepsΔ, and how the trust region radiusΔis changed. We also leave unspecified the selection ofJktJkQkexcept to restrict it to be symmetric. Details on these points will be addressed in our forth coming paper.
The solutionsof the quadratic model2.2is a solution to an equation of the form
JtJQμIs−RtJ, 2.8
withμ≥0,μsts−Δ2 0 andJtJQ μIis positive semidefinite. This represents the first-order necessary conditions concerning the pairμands, wheresis a solution to model
2.2andμis the Lagrange multiplier associated with the constraintsts≤Δ2. The sufficient
conditions that will ensuresto be a solution to2.2can be established in the following lemma which has a proof in16.
Lemma 2.1. Letμ∈Rands∈RnsatisfyJtJQ μIs−RtJwithJtJQ μIpositive
semidefinite, if
μ0, s ≤Δthenssolves2.2,
s Δ, thenssolves min{φs:s Δ},
μ≥0 withs Δ, thenssolves2.2.
2.9
Of course, ifJtJQ μIis positive definite thensis a unique solution.
The main result which is used to prove that the sequence of gradients tends to zero for modified Newton methods is
f−φs≥ 1
2R
tJmin Δ, RtJ
JtJQ
, 2.10
where s is a solution to 2.2. The geometric interpretation of this inequality is that for a quadratic function, any solution s to2.2 produces a decrease f −φsthat is at least as much as the decrease along the steepest descent direction−RtJ. A proof of this result may be found in17.
3. Algorithmic Framework
The desired properties of this algorithm are:
1the algorithm should be well defined for a sufficiently general class of functions and it is globally convergent;
2the algorithm should be invariant under linear affine scalings of the variables, that is, if we replace fxby fy fsV y, where V ∈ Rn×k is an orthonormal matrix,x ∈ Rnandy ∈Rk, then applying the iteration tofwith initial guessy0
satisfyingx0 sV y0should produce a sequence{yl}related to the sequence{xl} byxl V yls, wherexl is produced by applying the algorithm tof with initial guessx0;
3the algorithm should provide a decrease that is at least as large as a given multiple of the minimum decrease that would be provided by a quadratic search along the steepest descent direction;
4the algorithm should give as good a decrease of the quadratic model as a direction of the negative gradient when the Hessian,JtJQ, is indefinite and should force the direction to be equal to the Newton direction when JtJ Q is symmetric positive definite.
The following describes a full iteration of a truncated Newton type method. Some of the previous characteristics will be obvious and the other ones will be proved in the next section.
Algorithm 3.1.
1Step 0initialization.
Givenx0∈Rn, computeRtJx0,JtJQx0.
Chooseξ1< ξ2,η1∈0,1, η2 >1,Δ1>0, k1.
2Step 1construction a basis for the Krylov subspace.
aChoose ∈ 0,1, an initial guess s0 which can be chosen to be zero, form
r0 −RtJ−JtJQs0, whereRtJ RtJxn,JtJQ JtJQxn, and computeτr0andv1 r0/τ.
bForj 1,2, . . .until convergence, do.
iFormJtJQv
jand orthogonalize it against the previousv1, v2, . . . , vjthen
hi, j JtJQv
j, vi, i1,2, . . . , j,
vj1 JtJQvj−
j
i1hi, jvi,
hi1, j vj1.
vj1vj1/hi1, j.
iiCompute the residual norm βj RtJ JtJ Qsjof the solution sj that would be obtained if we stopped.
iiiIfβj ≤jRtJ, wherej ≤1/Dj ∈0, , andDjis an approximation to the condition number ofJtJQ
3Step 2compute the trial step.
iConstruct the local model,
ψky fkhtky 1 2y
tH
ky, 3.1
wherehkVktRtJk∈RkjandHkVktJtJQkVk∈Rkj×kj.
iiCompute the solutionyto the problem min{ψky :y ≤ Δk}and set sk
Vky.
4Step 3test the step and updateΔk.
iEvaluatefk1fxkVky, aredkfk−fxkVkyand predkfk−ψVky.
iiIfaredk/predk< ξ1then
beginΔkη1Δkgo to 3i
Comment: Do not accept the step “y”, reduce the trust region radius “Δk” and compute a new step
Else ifξ1≤aredk/predk≤ξ2then
xk1xkVky
Comment: Accept the step and keep the previous trust region radius the same Elsearedk/predk> ξ2then
Δkη2Δk
Comment: Accept the step and increase the trust region radius end if
5Step 4.
Setk1←kand go to Step 1.
4. The Restarting Mechanism
In this section, we discuss the important role of the restarting mechanism to control the possibility of the failure of nonsingularity of the Hessian matrix, and introduce the assumptions under which we prove the global convergence.
Let the sequence of the iterates generated byAlgorithm 3.1be{xk}; for such sequence we assume
G1for allk, xkandxksk∈ΩwhereΩ⊂Rnis a convex set;
G2f ∈C2ΩandRtJ ∇fx/0 for allx∈Ω;
G3fxis bounded in norm inΩandJtJQis nonsingular for allx∈Ω;
G4for allsksuch thatxksk∈Ω, the termination condition,RtJk JTJQsk ≤
jRtJk,k∈0,1is satisfied;
An immediate consequence of the global assumptions is that the Hessian matrix
JtJQxis bounded, that is, there exists a constantβ
1 >0 such thatJtJQx< β1.
Assumption G3 thatJtJQis nonsingular is necessary since the IRA iteration is only guaranteed to converge for nonsingular systems. We first discuss the two ways in which the IRA method can breakdown. This can happen if eithervk1 0 so thatvk1cannot be formed,
or ifHkmax is singular which means that the maximum number of IRA steps has been taken,
but the final iterate cannot be formed. The first situation has often been referred to as a soft breakdown, sincevk1 0 impliesJtJQkis nonsingular andxkis the solution18. The second case is more serious in that it causes a convergence failure of the Arnoldi process. One possible recourse is to hope thatJtJQkis nonsingular for somekamong 1,2, . . . , kmax−1.
If suchkexists, we can computexjand then restart the algorithm usingxkas the new initial guessx0. It may not be possible to do this.
To handle the problem of singularHk, we can use a technique similar to that done in the full dimensional space. IfHkis singular, or its condition number is greater than 1/
√
ν, whereνis the machine epsilon, then we perturb the quadratic modelψyto
ψy ψy 1
2γy
tyfhty1 2y
tH kγI
y, 4.1
whereγ √nνHk1. The condition number ofHkγIis roughly 1/√ν. For more details about this technique see reference4.
We have discussed the possibility of stagnation in the linear IRA method which results in a break down in the nonlinear iteration. Sufficient conditions under which stagnation of the linear iteration never occurs are
1we have to ensure that the steepest descent direction belongs to the subspace,Sk. Indeed in Algorithm 3.1, the Krylov subspace will contain the steepest descent direction becausev1−RtJ/RtJand the Hessian is nonsingular;
2there is no difficulty, if one required the Hessian matrix,JtJQxto be positive definite for all x ∈ Ω, then the termination condition can be satisfied, and so convergence of the sequence of iterates will follow. This will be the case for some problems, but, requiring JtJ Qx to be positive definite every where is very restrictive.
One of the main restrictions of most of the Newton-Krylov schemes is that the subspace onto which a given Newton step is projected must solve the Newton equations with a certain accuracy which is monitored by the termination conditionassumption G4. This condition is enough to essentially guarantee convergence of the trust region algorithm. Practically, the main difficulty is that one does not know in advance if the subspace chosen for projection will be good enough to guarantee this condition. Thus, kj can be chosen as large as the termination condition will eventually be satisfied, but, whenkj is too large the computational cost and the storage become too high. An alternative to that is to restart the algorithm keepingJtJQnonsingular andk
j< n. Moreover, preconditioning and scaling are essential for the successful application of these schemes19.
5. Global Convergence Analysis
ones arise from the fact that a lower dimensional quadratic model is used, rather than the full dimension model that is used in the preexisting results10.
Lemma 5.1. Let the global assumptions hold. Then for anys∈Ω, one has
RtJt ks s2 ≥
Dk−kJtJQ
k
1k
Dk
RtJk>0. 5.1
Proof. Suppose rk be the residual associated with s so that rk RtJk JtJQks and
RtJ
k/0. Thenrk ≤kRtJk, ands−JtJQ−
1
k RtJk−rk. So,
RtJt ks
RtJt k
−JtJQ−1 k
RtJ k−rk
−RtJt k
JtJQ−1 k
RtJ k
RtJt krk.
5.2
Hence,
RtJt ks
s
RtJt k
JtJQ−1 k
RtJ k−
RtJt krk
JtJQ−1 k
RtJ
k−rk
≥
RtJtkJtJQ−k1RtJk−RtJtkrk
JtJQ−1 k
RtJ
k−rk
.
5.3
Next,rk ≤RtJkimplies|RtJtkrkj| ≤kRtJk2which gives
RtJt k
JtJQ−1 k
RtJ
k−RtJ
t
krk≥JtJQ
−1
k −kRtJ
k
2
,
JtJQ−1 k
RtJk−rk≤JtJQ
−1
k RtJ
kJtJQ
−1
k rk ≤1kJtJQ
−1
k R tJ
k.
5.4
Introduce5.4into5.3, we obtain the following:
RtJt ks s ≥
JtJQ−1
k −kRtJ
k
2
1kJtJQ
−1
k RtJ
k
≥ JtJQ
−1
k −k
1kJtJQ
−1
k
RtJ k
Dk−kJtJQ
k
1k
Dk
RtJk.
Inequality5.1can be written as
RtJt ks sRtJ
k
≥ Dk−kJtJQ
k
1k
Dk
. 5.6
This condition5.6tells us at each iteration ofAlgorithm 3.1, we require that the termination condition holds. Since the condition numbers,Dk, are bounded from above, condition5.6
gives
cosθ≥ D −J
tJQ k
1D , 5.7
where θ is the angle between RtJ
k and s and D are the upper bound of the condition numbersDk. Inequality5.7shows that the acute angleθis bounded away fromπ/2.
The following lemma shows that the termination norm assumption G4 implies that the cosine of the angle between the gradient and the Krylov subspace is bounded below.
Lemma 5.2. Let the global assumptions hold. Then
Vt k
RtJ k≥
Dk−kJtJQ
k
1k
Dk
RtJ
k. 5.8
Proof. Suppose s Vky be a vector of Sk satisfying the termination condition, RtJk
JtJQ
ks ≤ kRtJk, k ∈0,1. FromLemma 5.1and the fact thats V y y, we obtain
RtJt kVky
y ≥
Dk−kJtJQ
k
1k
Dk
RtJ
k. 5.9
From Cauchy-Schwartz inequality, we obtain
RtJtkVky≤Vkt
RtJky. 5.10
Combining formulas5.9and5.10, we obtain the result.
Lemma 5.3. Let the global assumptions hold and lety be a solution of the minimization problem,
miny≤Δ ψy. Then
hkmin Δk,
hk Hk
≥αkRtJ
kmin Δk,
αkRtJ
k
JtJQ k
, 5.11
whereαk Dk−kJtJQk/1kDk.
Proof. Since,ψy φVky fk RtJtkVky 1/2VkytJtJ QVky. Thus ∇ψ0
Vt
kRtJk. From Step 3 ofAlgorithm 3.1, we obtain
fk−fk1≥ξ1
fk−φky
5.12
≥ ξ1
2V t k
RtJkmin Δk,
Vt k
RtJk
Vt k
JtJQ kVk
5.13
ξ1
2hkmin Δk,
hk
Hk
. 5.14
We have to convert the lower bound of this inequality to one involvingRtJ
krather than Vt
kRtJk. UsingLemma 5.2, we obtainhk ≥αkRtJk, but,HkVktJtJQkVk ≤ JtJQ
k. Substituting in5.14, we obtain
hk
2 min Δk,
hk
Hk
≥ αk 2 R
tJ
kmin Δk,
αkRtJ
k
JtJQ k
, 5.15
which completes the proof.
The following two facts will be used in the remainder of the proof.
Fact 1.
By Taylor’s theorem for anykandΔ>0, we have
aredkΔ−predkΔ
fk−fk1
−
−htkykΔ−1 2y
t kΔVkt
JtJQkVkykΔ
1
2y t
kΔHkykΔ−
1
0
ytkΔHkηykΔ1−ηdη
≤ykΔ2
1
0
Hk−Hkη1−ηdη.
where,Hkη VktJtJQxkηskVkwithskxk1−xkV yk. Thus,
aredkΔ predkΔ−1
≤
ykΔ2
predkΔ
1
0
Hk−Hkη1−ηdη. 5.17
Fact 2.
For any sequence {xk} generated by an algorithm satisfying the global assumptions, the related sequence{fk}is monotonically decreasing and bounded from below, that is,fk → f ask → ∞.
The next result establishes that every limit point of the sequence{xk}satisfies the first-order necessary conditions for a minimum.
Lemma 5.4. Let the global assumptions hold andAlgorithm 3.1 is applied to fx, generating a sequence{xk}, xk∈Ω, andk1,2, . . . ,thenhk → 0.
Proof. Sincek≤ <1 for allk, we have
αk≥
D −JtJQ k
1D α >0 ∀k. 5.18
Consider any xl with RtJl/0, RtJx− RtJl ≤ β1x− xl. Thus, if x− xk ≤
RtJ
k/2β1, thenRtJx ≥ RtJkRtJl − RtJx−RtJk ≥ RtJk/2. Let,
ρRtJ
k/2 andBρ{x:x−xk< ρ}. There are two cases, either for allk≥l, xl∈ Bρor eventually{xl}leaves the ballBρ. Supposexl∈ Bρfor allk≥l, thengl ≥ gk/2σfor all
l > k. Thus byLemma 5.3, we have
predkΔ≥ ξ1ασ
2 min Δk,
ασ
Hk
∀k≥l. 5.19
Putασ σ, introduce inequality5.19into5.17and sinceHk ≤ JtJQk ≤ β1, we
obtain
aredkΔ predkΔ−1
≤
yk2
1
0Hk−Hkη1−ηdη
ξ1σmin
Δk, σ/β1
≤ Δ2k
2β1
ξ1σmin
Δk, σ/β1
≤ 2Δkβ1
ξ1σ
, Δk≤
σ β1 ∀
k≥l.
5.20
This gives forΔksufficiently small andk≥lthat
aredkΔ predkΔ
In addition, we have
JtJQ−1 k hk≥
hk
Hk
≥ αRtJ
k
β1 ≥
σ β1
. 5.22
Inequalities5.21and5.22tell us that, forΔk sufficiently small, we cannot get a decrease inΔkin Step 3iiofAlgorithm 3.1. It follows thatΔkis bounded away from 0, but, since
fk−fk1aredkΔ≥ξ1predkΔ≥ξ1σmin
Δk, σ
β1
, 5.23
and since f is bounded from below, we must have Δk → 0 which is a contradiction. Therefore,{xk}must be outsideBρfor somek > l.
Letxj1be the first term afterxlthat does not belong toBρ. From inequality5.23, we get
fxl
−fxj1
j
kl
fk−fk1
≥l j1
ξ1predkΔ
≥ j
kl
ξ1σmin
Δk, σ
β1
.
5.24
IfΔk≤σ/β1for alll≤k≤j, we have
fxl
−fxj1
≥ξ1σ
j
kl
Δk≥ξ1σρ. 5.25
If there exists at least onekwithΔk> σ/β1, we get
fxm
−fxm1
≥ξ1σ 2
β1
. 5.26
In either case, we have
fxm
−fxj1
≥ξ1σmin
ρ, σ β1
ξ1α
RtJ k 2 min
αRtJ k 2β1
,αR
tJ k 2β1
≥ ξα2 4β1
RtJk2.
Since, f is bounded below, {fk} is a monotonically decreasing sequence i.e., fk → f. Hence,
RtJ k
2≤ 4β1
ξ1α2
fk−f
, 5.28
i.e., RtJ
k → 0 ask → ∞. The proof is completed by usingLemma 5.2.
The following lemma proves that under the global assumptions, if each member of the sequence of iterates generated byAlgorithm 3.1satisfies the termination condition of the algorithm, then this sequence converges to one of its limit points.
Lemma 5.5. Let the global assumptions hold,Hk VktJtJQxkVkfor allk,JtJQxis
Lipschitz continuous withL, andx∗is a limit point of the sequence{xk}withJtJQx∗be positive
definite. Then{xk}converges q-Superlinearly tox.
Proof. Let{xkm}be a subsequence of the sequence{xk}that converges tox, where,x is a
limit of{xk}. FromLemma 5.4,RtJx 0. Since JtJQxis positive definite and
JtJQxis continuous there existsδ
1 >0 such that ifx−x< δ1, thenJtJQxis
positive definite, and ifx /xthenRtJx/0. Let,B1{x:x−x< δ1}.
SinceRtJx
0, we can findδ2>0 withδ2< δ1/4 andJtJQ−1RtJ ≤δ1/2
for allx∈B2{x:x−x< δ2}.
Findlosuch thatfxlo <inf{fx :x ∈B1−B2}andxlo ∈ B2. Consider anyxiwith i≥kl, xi∈B2.
We claim that xi1 ∈ B2, which implies that the entire sequence beyond xklo ∈ B2.
Supposexi1does not belong toB2. Sincefi1< fklo,xi1does not belong toB1either. So
Δi≥xi1−xi
≥xi1−x−xi−x
≥δ1−
δ1
4
3δ1
4
> δ1
2
≥JtJQi−1RtJi
≥H−1
i hi.
5.29
So , the truncated-Newton step is within the trust region, we obtain
yi
Δi
−VitJtJQiVi
−1
SinceyiΔ < δ1/2, it follows thatxi1 ∈ B1, which is a contradiction. Hence, for allk ≥
klo, xk∈B2, and so sincefxkis strictly decreasing sequence andxis the unique minimizer
offinB2, we have that{xk} → x.
The following lemma establishes the rate of convergence of the sequence of iterates generated byAlgorithm 3.1to beq-super linear.
Lemma 5.6. Let the assumptions ofLemma 5.5 hold. Then the rate of convergence of the sequence {xk}isq-Super linear.
Proof. To show that the convergence rate is super linear, we will show eventually that H−1
k hkwill always be less than Δj and hence the truncated-Newton step will always be taken. SinceJtJQx
is positive definite, it follows that the convergence rate of{xk}to
xis super linear.
To show that eventually the truncated-Newton step is always shorter than the trust region radius, we need a particular lower bound on predkΔ. By assumptions, for allklarge enough,Hkis positive definite. Hence, either the truncated-Newton step is longer than the trust region radius, orykΔis the truncated-Newton step. In either case
ykΔ≤Vkt
JtJQ jVj
−1
Vt k
RtJ
k≤Vkt
JtJQ kVk
−1
Vt k
RtJ
k, 5.31
and so it follows thatVt
kRtJk ≥ skΔ/VktJtJQkVk−1. ByLemma 5.3, for allk large enough, we have
predkΔ≥
ξ1α
2
ykΔ JtJQ−1
k
min Δk,
αRtJ k
JtJQ k
. 5.32
Thus,
predkΔ≥
ξ1α
2
ykΔ JtJQ−1
k
min ykΔ,
αykΔ
JtJQ
kJtJQ
−1
k
≥ ξ1α2
2
ykΔ2 DJtJQ−1
k
.
5.33
So, by the continuity ofJtJQx, for allklarge enough, we obtain
predkΔ≥ ξ1α
2
4
ykΔ2
JtJQ−1x
. 5.34
Finally, by the argument leading up to Fact 1 and the Lipschitz continuity assumption, we have
aredkΔ−predkΔ≤ykΔ3
L
Thus, for anyΔk>0 andklarge enough, we obtain
aredkΔ predkΔ−
1≤skΔ3
L
2.
4JtJQ−1x
ξ1α2ykΔ2
2L
ξ1α2
JtJQ−1xykΔ
≤ 2LJtJQ
−1
x
ξ1α2 Δk
.
5.36
Thus by Step 3iiofAlgorithm 3.1, there is aΔsuch that ifΔk−1 < Δ, thenΔkwill be less thanΔk−1only ifΔk≥ Vk−t 1JtJQk−1Vk−1−1Vk−t 1RtJk−1. It follows from the superlinear
convergence of the truncated-Newton method that forxk−1 close enough tox andk large enough,
VktJtJQkVk
−1
VktRtJk<Vk−t 1JtJQk−1Vk−1
−1
Vk−t 1RtJk−1. 5.37
Now, ifΔk is bounded away from 0 for all largek, then we are done. Otherwise, if for an arbitrarily largek,Δkis reduced, that is,Δk<Δk−1, then we have
Δk≥Vk−t 1
JtJQk−1Vk−1
−1
Vk−t 1RtJk−1>VktJtJQkVk
−1
VktRtJk, 5.38
and so the full truncated-Newton step is taken. Inductively, this occurs for all subsequence iterates and super linear convergence follows.
The next result shows that under the global assumptions every limit point of the sequence{xk}satisfies the second-order necessary conditions for a minimum.
Lemma 5.7. Let the global assumptions hold, and assume predkΔ ≥ α2−λ1HΔ2 for all the
symmetric matrixH ∈ Rkj×kj,h ∈ Rkj,Δ > 0 and someα
2 > 0. If {xk} → x, then Hxis
positive semidefinite, whereλ1His the smallest eigenvalue of the matrixH.
Proof. Suppose to the contrary thatλ1Hx < 0. There existsK such that ifk ≥ K, then
λ1Hk<1/2λ1Hx. For allk≥Kand for allΔk>0, we obtain
predkΔ≥α2
−λ1
Hk
Δ2
k≥
α2
2
−λ1
Hx
Δ2
k. 5.39
Using inequality5.17, we obtain
aredkΔ predkΔ−
1≤ ykΔ
2
predkΔ 1
0
Hkη−Hk1−ηdη
≤ 2ykΔ
2
Δ2
k
1
0
Hkη−Hk1−ηdη
α2
−λ1
Hx
.
Since the last quantity goes to zero asΔk → 0 and since a Newton step is never taken for
k > K, it follows from Step 3iiofAlgorithm 3.1that for k ≥ K Δk sufficiently small.Δk cannot be decreased further. Thus,Δis bounded below, but
aredkΔ≥ξpredkΔ≥ ξα2 2
−λ1
Hx
Δ2
k. 5.41
Since f is bounded below, aredkΔ → 0, so Δk → 0 which is a contradiction. Hence
λ1Hx≥0. This completes the proof of the lemma.
6. Concluding Remarks
In this paper, we have shown that the implicitly restarted Arnoldi method can be combined with the Newton iteration and trust region strategy to obtain a globally convergent algorithm for solving large-scale NLS problems.
The main restriction of this scheme is that the subspace onto which a given Newton step is projected must solve the Newton equations with a certain accuracy which is monitored by the termination condition. This is enough to guarantee convergence.
This theory is sufficiently general that is hold for any algorithm that projects the problem on a lower dimensional subspace. The convergence results indicate promise for this research to solve large-scale NLS problems. Our next step is to investigate the performance of this algorithm on some NLS problems. The results will be reported in our forthcoming paper.
Acknowledgment
The authors would like to thank the editor-in-chief and the anonymous referees for their comments and useful suggestions in the improvement of the final form of this paper.
References
1 G. Golub and V. Pereyra, “Separable nonlinear least squares: the variable projection method and its applications,” Inverse Problems, vol. 19, no. 2, pp. R1–R26, 2003.
2 J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, NY, USA, 1970.
3 J. J. Mor´e, “The Levenberg-Marquardt algorithm: implementation and theory,” in Numerical Analysis (Proc. 7th Biennial Conf., Univ. Dundee, 1977), G. A. Watson, Ed., vol. 630 of Lecture Notes in Mathematics, pp. 105–116, Springer, Berlin, Germany, 1978.
4 J. E. Dennis Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall Series in Computational Mathematics, Prentice-Hall, Englewood Cliffs, NJ, USA, 1983.
5 M. Gulliksson, I. S ¨oderkvist, and P.- ˚A. Wedin, “Algorithms for constrained and weighted nonlinear least squares,” SIAM Journal on Optimization, vol. 7, no. 1, pp. 208–224, 1997.
6 M. R. Abdel-Aziz, “Globally convergent algorithm for solving large nonlinear systems of equations,” Numerical Functional Analysis and Optimization, vol. 25, no. 3-4, pp. 199–219, 2004.
7 M. J. D. Powell, “Convergence properties of a class of minimization algorithms,” in Nonlinear Programming, 2 (Proc. Sympos. Special Interest Group on Math. Programming, Univ. Wisconsin, Madison, Wis., 1974), O. Mangasarian, R. Meyer, and S. Robinson, Eds., pp. 1–27, Academic Press, New York, NY, USA, 1975.
9 J. J. Mor´e, “Recent developments in algorithms and software for trust region methods,” in Mathematical Programming: The State of the Art (Bonn, 1982), A. Bacham, M. Grotschel, and B. Korte, Eds., pp. 258–287, Springer, Berlin, Germany, 1983.
10 G. A. Shultz, R. B. Schnabel, and R. H. Byrd, “A family of trust-region-based algorithms for unconstrained minimization with strong global convergence properties,” SIAM Journal on Numerical Analysis, vol. 22, no. 1, pp. 47–67, 1985.
11 M. R. Osborne, “Some special nonlinear least squares problems,” SIAM Journal on Numerical Analysis, vol. 12, no. 4, pp. 571–592, 1975.
12 M. Xiaofang, F. R. Ying Kit, and X. Chengxian, “Stable factorized quasi-Newton methods for nonlinear least-squares problems,” Journal of Computational and Applied Mathematics, vol. 129, no. 1-2, pp. 1–14, 2001.
13 J. E. Dennis Jr., H. J. Mart´ınez, and R. A. Tapia, “Convergence theory for the structured BFGS secant method with an application to nonlinear least squares,” Journal of Optimization Theory and Applications, vol. 61, no. 2, pp. 161–178, 1989.
14 P. N. Brown and Y. Saad, “Hybrid Krylov methods for nonlinear systems of equations,” SIAM Journal on Scientific and Statistical Computing, vol. 11, no. 3, pp. 450–481, 1990.
15 D. C. Sorensen, “Minimization of a large scale quadratic function subject to an ellipsoidal constraint,” Tech. Rep. TR94-27, Department of Computational and Applied Mathematics, Rice University, Houston, Tex, USA, 1994.
16 D. C. Sorensen, “Newton’s method with a model trust region modification,” SIAM Journal on Numerical Analysis, vol. 19, no. 2, pp. 409–426, 1982.
17 H. Mukai and E. Polak, “A second-order method for unconstrained optimization,” Journal of Optimization Theory and Applications, vol. 26, no. 4, pp. 501–513, 1978.
18 Y. Saad, “Krylov subspace methods for solving large unsymmetric linear systems,” Mathematics of Computation, vol. 37, no. 155, pp. 105–126, 1981.