A MODIFIED REGULARIZED NEWTON METHOD FOR UNCONSTRAINED NONCONVEX OPTIMIZATION
Heng Wang*, Mei Qin & Haibo Wang
College of Science, University of Shanghai for Science and Technology, Shanghai, China, 200093
*Email: [email protected]
ABSTRACT
In this paper, we present a modified regularized Newton method for unconstrained nonconvex optimization that uses a trust region technique. We show that if the gradient and Hessian of the objective function are Lipschitz continuous, then the modified regularized Newton method (M-RNM) is globally convergent. Numerical results indicate that the algorithm is efficient.
Keywords: Regularized Newton method; Trust region method; Unconstrained nonconvex optimization; Global convergence
MSC2010: 47H99, 49M15.
1. INTRODUCTION
We consider the unconstrained optimization problem
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1.1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is twice continuously differentiable, and its gradient $\nabla f$ and Hessian $\nabla^2 f$ are denoted by $g(x)$ and $H(x)$, respectively. Throughout this paper, we assume that the solution set of (1.1) is nonempty and denoted by $S^*$, and in all cases $\|\cdot\|$ refers to the 2-norm.
It is well known that $f(x)$ is convex if and only if $H(x)$ is symmetric positive semidefinite for all $x \in \mathbb{R}^n$. Moreover, if $f(x)$ is convex, then $x \in S^*$ if and only if $x$ is a solution of the system of nonlinear equations
$$g(x) = 0. \tag{1.2}$$
Hence, we could get the minimizer of $f(x)$ by solving (1.2) [1]-[3].
The Newton method is one of the efficient solution methods. At every iteration, it computes the trial step
$$d_k^N = -H_k^{-1} g_k, \tag{1.3}$$
where $g_k = g(x_k)$ and $H_k = H(x_k)$. As we know, if $H(x)$ is Lipschitz continuous and nonsingular at the solution, then the Newton method has quadratic convergence. However, this method has an obvious disadvantage when $H_k$ is singular or nearly singular.
To overcome the difficulty caused by the possible singularity of $H_k$, [4] proposed a regularized Newton method, where the trial step is the solution of the linear equations
$$(H_k + \lambda_k I) d = -g_k, \tag{1.4}$$
where $I$ is the identity matrix and $\lambda_k$ is a positive parameter which is updated from iteration to iteration. The parameter $\lambda_k$ is introduced to overcome the difficulties caused by the singularity or near singularity of the Hessian $H_k$.
Now we need to consider another question, "how to choose the regularization parameter $\lambda_k$?", which plays an important role not only in theoretical analysis but also in numerical experiments. Yamashita and Fukushima [5] chose $\lambda_k = \|g_k\|^2$ and showed that the regularized Newton method has quadratic convergence under the local error bound condition, which is weaker than nonsingularity. Fan and Yuan [6] took $\lambda_k = \|g_k\|^{\delta}$ with $\delta \in [1,2]$ and proved that the Levenberg-Marquardt method preserves the quadratic convergence under the same conditions. In this paper we choose the parameter $\lambda_k$ as $\lambda_k = \gamma_1 \Lambda_k + \mu_k \|g_k\|$, where $\gamma_1 \ge 1$ is a constant, $\Lambda_k = \max(0, -\lambda_{\min}(H_k))$, and $\mu_k > 0$ is updated from iteration to iteration by a trust region technique.
In most past studies [7]-[9] of the regularized Newton method, the convergence properties have been discussed only when $f$ is convex. In this paper, we propose a modified regularized Newton method for (1.1) whose objective function $f$ is nonconvex, which is mainly motivated by [10]. Dong-Hui Li, Masao Fukushima, Liqun Qi and Nobuo Yamashita have proposed that regularized Newton and inexact Newton methods can possibly be extended to nonconvex minimization problems [10]. They chose $\lambda_k$ to satisfy
$$\lambda_k = \max\big(C\|g_k\|,\ -2\lambda_1(H_k)\big), \tag{1.5}$$
where $C > 0$ is a constant and $\lambda_1(H_k)$ is the minimum eigenvalue of $H_k$. Based on the better performance of the modified regularized method with $\Lambda_k = \max(0, -\lambda_{\min}(H_k))$, we will consider the choice of
$$\Lambda_k = \max\big(0,\ -\lambda_{\min}(H_k)\big) \tag{1.6}$$
in this paper.
We extend the regularized Newton method (1.4) to unconstrained nonconvex optimization. At the $k$-th iteration of the M-RNM, we set the regularization parameter $\lambda_k$ as $\lambda_k = \gamma_1 \Lambda_k + \mu_k \|g_k\|$, where $\gamma_1 \ge 1$ and $\mu_k > 0$. From the definition of $\Lambda_k$, the matrix $H_k + \gamma_1 \Lambda_k I$ is positive semidefinite even if $f$ is nonconvex. Therefore, if $\|g_k\| \ne 0$, then $H_k + \lambda_k I = H_k + \gamma_1 \Lambda_k I + \mu_k \|g_k\| I \succ 0$, so we can use the regularized Newton method to solve problem (1.1).
The main scheme of the modified regularized Newton method for unconstrained nonconvex optimization is given as follows. At every iteration, it solves the linear equations
$$(H_k + \lambda_k I) d = -g_k \tag{1.7}$$
to obtain the Newton step $d_k$, where $\lambda_k = \gamma_1 \Lambda_k + \mu_k \|g_k\|$, and then solves the linear equations
$$(H_k + \lambda_k I) d = -g(y_k) \quad \text{with } y_k = x_k + d_k \tag{1.8}$$
to obtain the approximate Newton step $\tilde d_k$.
The purpose of this paper is to investigate whether the proposed method for unconstrained nonconvex optimization has global convergence.
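To make the choice of the regularization parameter concrete, the following minimal sketch evaluates $\Lambda_k = \max(0, -\lambda_{\min}(H_k))$ and $\lambda_k = \gamma_1 \Lambda_k + \mu_k\|g_k\|$ for a small indefinite Hessian and confirms that the shifted matrix is positive definite. It is an illustration only: the helper names, the restriction to symmetric 2x2 matrices (so the smallest eigenvalue has a closed form), and the numerical values of `H`, `g`, `gamma1` and `mu` are assumptions, not taken from the paper.

```python
import math

def min_eig_sym2(H):
    """Smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = H
    return 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)

def regularization_parameter(H, g, gamma1, mu):
    """Return (Lambda_k, lambda_k) for a symmetric 2x2 Hessian H and gradient g."""
    big_lambda = max(0.0, -min_eig_sym2(H))               # Lambda_k = max(0, -lambda_min(H_k))
    lam = gamma1 * big_lambda + mu * math.hypot(g[0], g[1])  # lambda_k
    return big_lambda, lam

# Hypothetical data: an indefinite Hessian and a nonzero gradient.
H = [[1.0, 0.0], [0.0, -2.0]]
g = [3.0, 4.0]
big_lambda, lam = regularization_parameter(H, g, gamma1=1.0, mu=0.5)

# Shifted matrix H + lambda_k * I; its smallest eigenvalue should be positive.
shifted = [[H[0][0] + lam, H[0][1]], [H[1][0], H[1][1] + lam]]
print(big_lambda, lam, min_eig_sym2(shifted))
```

With $\gamma_1 \ge 1$ the term $\gamma_1\Lambda_k$ cancels the most negative eigenvalue, and the extra shift $\mu_k\|g_k\|$ is what makes the matrix strictly positive definite whenever the gradient is nonzero.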
The paper is organized as follows. In Section 2, we present a new modified regularized Newton algorithm based on a trust region technique and prove its global convergence. In Section 3, numerical results are given. Finally, we conclude the paper in Section 4.
2. THE ALGORITHM AND GLOBAL CONVERGENCE
In this section, we first present the new modified regularized Newton algorithm based on a trust region technique and then prove its global convergence.
Let $d_k$ and $\tilde d_k$ be given by (1.7) and (1.8), respectively. Since the matrix $H_k + \lambda_k I$ is symmetric and positive definite, $d_k$ is a descent direction of $f(x)$ at $x_k$, but $d_k + \tilde d_k$ may not be. Hence we prefer to use a trust region technique.
Define the actual reduction of $f(x)$ at the $k$-th iteration as
$$Ared_k = f(x_k) - f(x_k + d_k + \tilde d_k). \tag{2.1}$$
Note that the Newton step $d_k$ is the minimizer of the problem
$$\min_{d \in \mathbb{R}^n} \ \tfrac{1}{2} d^T H_k d + g_k^T d + \tfrac{1}{2} \lambda_k \|d\|^2.$$
If we let
$$\Delta_{k,1} = \|d_k\| = \|-(H_k + \lambda_k I)^{-1} g_k\|,$$
then it can be proved [1] that $d_k$ is also a solution of the trust region problem
$$\min_{d \in \mathbb{R}^n} \ \varphi_{k,1}(d) = \tfrac{1}{2} d^T H_k d + g_k^T d \quad \text{s.t.} \quad \|d\| \le \Delta_{k,1}.$$
By the famous result given by Powell in [11], we know that
$$\varphi_{k,1}(0) - \varphi_{k,1}(d_k) \ge \tfrac{1}{2} \|g_k\| \min\left\{ \|d_k\|, \frac{\|g_k\|}{\|H_k\|} \right\}. \tag{2.2}$$
Similar to $d_k$, $\tilde d_k$ is not only the minimizer of the problem
$$\min_{d \in \mathbb{R}^n} \ \tfrac{1}{2} d^T H_k d + g(y_k)^T d + \tfrac{1}{2} \lambda_k \|d\|^2,$$
but also the solution of the following trust region problem:
$$\min_{d \in \mathbb{R}^n} \ \varphi_{k,2}(d) = \tfrac{1}{2} d^T H_k d + g(y_k)^T d \quad \text{s.t.} \quad \|d\| \le \Delta_{k,2},$$
where
$$\Delta_{k,2} = \|\tilde d_k\| = \|-(H_k + \lambda_k I)^{-1} g(y_k)\|.$$
Therefore we also have
$$\varphi_{k,2}(0) - \varphi_{k,2}(\tilde d_k) \ge \tfrac{1}{2} \|g(y_k)\| \min\left\{ \|\tilde d_k\|, \frac{\|g(y_k)\|}{\|H_k\|} \right\}. \tag{2.3}$$
Based on the inequalities (2.2) and (2.3), it is reasonable for us to define the new predicted reduction as
$$Pred_k = \varphi_{k,1}(0) - \varphi_{k,1}(d_k) + \varphi_{k,2}(0) - \varphi_{k,2}(\tilde d_k), \tag{2.4}$$
which satisfies
$$Pred_k \ge \tfrac{1}{2} \|g_k\| \min\left\{ \|d_k\|, \frac{\|g_k\|}{\|H_k\|} \right\} + \tfrac{1}{2} \|g(y_k)\| \min\left\{ \|\tilde d_k\|, \frac{\|g(y_k)\|}{\|H_k\|} \right\}. \tag{2.5}$$
The ratio of the actual reduction to the predicted reduction
$$r_k = \frac{Ared_k}{Pred_k} \tag{2.6}$$
plays a key role in deciding whether to accept the trial step and how to adjust the regularization parameter.
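As a worked illustration of the ratio (2.6), consider a one-dimensional example. The function $f(x) = x^4/4 - x^2/2$, the point $x_k = 0.5$, and the values $\gamma_1 = 1$, $\mu_k = 1$ are assumptions chosen only for this example; $\gamma_1$ denotes the constant multiplying $\Lambda_k$ in $\lambda_k = \gamma_1\Lambda_k + \mu_k\|g_k\|$, and $\varphi_{k,1}$, $\varphi_{k,2}$ denote the quadratic models $\tfrac12 H_k d^2 + g_k d$ and $\tfrac12 H_k d^2 + g(y_k) d$.

```latex
\begin{align*}
g_k &= x_k^3 - x_k = -0.375, \qquad H_k = 3x_k^2 - 1 = -0.25, \\
\Lambda_k &= \max(0, -H_k) = 0.25, \qquad \lambda_k = 1 \cdot 0.25 + 1 \cdot 0.375 = 0.625, \\
d_k &= -\frac{g_k}{H_k + \lambda_k} = \frac{0.375}{0.375} = 1, \qquad y_k = x_k + d_k = 1.5, \\
g(y_k) &= 1.5^3 - 1.5 = 1.875, \qquad \tilde d_k = -\frac{1.875}{0.375} = -5, \\
Ared_k &= f(0.5) - f(0.5 + 1 - 5) = -0.109375 - 31.390625 = -31.5, \\
Pred_k &= -\varphi_{k,1}(d_k) - \varphi_{k,2}(\tilde d_k) = 0.5 + 12.5 = 13, \\
r_k &= \frac{-31.5}{13} \approx -2.42 .
\end{align*}
```

Here the combined step $d_k + \tilde d_k$ increases $f$, so $r_k$ is negative: the trial step would be rejected and $\mu_k$ enlarged. Repeating the computation with $\mu_k = 4$ gives $d_k = 0.25$, $\tilde d_k = 0.21875$ and $r_k \approx 0.78$, a step that would be accepted.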
The regularized Newton algorithm with correction for unconstrained nonconvex optimization problems is stated as follows.
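Algorithm 2.1
Step 1. Given $x_0 \in \mathbb{R}^n$, $\varepsilon \ge 0$, $\mu_0 > m > 0$, $0 < c_0 \le c_1 \le c_2 < 1$, $0 < p_1 < 1 < p_2$, $\gamma_1 \ge 1$, $k := 0$.
Step 2. If $\|g_k\| \le \varepsilon$, then stop. Otherwise go to Step 3.
Step 3. Compute $\Lambda_k = \max(0, -\lambda_{\min}(H_k))$ and $\lambda_k = \gamma_1 \Lambda_k + \mu_k \|g_k\|$. Solve
$$(H_k + \gamma_1 \Lambda_k I + \mu_k \|g_k\| I) d = -g_k \tag{2.7}$$
to obtain $d_k$. Set
$$y_k = x_k + d_k.$$
Solve
$$(H_k + \gamma_1 \Lambda_k I + \mu_k \|g_k\| I) d = -g(y_k) \tag{2.8}$$
to obtain $\tilde d_k$. Set
$$s_k = d_k + \tilde d_k.$$
Step 4. Compute $r_k = Ared_k / Pred_k$. Set
$$x_{k+1} = \begin{cases} x_k + s_k, & \text{if } r_k \ge c_0, \\ x_k, & \text{otherwise}. \end{cases} \tag{2.9}$$
Step 5. Update $\mu_{k+1}$ as
$$\mu_{k+1} = \begin{cases} p_2 \mu_k, & \text{if } r_k < c_1, \\ \mu_k, & \text{if } r_k \in [c_1, c_2], \\ \max\{p_1 \mu_k, m\}, & \text{if } r_k > c_2. \end{cases} \tag{2.10}$$
Set $k := k + 1$ and go to Step 2.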
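The steps of Algorithm 2.1 can be sketched in a few lines for the one-dimensional case, where the Hessian is a scalar, its smallest eigenvalue is the Hessian itself, and each linear system reduces to a division. This is a hedged illustration under stated assumptions, not the authors' implementation: the function name `m_rnm_1d`, the default `mu0=1.0`, the iteration cap `max_iter`, and the test function $f(x) = x^4/4 - x^2/2$ are all hypothetical choices, and `gamma1` stands for the constant multiplying $\Lambda_k$.

```python
def m_rnm_1d(f, grad, hess, x0, eps=1e-5, mu0=1.0, m=1e-5,
             c0=0.001, c1=0.25, c2=0.75, p1=0.25, p2=4.0,
             gamma1=1.0, max_iter=200):
    """One-dimensional sketch of Algorithm 2.1 (scalars replace matrices)."""
    x, mu = x0, mu0
    for k in range(max_iter):
        g, h = grad(x), hess(x)
        if abs(g) <= eps:                        # Step 2: stopping test
            return x, k
        big_lambda = max(0.0, -h)                # Step 3: Lambda_k for a 1x1 Hessian
        lam = gamma1 * big_lambda + mu * abs(g)  # regularization parameter lambda_k
        d = -g / (h + lam)                       # Newton step, cf. (2.7)
        y = x + d
        gy = grad(y)
        d_tilde = -gy / (h + lam)                # correction step, cf. (2.8)
        s = d + d_tilde
        ared = f(x) - f(x + s)                   # actual reduction (2.1)
        # predicted reduction (2.4) from the two quadratic models
        pred = -(0.5 * h * d * d + g * d) - (0.5 * h * d_tilde * d_tilde + gy * d_tilde)
        r = ared / pred                          # ratio (2.6)
        if r >= c0:                              # Step 4: accept or reject the trial step
            x = x + s
        if r < c1:                               # Step 5: update mu as in (2.10)
            mu = p2 * mu
        elif r > c2:
            mu = max(p1 * mu, m)
    return x, max_iter

# Hypothetical nonconvex test function with minimizers at x = -1 and x = 1.
f = lambda x: 0.25 * x**4 - 0.5 * x**2
grad = lambda x: x**3 - x
hess = lambda x: 3.0 * x**2 - 1.0

x_final, iterations = m_rnm_1d(f, grad, hess, x0=0.5)
print(x_final, iterations)
```

Starting from $x_0 = 0.5$, where the second derivative is negative, the first trial step is rejected and $\mu$ is multiplied by $p_2$; subsequent steps are accepted and the iterates approach the minimizer $x = 1$.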
In Algorithm 2.1, the given positive constant $m$ is the lower bound of $\mu_k$. It prevents the step from being too large when the sequence is near the minimizer set.
Before discussing the global convergence of the algorithm above, we make the following assumption.
Assumption 2.2 $g(x)$ and $H(x)$ are both Lipschitz continuous, that is, there exist constants $L_1 > 0$ and $L_2 > 0$ such that
$$\|g(y) - g(x)\| \le L_2 \|y - x\|, \quad \forall x, y \in \mathbb{R}^n, \tag{2.11}$$
and
$$\|H(y) - H(x)\| \le L_1 \|y - x\|, \quad \forall x, y \in \mathbb{R}^n. \tag{2.12}$$
It follows from (2.12) that
$$\|g(y) - g(x) - H(x)(y - x)\| \le L_1 \|y - x\|^2, \quad \forall x, y \in \mathbb{R}^n. \tag{2.13}$$
The following lemma shows the relationship between positive semidefiniteness of a matrix and of its symmetric part.
Lemma 2.3 A real-valued matrix $A$ is positive semidefinite if and only if $(A + A^T)/2$ is positive semidefinite.
Proof. See [1].
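Next, we give bounds on the norm of a shifted positive semidefinite matrix and of its inverse.
Lemma 2.4 Suppose $A$ is symmetric positive semidefinite. Then
$$\|A + \mu I\| \ge \mu \quad \text{and} \quad \|(A + \mu I)^{-1}\| \le \frac{1}{\mu}$$
hold for any $\mu > 0$.
Proof. It follows from Lemma 2.3 and the definition of the 2-norm that
$$\|A + \mu I\| = \sqrt{\lambda_{\max}\big((A + \mu I)^T (A + \mu I)\big)} = \sqrt{\lambda_{\max}\big(A^T A + \mu (A^T + A) + \mu^2 I\big)} \ge \sqrt{\lambda_{\max}(\mu^2 I)} = \mu,$$
where $\lambda_{\max}((A + \mu I)^T (A + \mu I))$ means the largest eigenvalue of $(A + \mu I)^T (A + \mu I)$. Similarly, we have
$$\|(A + \mu I)^{-1}\| = \sqrt{\lambda_{\max}\big(((A + \mu I)^{-1})^T (A + \mu I)^{-1}\big)} = \sqrt{\lambda_{\max}\big(((A + \mu I)(A + \mu I)^T)^{-1}\big)} = \frac{1}{\sqrt{\lambda_{\min}\big(A A^T + \mu (A + A^T) + \mu^2 I\big)}} \le \frac{1}{\mu}.$$
This completes the proof.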
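A small worked instance may help to see that both bounds of Lemma 2.4 can be attained when $A$ is singular. The matrix $A$ and the value of $\mu$ below are assumptions chosen purely for illustration.

```latex
\[
A = \begin{pmatrix} 0 & 0 \\ 0 & 3 \end{pmatrix}, \quad \mu = 2
\quad\Longrightarrow\quad
A + \mu I = \begin{pmatrix} 2 & 0 \\ 0 & 5 \end{pmatrix}, \qquad
(A + \mu I)^{-1} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/5 \end{pmatrix},
\]
\[
\|A + \mu I\| = 5 \ge 2 = \mu, \qquad
\|(A + \mu I)^{-1}\| = \tfrac{1}{2} \le \tfrac{1}{2} = \frac{1}{\mu}.
\]
```

The zero eigenvalue of $A$ is what makes the second bound tight; for a positive definite $A$ the inverse norm would be strictly smaller than $1/\mu$.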
So we have the following results.
Theorem 2.5 Under the conditions of Assumption 2.2, if $f$ is bounded below, then Algorithm 2.1 terminates in finite iterations or satisfies
$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{2.14}$$
Proof. We prove by contradiction. If the theorem is not true, then there exist a positive $\tau$ and an integer $\tilde k$ such that
$$\|g_k\| \ge \tau, \quad \forall k \ge \tilde k. \tag{2.15}$$
Without loss of generality, we can suppose $\tilde k = 1$. Set $T = \{k \mid x_k \ne x_{k+1}\}$. Then $\{1, 2, \ldots\} = T \cup \{k \mid x_k = x_{k+1}\}$.
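Now we analyze two cases, according to whether $T$ is finite or not.
Case (1): $T$ is finite. Then there exists an integer $k_1$ such that
$$x_{k_1} = x_{k_1+1} = x_{k_1+2} = \cdots.$$
By (2.9), we have
$$r_k < c_0, \quad \forall k \ge k_1.$$
Therefore by (2.10) and (2.15), we deduce
$$\mu_k \to \infty, \quad \lambda_k \to \infty. \tag{2.16}$$
Since $x_{k+1} = x_k$, $\forall k \ge k_1$, we get from (2.7) and (2.16) that
$$\|d_k\| = \|-(H_k + \gamma_1 \Lambda_k I + \mu_k \|g_k\| I)^{-1} g_k\| \to 0. \tag{2.17}$$
From (2.8), we obtain
$$\begin{aligned}
\|\tilde d_k\| &= \|-(H_k + \gamma_1 \Lambda_k I + \mu_k \|g_k\| I)^{-1} g(y_k)\| \\
&\le \|(H_k + \gamma_1 \Lambda_k I + \mu_k \|g_k\| I)^{-1}\| \, \|g(y_k) - g_k - H_k d_k\| + \|(H_k + \gamma_1 \Lambda_k I + \mu_k \|g_k\| I)^{-1} (g_k + H_k d_k)\| \\
&\le \|(H_k + \gamma_1 \Lambda_k I + \mu_k \|g_k\| I)^{-1}\| \, L_1 \|d_k\|^2 + \|(H_k + \gamma_1 \Lambda_k I + \mu_k \|g_k\| I)^{-1} g_k\| + \|(H_k + \gamma_1 \Lambda_k I + \mu_k \|g_k\| I)^{-1}\| \, \|H_k\| \, \|d_k\| \\
&\le \frac{L_1 \|d_k\|^2}{\mu_k \|g_k\|} + \|d_k\| + \frac{L_2 \|d_k\|}{\mu_k \|g_k\|} \\
&\le \kappa_1 \|d_k\|,
\end{aligned} \tag{2.18}$$
where $\kappa_1$ is a positive constant. Here the second inequality uses (2.13), the third uses Lemma 2.4 together with $\|H_k\| \le L_2$, which follows from (2.11), and the last uses $\mu_k \ge m$, (2.15) and (2.17).
It follows from (2.1) and (2.4) that
$$\begin{aligned}
|Ared_k - Pred_k| &= \big| f(x_k) - f(x_k + d_k + \tilde d_k) - (\varphi_{k,1}(0) - \varphi_{k,1}(d_k)) - (\varphi_{k,2}(0) - \varphi_{k,2}(\tilde d_k)) \big| \\
&\le \big| f(y_k + \tilde d_k) - f(y_k) - g(y_k)^T \tilde d_k - \tfrac{1}{2} \tilde d_k^T H_k \tilde d_k \big| + \big| f(y_k) - f(x_k) - g_k^T d_k - \tfrac{1}{2} d_k^T H_k d_k \big| \\
&= o(\|d_k\|^2) + o(\|\tilde d_k\|^2).
\end{aligned} \tag{2.19}$$
Moreover, from (2.5), (2.15), (2.11) and (2.17), we have
$$Pred_k \ge \tfrac{1}{2} \tau \min\left\{ \|d_k\|, \frac{\tau}{L_2} \right\} \ge \tfrac{1}{2} \tau \|d_k\| \tag{2.20}$$
for sufficiently large $k$.
Due to (2.19) and (2.20), we get
$$|r_k - 1| = \left| \frac{Ared_k - Pred_k}{Pred_k} \right| = \frac{\big| f(x_k) - f(x_k + d_k + \tilde d_k) - (\varphi_{k,1}(0) - \varphi_{k,1}(d_k)) - (\varphi_{k,2}(0) - \varphi_{k,2}(\tilde d_k)) \big|}{Pred_k} \le \frac{o(\|d_k\|^2) + o(\|\tilde d_k\|^2)}{\tfrac{1}{2} \tau \min\left\{ \|d_k\|, \frac{\tau}{L_2} \right\}} \to 0, \tag{2.21}$$
which implies that $r_k \to 1$. Hence, there exists a positive constant $\kappa_2$ such that $\mu_k \le \kappa_2$ holds for all large $k$, which contradicts (2.16).
Case (2):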
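$T$ is infinite. Then we have from (2.5) and (2.15) that
$$\begin{aligned}
+\infty &> f(x_1) - \liminf_{k \to \infty} f(x_k) \ge \sum_{i=1}^{\infty} \big( f(x_i) - f(x_{i+1}) \big) = \sum_{k \in T} \big( f(x_k) - f(x_{k+1}) \big) \\
&\ge \sum_{k \in T} c_0 \, Pred_k \\
&\ge \sum_{k \in T} c_0 \left[ \tfrac{1}{2} \|g_k\| \min\left\{ \|d_k\|, \frac{\|g_k\|}{\|H_k\|} \right\} + \tfrac{1}{2} \|g(y_k)\| \min\left\{ \|\tilde d_k\|, \frac{\|g(y_k)\|}{\|H_k\|} \right\} \right] \\
&\ge \sum_{k \in T} \frac{c_0}{2} \tau \min\left\{ \|d_k\|, \frac{\tau}{L_2} \right\},
\end{aligned} \tag{2.22}$$
which implies that
$$\lim_{k \to \infty,\ k \in T} \|d_k\| = 0. \tag{2.23}$$
The above equality together with the updating rule (2.10) means
$$\lambda_k \to \infty. \tag{2.24}$$
Similar to (2.18), it follows from (2.23) and (2.24) that
$$\|\tilde d_k\| \le \kappa_3 \|d_k\|, \quad \forall k \in T,$$
for some positive constant $\kappa_3$. Then we have
$$\|s_k\| = \|d_k + \tilde d_k\| \le (1 + \kappa_3) \|d_k\|, \quad \forall k \in T.$$
This inequality together with (2.22) yields
$$\sum_{k \in T} \|s_k\| < +\infty,$$
which implies that
$$x_k \to x^*. \tag{2.25}$$
It follows from (2.7), (2.25), (2.24) and (2.18) that
$$d_k \to 0, \quad \tilde d_k \to 0. \tag{2.26}$$
Since $(H_k + \gamma_1 \Lambda_k I + \mu_k \|g_k\| I) d_k = -g_k$ from (2.7), we have from (2.15), (2.11) and (2.26) that
$$\mu_k = \frac{\|g_k + H_k d_k + \gamma_1 \Lambda_k d_k\|}{\|g_k\| \, \|d_k\|} \ge \frac{\|g_k\| - \|H_k\| \, \|d_k\| - \gamma_1 \Lambda_k \|d_k\|}{\|g_k\| \, \|d_k\|} \ge \frac{1}{\|d_k\|} - \frac{(1 + \gamma_1) L_2}{\tau},$$
where the last inequality uses $\Lambda_k \le \|H_k\| \le L_2$. This means
$$\mu_k \to \infty. \tag{2.27}$$
By the same analysis as (2.21) we know that
$$r_k \to 1. \tag{2.28}$$
Hence, there exists a positive constant $\kappa_4 > m$ such that $\mu_k \le \kappa_4$ holds for all sufficiently large $k$, which gives a contradiction to (2.27). The proof is completed.
3. NUMERICAL EXPERIMENTS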
In this section, we test the performance of the modified regularized Newton method and the regularized Newton method without correction. The function to be minimized is
$$f(x) = \frac{1}{2} \sum_{i=1}^{n-1} (x_i - x_{i+1})^2 + \frac{1}{12} \sum_{i=1}^{n-1} \alpha_i (x_i - x_{i+1})^4 + \frac{1}{30} \sum_{i=1}^{n-1} \beta_i (x_i - x_{i+1})^6, \tag{3.1}$$
where $\alpha_i, \beta_i \ge 0$ $(i = 1, \ldots, n-1)$ are constants. It is clear that function $f(x)$ is nonconvex and the minimizer set of $f(x)$ is
$$S = \{ x \in \mathbb{R}^n \mid x_1 = x_2 = \cdots = x_n \}.$$
It can also be proved that $\|g(x)\|$ provides a local error bound near the minimizer of $f(x)$. The Hessian $\nabla^2 f(x)$ is given by
[Hessian matrix display not recoverable from the source]
where
$$a_i = a_i(x) = \tfrac{1}{3} \alpha_i (x_i - x_{i+1})^3, \quad b_i = b_i(x) = \tfrac{1}{5} \beta_i (x_i - x_{i+1})^5, \quad c_i = c_i(x) = \alpha_i (x_i - x_{i+1})^2, \quad d_i = d_i(x) = \beta_i (x_i - x_{i+1})^4, \quad (i = 1, 2, \ldots, n-1).$$
Matrix $\nabla^2 f(x)$ is indefinite for all $x$, but singular as the sum of every column is zero. Since the Hessian $H_k$ is always singular, the Newton method cannot be used to solve the nonlinear equations (1.2). But by using the regularization technique, both the regularized Newton method and Algorithm 2.1 work quite well.
We tested the modified regularized Newton method for problem (1.1) where $f$ is given by (3.1) with different values of $\alpha$, $\beta$ and $n$. We set $c_0 = 0.001$, $c_1 = 0.25$, $c_2 = 0.75$, $p_1 = 0.25$, $p_2 = 4$, $\gamma_1 = 1$, $\mu_0 = 10^{-2}$ and $m = \varepsilon = 10^{-5}$ for Algorithm 2.1.
Table 1 reports the norm of $\nabla f(x_k)$ at every iteration $k$ when $n = 10$, $\alpha = (1, 1, \ldots, 1)^T$ and $\beta = (1, 1, \ldots, 1)^T$. Starting from $x_0 = (1, 2, \ldots, 10)^T$, Algorithm 2.1 takes only five iterations to obtain the minimizer of $f(x)$.
$k$                    0        1          2          3          4          5
$\|\nabla f(x_k)\|$    1.0628   5.55e-01   6.33e-02   2.94e-04   6.63e-10   0
Table 1: Results of M-RNM on (3.1)
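For readers who want to experiment with the test problem (3.1), the sketch below evaluates the objective and its gradient in plain Python. It rests on an assumption: the three sums in (3.1) are taken to enter with plus signs and coefficients $\alpha_i$, $\beta_i$. The function names `chain_objective` and `chain_gradient` are hypothetical and not from the paper.

```python
def chain_objective(x, alpha, beta):
    """f(x) from (3.1), assuming all three sums enter with plus signs."""
    total = 0.0
    for i in range(len(x) - 1):
        t = x[i] - x[i + 1]
        total += 0.5 * t**2 + alpha[i] * t**4 / 12.0 + beta[i] * t**6 / 30.0
    return total

def chain_gradient(x, alpha, beta):
    """Gradient of chain_objective with respect to x."""
    g = [0.0] * len(x)
    for i in range(len(x) - 1):
        t = x[i] - x[i + 1]
        # derivative of the i-th summand with respect to t = x_i - x_{i+1}
        slope = t + alpha[i] * t**3 / 3.0 + beta[i] * t**5 / 5.0
        g[i] += slope       # d t / d x_i     = +1
        g[i + 1] -= slope   # d t / d x_{i+1} = -1
    return g

# Setting of Table 1: n = 10, all coefficients equal to 1, x_0 = (1, 2, ..., 10).
x0 = [float(i) for i in range(1, 11)]
ones = [1.0] * 9
g0 = chain_gradient(x0, ones, ones)
print(chain_objective(x0, ones, ones), g0[0], g0[-1], sum(g0))
```

Each summand depends only on the difference $x_i - x_{i+1}$, so the gradient components always sum to zero and every point with equal components is stationary, which agrees with the minimizer set $S$ and with the column sums of the Hessian being zero.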
The results in Table 1 reflect the convergence property of $\|\nabla f(x_k)\|$. The generated iterates are
$x_1 = (3.2142, 3.4357, 3.8874, 4.4813, 5.1523, 5.8477, 6.5187, 7.1126, 7.5643, 7.7858)^T$,
$x_2 = (5.2666, 5.2742, 5.3159, 5.3859, 5.4609, 5.5391, 5.6161, 5.6841, 5.7258, 5.7334)^T$,
$x_3 = (5.4991, 5.4991, 5.4992, 5.4995, 5.4998, 5.5002, 5.5005, 5.5008, 5.5009, 5.5009)^T$,
$x_4 = (5.5000, 5.5000, 5.5000, 5.5000, 5.5000, 5.5000, 5.5000, 5.5000, 5.5000, 5.5000)^T$,
$x_5 = (5.5000, 5.5000, 5.5000, 5.5000, 5.5000, 5.5000, 5.5000, 5.5000, 5.5000, 5.5000)^T$.
We may observe that the whole sequence $\{x_k\}$ converges to $x^* = (5.5, 5.5, \ldots, 5.5, 5.5)^T$.
We also ran the regularized Newton algorithm (RNA) [10] without correction, that is, we do not solve the linear equations (2.8) and just set the solution of (2.7) to be the trial step $s_k$. Then, we tested the regularized Newton algorithm [10] without correction and the modified regularized Newton algorithm for various values of $n$, $\alpha$, $\beta$ and different choices of the starting point. The results are listed in Table 2 and Table 3.
In the tables, $\alpha_i$: the selected value of $\alpha_i$; $\beta_i$: the selected value of $\beta_i$; Dim: the dimension $n$ of the problem; $x_0$: the $i$-th element of $x_0$; md: the number of iterations required; $\|\nabla f\|$: the final value of $\|\nabla f(x_k)\|$; $x^*$: the final value of $x_k$. We use $\|\nabla f(x_k)\| \le 10^{-5}$ as the stopping criterion.
$\alpha_i$   $\beta_i$   Dim   $x_0$   md       $\|\nabla f\|$   $x^*$
0            0           10    i       3/0      1.63e-07         5.5
0            0           10    1/i     2/0      1.85e-06         0.2929
0            0           50    i       13/1     4.28e-06         25.5
0            0           50    1/i     19/2     4.28e-06         0.09
0            0           100   i       16/1     1.62e-06         50.5
0            0           100   1/i     5/1      1.15e-08         0.519
0            0           500   i       49/6     6.31e-07         250.5
0            0           500   1/i     18/8     2.08e-08         0.0136
1            0           10    i       5/1      2.66e-06         5.5
1            0           10    1/i     8/2      5.6e-12          0.2929
1            0           50    i       7/3      3.84e-10         25.5
1            0           50    1/i     25/14    4.86e-09         0.09
1            0           100   i       8/2      4.95e-06         50.5
1            0           100   1/i     11/5     2.30e-06         0.519
1            0           500   i       38/19    3.45e-07         250.5
1            0           500   1/i     11/5     5.48e-06         0.0136
i            0           10    i       9/4      9.56e-08         5.5
i            0           10    1/i     5/1      9.00e-06         0.2929
i            0           50    i       39/16    2.77e-07         25.5
i            0           50    1/i     19/10    3.10e-08         0.09
i            0           100   i       59/35    4.85e-07         50.5
i            0           100   1/i     19/10    2.81e-06         0.519
i            0           500   i       44/21    4.20e-07         250.5
i            0           500   1/i     19/10    1.56e-06         0.0136
Table 2: Results of RNA and Algorithm 2.1.
$\alpha_i$   $\beta_i$   Dim   $x_0$   md       $\|\nabla f\|$   $x^*$
0            1           10    i       14/6     1.43e-06         5.5
0            1           10    1/i     5/1      1.35e-09         0.2929
0            1           50    i       52/34    7.358e-07        25.5
0            1           50    1/i     6/2      1.16e-07         0.09
0            1           100   i       121/55   7.24e-07         50.5
0            1           100   1/i     5/1      4.15e-06         0.519
0            1           500   i       35/18    6.85e-07         250.5
0            1           500   1/i     6/2      5.89e-06         0.0136
1            1           10    i       5/1      1.45e-08         5.5
1            1           10    1/i     3/2      5.43e-08         0.2929
1            1           50    i       19/13    1.33e-08         25.5
1            1           50    1/i     12/5     7.56e-07         0.09
1            1           100   i       39/24    4.28e-07         50.5
1            1           100   1/i     15/7     1.70e-06         0.519
1            1           500   i       43/26    9.80e-07         250.5
1            1           500   1/i     15/7     4.54e-06         0.0136
i            1           10    i       8/4      5.37e-08         5.5
i            1           10    1/i     11/4     3.79e-10         0.2929
i            1           50    i       33/16    4.20e-09         25.5
i            1           50    1/i     17/10    3.13e-07         0.09
i            1           100   i       106/35   2.04e-07         50.5
i            1           100   1/i     28/19    1.28e-07         0.519
i            1           500   i       57/30    7.50e-07         250.5
i            1           500   1/i     38/21    4.17e-07         0.0136
Table 3: Results of RNA and Algorithm 2.1.
Moreover, for the same $\alpha$, $\beta$, $n$ and $x_0$, Table 2 and Table 3 show that the correction term does help to improve the regularized Newton algorithm when the initial point is far away from the minimizer. These facts indicate that the introduction of the correction is useful and could accelerate the convergence of the regularized Newton method.
4. CONCLUSION
We propose a modified regularized Newton method for unconstrained nonconvex optimization, and show that if the gradient and Hessian of the objective function are Lipschitz continuous, then the M-RNM is globally convergent.
The presented numerical results show the practicality of the proposed method. On the problems tested, our method performs better than the regularized Newton algorithm (RNA) [10] without correction.
5. REFERENCES
[1] W. Sun, Y. Yuan, Optimization Theory and Methods, Springer Science and Business Media, LLC, New York, 2006.
[2] W. Zhou, D. Li, A globally convergent BFGS method for nonlinear monotone equations without any merit functions, Math. Comp. 77 (2008) 2231-2240.
[3] C.T. Kelley, Iterative Methods for Optimization, in: Frontiers in Applied Mathematics, vol. 18, SIAM, Philadelphia, 1999.
[4] D. Sun, A regularization Newton method for solving nonlinear complementarity problems, Appl. Math. Optim. 40 (1999), pp. 315-339.
[5] D.H. Li, M. Fukushima, L. Qi, and N. Yamashita, Regularized Newton methods for convex minimization problems with singular solutions, Comput. Optim. Appl. 28 (2004), pp. 131-147.
[6] J.Y. Fan and Y.X. Yuan, On the quadratic convergence of the Levenberg-Marquardt method without nonsingularity assumption, Computing, 74 (2005), pp. 23-39.
[7] Jinyan Fan, Yaxiang Yuan, A regularized Newton method for monotone nonlinear equations and its application, Optimization Methods and Software, 29 (2014), pp. 102-119.
[8] N. Yamashita, M. Fukushima, On the rate of convergence of the Levenberg-Marquardt method, Comput. Suppl. (Wien), 15 (2001), pp. 227-238.
[9] R.A. Polyak, Regularized Newton method for unconstrained convex optimization, Math. Program., Ser. B 120 (2009), pp. 125-145.
[10] Dong-Hui Li, Masao Fukushima, Liqun Qi, Nobuo Yamashita, Regularized Newton methods for convex minimization problems with singular solutions, Computational Optimization and Applications, 28 (2004), pp. 131-147.
[11] M.J.D. Powell, Convergence properties of a class of minimization algorithms, in: O.L. Mangasarian, R.R. Meyer, S.M. Robinson (Eds.), Nonlinear Programming, vol. 2, Academic Press, New York, 1975, pp. 1-27.