Corresponding Author: Abd Al-Gafour J. Salem, College of Computer Science & Mathematics, Mosul University, Iraq

A New Conjugancy Coefficient of Conjugate Gradient Method

1Abd Al-Gafour J. Salem, 1Ban A. Mitras and 2Nazar K. Hussein

1College of Computer Science & Mathematics, Mosul University, Iraq
2College of Computer Science & Mathematics, Tikrit University, Iraq

Abstract: In this paper we derive a new formula for the conjugancy coefficient of the conjugate gradient method. The proposed formula depends on the quasi-Newton condition. The sufficient descent and global convergence properties of the new algorithm are proved. We obtained good numerical results, especially for large scale optimization problems.

Key words: conjugate gradient method, conjugancy coefficient, global convergence, QN condition.

INTRODUCTION

Assume the nonlinear unconstrained optimization problem

\min f(x), \quad x \in R^n,   (1)

where f : R^n \to R is a continuously differentiable function. Problem (1) is solved by an iterative scheme of the form

x_{k+1} = x_k + \alpha_k d_k,   (2)

where the vector d_k is the search direction and the scalar \alpha_k is the step size.

The step size \alpha_k is determined by many algorithms. In an exact line search the step \alpha_k is selected as

\alpha_k = \arg\min_{\alpha \ge 0} f(x_k + \alpha d_k).   (3)

In some special cases it is possible to compute the step \alpha_k analytically, but in most cases it is computed to approximately minimize f along the ray \{x_k + \alpha d_k ;\ \alpha \ge 0\}, or at least to reduce f sufficiently (Andrei, N., 2007). We note that the choice of the step size \alpha_k and of the search direction d_k differs among line search methods.
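For illustration, the following minimal Python sketch (not part of the paper, whose experiments were coded in FORTRAN90) applies iteration (2) with the exact step (3) to a strictly convex quadratic, one of the special cases in which the step size can be computed analytically; the matrix A, the vector b and the steepest-descent direction are illustrative assumptions.

```python
# A minimal sketch of iteration (2) with the exact step (3) for a strictly convex
# quadratic f(x) = 0.5*x^T A x - b^T x, where the exact step along a descent
# direction d has the closed form alpha = -(g^T d)/(d^T A d).
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite (assumed)
b = np.array([1.0, 2.0])

def grad(x):
    return A @ x - b                      # g(x) = A x - b

x = np.zeros(2)
for k in range(50):
    g = grad(x)
    if np.linalg.norm(g) <= 1e-8:         # convergence test
        break
    d = -g                                # steepest-descent direction for illustration
    alpha = -(g @ d) / (d @ A @ d)        # exact line search (3) for a quadratic
    x = x + alpha * d                     # iteration (2)

print(k, x)                               # approaches the solution of A x = b
```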

One of the line search methods is the steepest descent method, which is considered one of the standard methods in optimization. Its search direction keeps the same form throughout the algorithm, d_k = -g_k, and only the step size changes. The method is globally convergent, i.e. it is guaranteed to reach a minimizer of the function, but it is very slow in practice because its rate of convergence is only linear. The Newton method is a line search method with search direction d_k = -G^{-1} g_k, where G^{-1} is the inverse of the n \times n matrix \nabla^2 f(x) (the Hessian matrix). It is a good line search method when the inverse of G exists, but it is not effective in practice when the inverse does not exist; in this case the problem is called ill-conditioned. To overcome this difficulty, many line search methods called quasi-Newton (Newton-like) methods were developed; they approximate G^{-1} by a matrix denoted H_k, so the quasi-Newton search direction is d_k = -H_k g_k. These methods nevertheless have disadvantages: they require large storage, because a matrix must be stored at each iteration, and they carry a high computational cost, the two things an optimization method tries to minimize. These disadvantages encouraged the search for new line search methods with good properties, i.e. methods with low storage and low computational cost.
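To make the storage argument concrete, the short sketch below (an illustration, not from the paper) computes a Newton-type direction by solving with the full n x n Hessian, which must be stored and factorized at every iteration, in contrast with the vector-only updates used by conjugate gradient methods; the random quadratic "Hessian" is an assumption.

```python
# Illustrative comparison of the memory footprint of a Newton-type direction
# versus a conjugate-gradient-style update.
import numpy as np

n = 500
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
G = M @ M.T + n * np.eye(n)          # a symmetric positive definite "Hessian" (assumed)
g = rng.standard_normal(n)           # current gradient g_k

d_newton = -np.linalg.solve(G, g)    # Newton direction: needs the full n*n matrix
d_cg_like = -g                       # first CG direction: needs only O(n) vectors

print(G.nbytes, g.nbytes)            # matrix storage grows as n^2, vectors as n
```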

The nonlinear conjugate gradient method is a very useful technique for solving large scale minimization problems and has wide applications in many fields. The history of conjugate gradient methods begins with the seminal paper of Hestenes and Stiefel (1952), who presented an algorithm for solving symmetric, positive definite linear algebraic systems.


A large variety of nonlinear conjugate gradient algorithms is known (Andrei, N., 2009). Eq. (2) is the iteration of the nonlinear conjugate gradient method, where x_0 is the starting point, \alpha_k is the step size and d_k is the search direction, defined by d_0 = -g_0 and, for the subsequent iterations, by

d_{k+1} = -g_{k+1} + \beta_k d_k,   (4)

where \beta_k is called the conjugancy coefficient parameter, s_k = x_{k+1} - x_k and g_k = \nabla f(x_k). Throughout, \|\cdot\| denotes the Euclidean norm and y_k = g_{k+1} - g_k.

The conjugate gradient method is a suitable approach for solving large scale minimization problems. For strictly convex quadratic objective functions, the conjugate gradient method with exact line searches has the finite convergence property. If the objective function is not quadratic or inexact line searches are used, the conjugate gradient method has no finite convergence property, or even no global convergence property (Dai, Y. and Y. Yuan, 2000; Yuan, Y. and W. Sun, 1997).

The line search in the conjugate gradient algorithm is often based on the standard Wolfe conditions (Wolfe, P., 1969; Wolfe, P., 1971):

f(x_k + \alpha_k d_k) - f(x_k) \le \delta \alpha_k g_k^T d_k,   (5)

g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k,   (6)

where d_k is supposed to be a descent direction and 0 < \delta \le \sigma < 1.

For some conjugate gradient algorithms a stronger version of the Wolfe conditions is needed to ensure convergence and enhance stability.
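As a hedged illustration of conditions (5) and (6), the sketch below checks whether a trial step satisfies the standard Wolfe conditions, with an optional strong-Wolfe curvature test; the function handles and the constants delta and sigma are assumptions made for the example, not values prescribed by the paper.

```python
# A small sketch checking the standard Wolfe conditions (5)-(6) and their strong
# variant for a trial step; f, grad, delta and sigma are illustrative assumptions
# (0 < delta <= sigma < 1).
import numpy as np

def satisfies_wolfe(f, grad, x, d, alpha, delta=1e-4, sigma=0.9, strong=False):
    g0_d = grad(x) @ d                                   # g_k^T d_k (should be < 0)
    x_new = x + alpha * d
    armijo = f(x_new) - f(x) <= delta * alpha * g0_d     # condition (5)
    g1_d = grad(x_new) @ d
    if strong:
        curvature = abs(g1_d) <= -sigma * g0_d           # strong Wolfe curvature test
    else:
        curvature = g1_d >= sigma * g0_d                 # condition (6)
    return armijo and curvature

# Example on a simple quadratic:
f = lambda x: 0.5 * x @ x
grad = lambda x: x
x = np.array([1.0, -2.0])
d = -grad(x)
print(satisfies_wolfe(f, grad, x, d, alpha=0.5))
```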

The conjugate gradient algorithms are classified into different types according to the choice of \beta_k in equation (4), as follows:

\beta_k^{HS} = \frac{g_{k+1}^T y_k}{y_k^T s_k} \qquad (Hestenes and Stiefel)   (7)

\beta_k^{FR} = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k} \qquad (Fletcher and Reeves)   (8)

\beta_k^{PR} = \frac{g_{k+1}^T y_k}{g_k^T g_k} \qquad (Polak-Ribiere and Polyak)   (9)

\beta_k^{CD} = -\frac{g_{k+1}^T g_{k+1}}{d_k^T g_k} \qquad (Conjugate Descent)   (10)

\beta_k^{LS} = -\frac{g_{k+1}^T y_k}{d_k^T g_k} \qquad (Liu and Storey)   (11)

\beta_k^{DY} = \frac{g_{k+1}^T g_{k+1}}{y_k^T s_k} \qquad (Dai and Yuan)   (12)
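For reference, a compact sketch of the classical coefficients (7)-(12) is given below; the argument names (g_new for g_{k+1}, g for g_k, d for d_k, s for s_k) are illustrative choices, not notation from the paper.

```python
# A compact, illustrative sketch of the classical conjugacy coefficients (7)-(12).
import numpy as np

def beta_classical(g_new, g, d, s):
    y = g_new - g                            # y_k = g_{k+1} - g_k
    return {
        "HS": (g_new @ y) / (y @ s),         # Hestenes-Stiefel (7)
        "FR": (g_new @ g_new) / (g @ g),     # Fletcher-Reeves (8)
        "PR": (g_new @ y) / (g @ g),         # Polak-Ribiere-Polyak (9)
        "CD": -(g_new @ g_new) / (d @ g),    # Conjugate Descent (10)
        "LS": -(g_new @ y) / (d @ g),        # Liu-Storey (11)
        "DY": (g_new @ g_new) / (y @ s),     # Dai-Yuan (12)
    }
```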

Here, we derive a new conjugancy coefficient, as presented in the following section.

2. The New Conjugate Gradient Algorithm:

We have the quasi-Newton condition

y_k = G_k s_k.   (13)

We multiply both sides of equation (13) by the convex combination \theta y_k + (1-\theta) s_k of y_k and s_k, \theta \in [0,1], and we get

\theta y_k^T y_k + (1-\theta) s_k^T y_k = (\theta y_k + (1-\theta) s_k)^T G s_k.

Requiring the approximation G^* to be a scalar multiple of the identity, this yields

G^* = \frac{\theta y_k^T y_k + (1-\theta) y_k^T s_k}{\theta y_k^T s_k + (1-\theta) s_k^T s_k}\, I.   (14)

The corresponding Newton-like direction is

d_{k+1}^{N} = -(G^*)^{-1} g_{k+1}.   (15)

d_{k+1}^{N} = -\frac{\theta s_k^T y_k + (1-\theta) s_k^T s_k}{\theta y_k^T y_k + (1-\theta) y_k^T s_k}\, I\, g_{k+1}.   (16)

Multiplying both sides of equation (16) by y_k^T we get

y_k^T d_{k+1}^{N} = -\frac{\theta s_k^T y_k + (1-\theta) s_k^T s_k}{\theta y_k^T y_k + (1-\theta) y_k^T s_k}\, y_k^T g_{k+1}.   (17)

On the other hand, multiplying the conjugate gradient direction (4) by y_k^T gives

y_k^T d_{k+1}^{CG} = -y_k^T g_{k+1} + \beta_k y_k^T d_k.   (18)

From (17) and (18) we have

-y_k^T g_{k+1} + \beta_k y_k^T d_k = -\frac{\theta s_k^T y_k + (1-\theta) s_k^T s_k}{\theta y_k^T y_k + (1-\theta) y_k^T s_k}\, y_k^T g_{k+1}.

Let

\Phi_k = \frac{\theta s_k^T y_k + (1-\theta) s_k^T s_k}{\theta y_k^T y_k + (1-\theta) y_k^T s_k}

be a positive scalar greater than 1, where \theta \in [0,1] is the convex-combination parameter; then we have

\beta_k = [1 - \Phi_k] \frac{y_k^T g_{k+1}}{y_k^T d_k}.   (19)

Since \beta_k^{HS} = \frac{y_k^T g_{k+1}}{d_k^T y_k}, equation (19) becomes

\beta_k^{N} = \beta_k^{HS} - \Phi_k \beta_k^{HS} = (1 - \Phi_k)\, \beta_k^{HS}.   (20)
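A minimal sketch of the proposed coefficient (20) follows, assuming the convex-combination parameter \theta is supplied by the user; the function name and the default value of \theta are illustrative, not prescribed by the derivation.

```python
# A hedged sketch of the proposed coefficient (20): beta_N = (1 - Phi_k) * beta_HS,
# with Phi_k the ratio derived from the quasi-Newton condition in (14)-(19).
# The convex-combination parameter theta is treated here as a user choice (an assumption).
import numpy as np

def beta_new(g_new, d, s, y, theta=0.5):
    phi = (theta * (s @ y) + (1.0 - theta) * (s @ s)) / \
          (theta * (y @ y) + (1.0 - theta) * (y @ s))    # Phi_k as in (19)
    beta_hs = (g_new @ y) / (y @ d)                       # Hestenes-Stiefel (7)
    return (1.0 - phi) * beta_hs                          # equation (20)
```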

3. Outlines of the Proposed Algorithm:

Step(1): The initial step: We select a starting point x_0 \in R^n and an accuracy tolerance \epsilon > 0 (a small positive real number); we set d_0 = -g_0, compute \alpha_0 = \arg\min_{\alpha \ge 0} f(x_0 - \alpha g_0), and set k = 0.

Step(2): The convergence test: If \|g_k\| \le \epsilon, then stop and take x_k as the optimal solution; else, go to Step (3).

Step(3): The line search: We compute the value of \alpha_k by the cubic interpolation method such that it satisfies the Wolfe conditions in eqs. (5), (6), and go to Step (4).

Step(4): Update the variables: x_{k+1} = x_k + \alpha_k d_k; compute f(x_{k+1}) and g_{k+1}, and set s_k = x_{k+1} - x_k, y_k = g_{k+1} - g_k.

Step(5): The search direction: We compute the scalar \beta_k^{(N)} by equation (20) and the new direction d_{k+1} by equation (4); then set k = k + 1 and go to Step (2).
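The following Python sketch assembles Steps (1)-(5); it is an illustrative reimplementation, not the paper's FORTRAN90 code, and it substitutes SciPy's Wolfe line search for the cubic interpolation routine. The Rosenbrock test function, the value of \theta, the iteration cap and the fallback step are assumptions.

```python
# A minimal end-to-end sketch of Steps (1)-(5) of the proposed algorithm, assuming
# SciPy's Wolfe line search in place of the paper's cubic-interpolation routine.
import numpy as np
from scipy.optimize import line_search, rosen, rosen_der

def ncg(f, grad, x0, theta=0.5, eps=1e-6, max_iter=5000):
    x = x0.astype(float)
    g = grad(x)
    d = -g                                                # Step (1): d_0 = -g_0
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps:                      # Step (2): convergence test
            break
        alpha = line_search(f, grad, x, d, gfk=g)[0]      # Step (3): Wolfe line search
        if alpha is None:                                 # fall back if the search fails
            alpha = 1e-4
        x_new = x + alpha * d                             # Step (4): update variables
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        phi = (theta * (s @ y) + (1 - theta) * (s @ s)) / \
              (theta * (y @ y) + (1 - theta) * (y @ s))
        beta = (1 - phi) * (g_new @ y) / (y @ d)          # Step (5): beta_k^(N), eq. (20)
        d = -g_new + beta * d                             # new direction, eq. (4)
        x, g = x_new, g_new
    return x, k

x_final, iters = ncg(rosen, rosen_der, np.full(10, -1.2))
print(iters, np.round(x_final, 4))
```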

4. Convergence Analysis:

4.1 Sufficient Descent Property:

Theorem (4.1):

The search directions d_k generated by the proposed modified CG algorithm satisfy the descent property for all k, when the step size \alpha_k satisfies the Wolfe conditions (5), (6).

Proof: We use induction to prove the descent property. For k = 0,

d_0^T g_0 = -g_0^T g_0 = -\|g_0\|^2 < 0,

so the theorem is true for k = 0. Now assume that the theorem is true for some k, i.e. d_k^T g_k < 0.


For k + 1, from (4) and (20) we have

d_{k+1} = -g_{k+1} + [1 - \Phi_k] \frac{y_k^T g_{k+1}}{y_k^T d_k} d_k,   (21)

where

\Phi_k = \frac{\theta s_k^T y_k + (1-\theta) s_k^T s_k}{\theta y_k^T y_k + (1-\theta) y_k^T s_k}.

Multiplying both sides of (21) by g_{k+1}^T:

g_{k+1}^T d_{k+1} = -g_{k+1}^T g_{k+1} + [1 - \Phi_k] \frac{y_k^T g_{k+1}}{y_k^T d_k} g_{k+1}^T d_k   (22)

= -\|g_{k+1}\|^2 + [1 - \Phi_k] \frac{(y_k^T g_{k+1})(g_{k+1}^T d_k)}{y_k^T d_k}.

By using the relation u^T v = \|u\| \|v\| \cos\theta_{uv} \le \|u\| \|v\| (where \theta_{uv} is the angle between the vectors u and v), we have

g_{k+1}^T d_{k+1} \le -\|g_{k+1}\|^2 + [1 - \Phi_k] \frac{\|y_k\| \|g_{k+1}\| \|g_{k+1}\| \|d_k\|}{y_k^T d_k},   (23)

and hence

g_{k+1}^T d_{k+1} \le -\|g_{k+1}\|^2 < 0,

since \Phi_k > 1, so that 1 - \Phi_k < 0, while y_k^T d_k > 0 by the Wolfe condition (6) together with the induction assumption d_k^T g_k < 0. Then the sufficient descent property is satisfied.

4.2 Global Convergence Property:

Assumption:

We assume that:

(i) The level set S = \{ x \in R^n : f(x) \le f(x_0) \} is bounded.

(ii) In a neighborhood N of S, the function f is continuously differentiable and its gradient is Lipschitz continuous, i.e. there exists a constant L>0 such that

\|g(x) - g(y)\| \le L \|x - y\|, \quad \text{for all } x, y \in N.   (24)

Under these assumptions on f, there exists a constant \Gamma \ge 0 such that \|g(x)\| \le \Gamma for all x \in S. The convergence of the steepest descent method with Armijo-type search is proved under very general conditions in (Andrei, N., 2009). On the other hand, in (Dai, Y. and Y. Yuan, 2000) it is proved that, for any conjugate gradient method with strong Wolfe line search, the following general result holds.

Lemma 4.2.1:

Let assumptions (i) and (ii) hold and consider any conjugate gradient method of the form (2) and (4), where d_k is a descent direction and \alpha_k is obtained by the strong Wolfe line search. If

\sum_{k \ge 1} \frac{1}{\|d_k\|^2} = \infty,   (25)

then

\liminf_{k \to \infty} \|g_k\| = 0.   (26)


For a uniformly convex function f there exists a constant \mu > 0 such that

(g(x) - g(y))^T (x - y) \ge \mu \|x - y\|^2.   (27)

Using Lemma 4.2.1, the following result can be proved.

Theorem 4.2.2:

Suppose that assumptions (i) and (ii) hold. Consider the algorithm (2), (15). If \|s_k\| tends to zero and there exist nonnegative constants \eta_1 and \eta_2 such that

\|g_k\|^2 \le \eta_1 \|s_k\|, \quad \|g_{k+1}\|^2 \le \eta_2 \|s_k\|,   (28)

and f is a uniformly convex function, then

\liminf_{k \to \infty} \|g_k\| = 0.

Proof: We have:

\beta_k^{N} = [1 - \Phi_k] \frac{y_k^T g_{k+1}}{y_k^T d_k}.

From eq. (28) and the uniform convexity inequality (27) (which gives y_k^T s_k \ge \mu \|s_k\|^2) we get

|\beta_k^{N}| \le (\Phi_k - 1) \frac{\|y_k\| \|g_{k+1}\|}{\mu \|s_k\|^2} \le (\Phi_k - 1) \frac{\|y_k\| \sqrt{\eta_2 \|s_k\|}}{\mu \|s_k\|^2}.   (29)

But \|y_k\| \le L \|s_k\|, so

|\beta_k^{N}| \le (\Phi_k - 1) \frac{L \sqrt{\eta_2}}{\mu \sqrt{\|s_k\|}}.   (30)

Hence

\|d_{k+1}\| \le \|g_{k+1}\| + |\beta_k^{N}| \|s_k\| \le \sqrt{\eta_2 \|s_k\|} + (\Phi_k - 1) \frac{L \sqrt{\eta_2}}{\mu} \sqrt{\|s_k\|},   (31)

so that, since \|s_k\| tends to zero, the norms \|d_{k+1}\| are bounded above and

\sum_{k \ge 1} \frac{1}{\|d_{k+1}\|^2} = \infty.   (32)

Then, by Lemma 4.2.1,

\liminf_{k \to \infty} \|g_k\| = 0.

5. Numerical Results:

In order to assess the performance of the proposed algorithm, we tested it on ten nonlinear unconstrained test functions; for each test function we considered numerical experiments with the number of variables n = 1000 and n = 10000.

The results for the test functions were obtained on a Pentium 4 computer; the programs are written in FORTRAN90 and the stopping criterion is taken to be \|g_{k+1}\| \le 10^{-6}.


Table 1: Comparative Performance of the Two Algorithms for Group of Test Functions at N=1000.

Test Functions FR-CG algorithm NCG algorithm

NOI NOF MIN NOI NOF MIN

Non-diagonal 148 345 1.53432E-013 34 84 2.6204E-021

Wolfe 192 385 5.265267E-014 96 193 3.972E-014

Wood 3354 17377 1.102234E-013 1043 2096 3.0329E-013

Rosen 175 440 8.234663E-014 54 135 1.8665E-015

Beale 73 150 2.6586E-013 59 127 1.3215E-016

Edgar 6 14 1.0135803E-014 5 13 5.4977E-021

Sum 31 173 7.6506322E-009 15 80 7.0856E-009

Strait 8 17 8.135E-017 13 28 5.08477E-015

Dixon 396 795 9.809334E-014 84 172 6.3905E-014

Reciep 11 30 5.675117E-015 15 42 3.6309E-021

Total 4394 19726 1418 2970

Table 2: Comparative Performance of the Two Algorithms for Group of Test Functions at N=10000.

Test Functions FR-CG algorithm NCG algorithm

NOI NOF MIN NOI NOF MIN

Non-diagonal 138 325 5.298103E-014 34 84 5.2641E-018

Wolfe 239 481 2.584548 142 285 8.0039E-014

Wood 7969 34587 1.821634E-014 1433 2872 4.0382E-013

Rosen 183 456 9.625753E-014 35 80 1.580E-014

Beale 78 160 2.5549E-013 105 210 1.390E-015

Edgar 6 14 1.013580E-013 5 13 5.6096E-020

Sum 54 258 1.368982E-008 39 196 2.5692E-010

Strait 8 17 8.135E-016 12 25 1.0319E-014

Dixon 528 1058 0.5000000 582 1280 1.99E-013

Reciep 11 30 5.680229E-014 13 39 1.786E-013

Total 9214 37386 2400 5084

REFERENCES

Andrei, N., 2007. “Scaled Conjugate Gradient Algorithms for Unconstrained Optimization”, Comp.Optim. Appl., 38: 401-416.

Andrei, N., 2009. “Hybrid Conjugate Gradient Algorithm for Unconstrained Optimization”, J. Optim. Theory Appl., 141: 249-264. DOI 10.1007/s10957-008-9505-0.

Dai, Y. and Y. Yuan, 2000. “Nonlinear Conjugate Gradient Methods”, Shanghai Science and Technology Press, Shanghai.

Dai, Y.H., Y. Yuan, 1999. “A Nonlinear Conjugate Gradient Method with A Strong Global Convergence Property”, SIAM J. Optim., 10: 177-182.


Fletcher, R., C. Reeves, 1964. “Function Minimization by Conjugate Gradients”, Comput. J., 7: 149-154.

Fletcher, R., 1987. “Practical Methods of Optimization”, Second Edition. John Wiley and Sons, Chichester.

Hestenes, M.R., E.L. Stiefel, 1952. “Methods of Conjugate Gradients for Solving Linear Systems”, J. Research Nat. Bur. Standards, 49: 409-436.

Liu, Y., C. Storey, 1991. “Efficient Generalized Conjugate Gradient Algorithms. Part 1: Theory”, Journal of Optimization Theory and Applications, 69: 129-137.

Polyak, B.T., 1969. “The Conjugate Gradient Method in Extreme Problems”, USSR Comput. Math. Math. Phys., 9: 94-112.

Wolfe, P., 1969. “Convergence Conditions for Ascent Methods”, SIAM Rev., 11: 226-235.
