A Modified Family Of Cg-Algorithm With A New Closed-Form Line-Search Procedure


Dr. Abbas Y. Al-Bayati¹, Asst. Prof. Dr. Maha S. Al-Salih², Lec. Muna M. Mohammed Ali³

¹ College of Basic Education Telafer, Mosul University, Iraq.

² Department of Computer, College of Education, Mosul University, Iraq.

³ Department of Operations Research, College of Computers Sciences, Mosul University, Iraq.

Abstract: In this paper, we modify the three-, four- and five-parameter families of Conjugate Gradient (CG) methods for solving nonlinear unconstrained optimization problems, proposed by (Dai & Yuan, 1998), (Al-Bayati and Ahmed, 2005) and (Al-Bayati and Metras, 2008) respectively, by employing a new closed-form step-size formula as the line search procedure. This modified family of CG methods, with the new line search technique, includes the existing practical nonlinear CG formulas. Within this framework, we establish theoretically a new domain for the modified closed-form step-size formula used as the line search in the proposed family; the implementation of this new technique will be presented in the second part. The global convergence property and the conjugacy conditions are proved theoretically for the proposed family under the assumptions on the modified step-size parameters.

Key words:

INTRODUCTION

The introduction of the CG-method by Fletcher and Reeves in the 1960s marks the beginning of the field of large-scale nonlinear optimization. This technique could solve very large problems, since it requires storage of only a few vectors, and could do so much more rapidly than the standard steepest descent method. The definition of a large-scale problem has changed dramatically since then, but the CG-method has remained one of the most useful techniques for solving problems large enough to make matrix storage impractical. Numerous variants of the method of Fletcher and Reeves have been proposed over the last 40 years, and many theoretical studies have been devoted to them. The original CG-method proposed by (Fletcher and Reeves, 1964) is given by:

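$$d_k = -g_k + \beta_k^{FR} d_{k-1} \tag{1}$$

$$x_{k+1} = x_k + \alpha_k d_k \tag{2}$$

where $\alpha_k$ is a step length parameter, and where:

$$\beta_k^{FR} = \begin{cases} 0 & \text{for } k = 1 \\ \|g_k\|^2 / \|g_{k-1}\|^2 & \text{for } k \ge 2 \end{cases} \tag{3}$$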

When applied to strictly quadratic objective functions, this method reduces to the linear conjugate gradient method provided $\alpha_k$ is the exact minimizer (Fletcher, 1987). Other choices of the parameter $\beta_k$ in (1) also possess this property and give rise to distinct algorithms for nonlinear problems. Many of these variants have been studied extensively, and the best choice of $\beta_k$ is generally believed to be:

$$\beta_k^{PR} = \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2} \tag{4}$$

This is due to (Polak and Ribiere, 1969). The numerical performance of the Fletcher-Reeves method (3) is somewhat erratic: it is sometimes as efficient as the Polak-Ribiere method, but it is often much slower. It is safe to say that the Polak-Ribiere method is, in general, substantially more efficient than the Fletcher-Reeves method. Furthermore, taking:

$$\beta_k = \frac{y_{k-1}^T g_k}{d_{k-1}^T y_{k-1}} \tag{5}$$

where

$$y_k = g_{k+1} - g_k \quad \text{and} \quad g_k = \nabla f(x_k) \tag{6}$$

gives the method which was originally proposed by (Hestenes and Stiefel, 1952) to solve systems of linear equations. Also, considering:

$$\beta_k = -\frac{\|g_k\|^2}{d_{k-1}^T g_{k-1}} \tag{7}$$

gives Dixon's formula (Dixon, 1972), and taking

$$\beta_k = \frac{\|g_k\|^2}{d_{k-1}^T y_{k-1}} \tag{8}$$

gives the Dai-Yuan (DY) formula (Dai & Yuan, 1999). Furthermore, taking

$$\beta_k = -\frac{g_k^T y_{k-1}}{d_{k-1}^T g_{k-1}} \tag{9}$$

gives the formula proposed by Liu and Storey (LS) (Liu & Storey, 1991).
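The six classical choices of the conjugacy coefficient can be compared side by side. The following is a minimal sketch in plain Python, assuming the indexing of (1), that is, the new gradient g_k, the previous gradient g_{k-1} and the previous direction d_{k-1}; the function name `cg_betas` and the dictionary keys are illustrative and not part of the paper.

```python
from typing import Dict, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def cg_betas(g_new: Sequence[float],
             g_old: Sequence[float],
             d_old: Sequence[float]) -> Dict[str, float]:
    """Evaluate the classical beta_k formulas.

    g_new = g_k, g_old = g_{k-1}, d_old = d_{k-1}.
    """
    # y_{k-1} = g_k - g_{k-1}
    y = [a - b for a, b in zip(g_new, g_old)]
    gg_new = dot(g_new, g_new)   # ||g_k||^2
    gg_old = dot(g_old, g_old)   # ||g_{k-1}||^2
    gy = dot(g_new, y)           # g_k^T y_{k-1}
    dy = dot(d_old, y)           # d_{k-1}^T y_{k-1}
    dg = dot(d_old, g_old)       # d_{k-1}^T g_{k-1}
    return {
        "FR": gg_new / gg_old,    # Fletcher-Reeves
        "PR": gy / gg_old,        # Polak-Ribiere
        "HS": gy / dy,            # Hestenes-Stiefel
        "Dixon": -gg_new / dg,    # Dixon
        "DY": gg_new / dy,        # Dai-Yuan
        "LS": -gy / dg,           # Liu-Storey
    }
```

When the previous step was an exact line search from a steepest descent direction (so that g_k is orthogonal to both d_{k-1} and g_{k-1}), all six values coincide, which is one way to sanity-check an implementation.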

2. Preliminaries

2.1 Line Search Methods:

In this section, different choices of the step-size $\alpha_k$ are presented and discussed. To perform a line search is to find an $\alpha_k$ that reduces the objective function $f$ without spending too much time on the search. Finding the step-size $\alpha_k$ that minimizes the objective function along the search direction is known as exact line search. Let us consider the following unconstrained minimization problem:

$$\min_{x \in R^n} f(x) \tag{10}$$

where $f$ is a differentiable objective function. In the implementation of any CG method, the step-size is often determined by a line search condition such as the Wolfe-Powell conditions

$$f(x_k + \alpha_k d_k) - f_k \le \sigma_1 \alpha_k g_k^T d_k \tag{11}$$

$$g(x_k + \alpha_k d_k)^T d_k \ge \sigma_2 g_k^T d_k \tag{12}$$

where $0 < \sigma_1 < \sigma_2 < 1$ (Kinsella, 2006). These types of line search involve extensive computation of function values and gradients, which often becomes a significant burden for large-scale problems.
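To make the cost of such a test concrete, the sketch below checks the two Wolfe-Powell inequalities for a trial step. It is only an illustration: the function name `satisfies_wolfe` and the default values of `sigma1` and `sigma2` are assumptions, not values taken from the paper. Each call needs one extra function evaluation and one extra gradient evaluation at the trial point.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_wolfe(f: Callable[[Vector], float],
                    grad: Callable[[Vector], Vector],
                    x: Vector,
                    d: Vector,
                    alpha: float,
                    sigma1: float = 1e-4,
                    sigma2: float = 0.9) -> bool:
    """Return True if alpha satisfies both Wolfe-Powell conditions.

    sigma1 and sigma2 are illustrative defaults with 0 < sigma1 < sigma2 < 1.
    """
    slope0 = dot(grad(x), d)                       # g_k^T d_k
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    # Sufficient decrease: f(x + alpha d) - f(x) <= sigma1 * alpha * g^T d
    decrease_ok = f(x_new) - f(x) <= sigma1 * alpha * slope0
    # Curvature: g(x + alpha d)^T d >= sigma2 * g^T d
    curvature_ok = dot(grad(x_new), d) >= sigma2 * slope0
    return decrease_ok and curvature_ok
```

For f(x) = x² starting at x = 1 with d = -2, the step 0.5 is accepted, the step 1.0 fails the decrease test, and a tiny step such as 0.001 fails the curvature test.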

2.2. Exact Line Search:

To perform exact line search on a problem is to take the best possible step $\alpha_k$ for a given search direction $p_k$. For a general problem it is not known how to perform exact line search analytically. However, a well-known problem to which exact line search can be applied is the quadratic problem. To perform exact line search on this problem is to let the step-size be

$$\alpha_k = -\frac{\nabla f(x_k)^T p_k}{p_k^T H p_k} \tag{13}$$

for a given point $x_k$ and search direction $p_k$. Another problem on which it is possible to perform exact line search is

$$f(x) = \gamma (x^T x)^2 + \frac{1}{2} x^T H x + C^T x \tag{14}$$

This problem is similar to the quadratic problem, with the only difference that a constant $\gamma$ times a fourth-degree term in $x$ is added. This term could, for example, model a disturbance on a quadratic problem. To be able to perform an exact line search, the $\alpha_k$ that minimizes $f(x_k + \alpha_k p_k)$ has to be obtained. It turns out that $\alpha_k$ is obtained by solving:

$$a_1 \alpha_k^3 + a_2 \alpha_k^2 + a_3 \alpha_k + a_4 = 0 \tag{15}$$

where

$$\begin{aligned}
a_1 &= 4\gamma (p_k^T p_k)^2 \\
a_2 &= 12\gamma (x_k^T p_k)(p_k^T p_k) \\
a_3 &= 4\gamma \left( (x_k^T x_k)(p_k^T p_k) + 2 (x_k^T p_k)^2 \right) + p_k^T H p_k \\
a_4 &= 4\gamma (x_k^T x_k)(x_k^T p_k) + x_k^T H p_k + C^T p_k
\end{aligned}$$

Let $\alpha_k$ be the real root obtained by solving the equation above. Hence the function (14) is convex,
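The exact step for problem (14) can be sketched numerically as follows. This is a minimal illustration in plain Python, assuming that H is symmetric positive definite and γ ≥ 0, so that the derivative of f along the direction is an increasing cubic with a single real root that bisection can locate. The names `cubic_coefficients` and `exact_step` are hypothetical helpers; for γ = 0 the routine falls back to the quadratic formula (13).

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M: Matrix, v: Vector) -> List[float]:
    """Matrix-vector product M v."""
    return [dot(row, v) for row in M]


def cubic_coefficients(x: Vector, p: Vector, H: Matrix, C: Vector,
                       gamma: float) -> Tuple[float, float, float, float]:
    """Coefficients a1..a4 of the cubic (15) for the point x and direction p."""
    s, t, u = dot(x, x), dot(x, p), dot(p, p)
    Hp = matvec(H, p)
    a1 = 4.0 * gamma * u * u
    a2 = 12.0 * gamma * t * u
    a3 = 4.0 * gamma * (s * u + 2.0 * t * t) + dot(p, Hp)
    a4 = 4.0 * gamma * s * t + dot(x, Hp) + dot(C, p)
    return a1, a2, a3, a4


def exact_step(x: Vector, p: Vector, H: Matrix, C: Vector,
               gamma: float, iterations: int = 200) -> float:
    """Exact line-search step for f(x) = gamma (x^T x)^2 + x^T H x / 2 + C^T x."""
    a1, a2, a3, a4 = cubic_coefficients(x, p, H, C, gamma)
    if a1 == 0.0:
        # Pure quadratic case: this is formula (13).
        return -a4 / a3

    def dphi(alpha: float) -> float:
        # Derivative of f(x + alpha p) with respect to alpha (Horner form).
        return ((a1 * alpha + a2) * alpha + a3) * alpha + a4

    # Bracket the root of the increasing cubic, then bisect.
    lo, hi = -1.0, 1.0
    while dphi(lo) > 0.0:
        lo *= 2.0
    while dphi(hi) < 0.0:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if dphi(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Bisection is used here only to keep the sketch dependency-free; any cubic root finder that returns the real root would serve the same purpose.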

2.3. Backtracking Line Search Technique:

Backtracking line search is a strategy that starts from an initial step-size guess and successively decreases $\alpha_k$ until a reduction in the function value is obtained. This strategy is presented in the algorithm below.

Algorithm (2.3.1): Backtracking line search technique

    α_k ← 1; iter ← 0;
    itermax ← maximal number of iterations;
    while iter < itermax
        iter ← iter + 1
        x* ← x_k + α_k p_k
        if f(x*) < f(x_k) then stop;
        else α_k ← α_k / 2
        end
    end

For more details see (Sun & Yuan, 2006).
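The backtracking strategy translates almost line for line into code. The sketch below is one possible rendering in plain Python; the function name `backtracking` and the convention of returning a zero step when no reduction is found within `itermax` halvings are assumptions made for the example.

```python
from typing import Callable, List, Sequence


def backtracking(f: Callable[[List[float]], float],
                 x: Sequence[float],
                 p: Sequence[float],
                 alpha0: float = 1.0,
                 itermax: int = 50) -> float:
    """Halve the step until f decreases along the direction p.

    Returns the accepted step, or 0.0 if no decrease was found
    (the failure convention is an assumption of this sketch).
    """
    alpha = alpha0
    fx = f(list(x))
    for _ in range(itermax):
        x_star = [xi + alpha * pi for xi, pi in zip(x, p)]
        if f(x_star) < fx:
            return alpha          # function reduction obtained: stop
        alpha /= 2.0              # otherwise shrink the step
    return 0.0
```

For f(x) = x₁² + x₂² at x = (1, 1) with p = (-4, -4), the steps 1 and 0.5 give no reduction and the step 0.25 is accepted.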

2.4. Soft Line Search:

Many researchers in optimization have proved their inventiveness by producing new line search methods or modifications to known methods. What we present here are useful combinations of ideas of different origin. The description is based on (Madsen, 1984). In the early days of optimization, exact line search was dominant. Now soft line search is used more and more, and we rarely see new methods which require exact line search. An advantage of soft line search over exact line search is that it is the faster of the two. If the first guess on the step length is a rough approximation to the minimizer in the given direction, the line search will terminate immediately if some mild criteria are satisfied. The result of exact line search is normally a good approximation to the minimizer along the direction, and this can make descent methods with exact line search find the local minimizer in fewer iterations than a descent method with soft line search. However, the extra function evaluations spent in each line search often make the descent method with exact line search a loser. If we are at the start of the iteration with a descent method, where x is far from the solution x*, it does not matter much that the result of the soft line search is only a rough approximation to the minimizer along the direction; this is another point in favor of the soft line search.

Algorithm (2.4.1): Soft line search procedure:

    begin
        if φ'(0) ≥ 0 then α := 0
        else
            k := 0; γ := β·φ'(0); a := 0; b := min{1, α_max}
            while (φ(b) ≤ λ(b)) and (φ'(b) ≤ γ) and (b < α_max) and (k < k_max)
                k := k+1; a := b
                b := min{2b, α_max}
            α := b
            while ((φ(α) > λ(α)) or (φ'(α) < γ)) and (k < k_max)
                k := k+1
                Refine α and [a, b]
            if φ(α) ≥ φ(0) then α := 0
    end

The refinement step takes as input an interval [a, b] which contains acceptable points, and its output is an α found by interpolation. We want to be sure that the intervals have strictly decreasing widths, so we only accept the new α if it is inside [a+d, b-d], where d = (1/10)(b - a). The α splits [a, b] into two subintervals, and we also return the subinterval which must contain acceptable points.

    begin
        D := b - a; C := (φ(b) - φ(a) - D·φ'(a)) / D²
        if C > 0
            α := a - φ'(a)/(2C)
            α := min{max{α, a+0.1D}, b-0.1D}
        else
            α := (a + b)/2
        if φ(α) < λ(α)
            a := α
        else
            b := α
    end

Finally, we give the following remarks about the implementation of the algorithm. The function and slope values are computed as

$$\varphi(\alpha) = f(x + \alpha h), \qquad \varphi'(\alpha) = h^T f'(x + \alpha h). \tag{16}$$

The computation of f and f' is the "expensive" part of the line search. Therefore, the function and slope values should be stored in auxiliary variables for use in the acceptance criteria and elsewhere, and the implementation should return the value of the objective function and its gradient to the calling program, a descent method. They will be useful as the starting function value and the starting slope in the next line search (the next iteration). For more details see (Madsen, 1984).
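The soft line search and its refinement step can be sketched together as below. The sketch assumes the usual acceptance line λ(α) = φ(0) + ρ·φ'(0)·α and slope threshold γ = β·φ'(0); the parameter names `rho`, `beta`, `alpha_max`, `k_max` and their default values are assumptions for illustration and are not specified in the text above. For brevity the sketch re-evaluates φ and φ' rather than caching them, so it does not follow the storage advice given above.

```python
from typing import Callable, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def refine(phi: Callable[[float], float], dphi: Callable[[float], float],
           lam: Callable[[float], float], a: float, b: float
           ) -> Tuple[float, float, float]:
    """Interpolate inside [a, b]; return the new alpha and the reduced interval."""
    D = b - a
    c = (phi(b) - phi(a) - D * dphi(a)) / D ** 2
    if c > 0.0:
        alpha = a - dphi(a) / (2.0 * c)               # minimizer of the quadratic model
        alpha = min(max(alpha, a + 0.1 * D), b - 0.1 * D)
    else:
        alpha = 0.5 * (a + b)
    if phi(alpha) < lam(alpha):
        a = alpha
    else:
        b = alpha
    return alpha, a, b


def soft_line_search(f: Callable[[List[float]], float],
                     grad: Callable[[List[float]], List[float]],
                     x: Sequence[float], h: Sequence[float],
                     rho: float = 1e-3, beta: float = 0.99,
                     alpha_max: float = 10.0, k_max: int = 20) -> float:
    """Soft line search along h; returns 0.0 if h is not a descent direction."""
    def point(alpha: float) -> List[float]:
        return [xi + alpha * hi for xi, hi in zip(x, h)]

    def phi(alpha: float) -> float:
        return f(point(alpha))

    def dphi(alpha: float) -> float:
        return dot(h, grad(point(alpha)))

    phi0, dphi0 = phi(0.0), dphi(0.0)
    if dphi0 >= 0.0:
        return 0.0

    def lam(alpha: float) -> float:
        return phi0 + rho * dphi0 * alpha             # assumed acceptance line

    gamma = beta * dphi0
    k, a, b = 0, 0.0, min(1.0, alpha_max)
    # Expand the interval while b is acceptable but the slope is still steep.
    while phi(b) <= lam(b) and dphi(b) <= gamma and b < alpha_max and k < k_max:
        k += 1
        a, b = b, min(2.0 * b, alpha_max)
    alpha = b
    # Shrink the interval until alpha satisfies both criteria.
    while (phi(alpha) > lam(alpha) or dphi(alpha) < gamma) and k < k_max:
        k += 1
        alpha, a, b = refine(phi, dphi, lam, a, b)
    return alpha if phi(alpha) < phi0 else 0.0
```

With β close to 1 the first trial step is accepted as soon as it gives sufficient decrease; a smaller β pushes the result towards the exact minimizer at the cost of more evaluations.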

2.5. A New Closed-Form Step-Size Technique:

We say that the step-size sequence $\{\alpha_k\}$ satisfies the Armijo condition with $\Omega \in (0, 1)$ if

$$f(x_k) - f(x_{k+1}) + \Omega\, \alpha_k\, g_k^T d_k \ge 0, \quad \forall k. \tag{17}$$

Assumption (2.5.1):

Let us assume that $f : R^n \to R$ is differentiable on $N$ with Lipschitz constant $L > 0$:

$$\|\nabla f(x) - \nabla f(x')\| \le L \|x - x'\|, \quad \forall x, x' \in N. \tag{18}$$

Assumption (2.5.2):

Let $x_k$ be defined by (2) with $\theta \in (0, 2)$ and let Assumption (2.5.1) hold. Then the Armijo condition (17) is satisfied by the step-size sequence with $\Omega = \delta / C_I^{\max}$, where $\delta$ is defined by $\delta = 1 - \theta/2 \in (0, 1)$.

Moreover, we have:

$$0 \le C^{\min} \alpha_k^1 \le \alpha_k \le C_I^{\max} \alpha_k^1, \quad \forall k. \tag{19}$$

For more details see (Labat & Idier, 2005).
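A closed-form step of this kind can be illustrated numerically. The sketch below assumes the single-step formula α = -θ g^T d / (d^T Q d) of Labat and Idier, with Q a fixed symmetric matrix that majorizes the Hessian, and checks the Armijo condition (17) with Ω = 1 - θ/2 (that is, taking the constant C_I^max equal to 1). The function names and the choice of Q are assumptions made for the example.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M: Matrix, v: Vector) -> List[float]:
    """Matrix-vector product M v."""
    return [dot(row, v) for row in M]


def closed_form_step(g: Vector, d: Vector, Q: Matrix, theta: float = 1.0) -> float:
    """Closed-form step alpha = -theta * g^T d / (d^T Q d), with theta in (0, 2)."""
    return -theta * dot(g, d) / dot(d, matvec(Q, d))


def armijo_holds(f: Callable[[List[float]], float], x: Vector, d: Vector,
                 g: Vector, alpha: float, omega: float) -> bool:
    """Check f(x_k) - f(x_{k+1}) + omega * alpha * g^T d >= 0."""
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    return f(list(x)) - f(x_new) + omega * alpha * dot(g, d) >= 0.0
```

For example, with f(x) = ½ x^T A x, A = [[3, 1], [1, 2]], the majorant Q = 4I, the point x = (1, -1) and the steepest descent direction d = (-2, 1), the step for θ = 1 is 0.25 and the Armijo test with Ω = 0.5 passes, without any function evaluation being needed to compute the step itself.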

3. A Modified-Family CG-Algorithm With A New Closed Form Line Search:

In this paper, we restrict ourselves to the following five-parameter family of CG algorithms due to (Al-Bayati and Metras, 2008) :

Algorithm 3.1: (The Modified CG-Algorithm):

Let

$$x_{k+1} = x_k + \alpha_k d_k$$

$$C_k = -g_k + \beta_k d_{k-1}$$

$$d_k = \begin{cases} C_k & \text{if } g_k^T C_k \le 0 \\ -C_k & \text{otherwise} \end{cases}$$

$$\beta_k = \begin{cases} 0, & \text{for } k = 0 \\ \beta_k^{\lambda_k, \mu_k, \omega_k}, & \text{for } k \ge 1 \end{cases}$$

Here $\alpha_k$ are calculated by the closed-form line search parameter defined in (19); $k \in N$, $g_k = \nabla f(x_k)$ and

$$\beta_k^{\lambda_k, \mu_k, \omega_k} = \frac{(1 - \lambda_k) \|g_k\|^2 + \lambda_k\, g_k^T y_{k-1}}{D_k}$$

where

$$D_k = (1 - \mu_k - \omega_k) \|g_{k-1}\|^2 + \mu_k\, d_{k-1}^T y_{k-1} - \omega_k\, d_{k-1}^T g_{k-1}$$

$\beta_k^{1,1,0}$ reduces to HS, and $\beta_k^{1,0,1}$ reduces to LS (Liu-Storey). Furthermore, it is easy to extend this technique to the four-parameter family (Al-Bayati and Ahmed, 2005) with the same values of $\alpha_k$ defined in (19) as:

2 2

2

(1

)

(1

)

T k

four

k _T _T

g

g y

y

g

d y

d g

 







 





 





 





[0,1]

[0,1

]

[0,1

]

k k k

k k



 















where $D_k$ is defined by

$$D_k = (1 - \mu_k - \omega_k) \|g_{k-1}\|^2 + \mu_k\, d_{k-1}^T y_{k-1} - \omega_k\, d_{k-1}^T g_{k-1}$$

where $\lambda_k \in [0, 1]$, $\mu_k \in [0, 1]$ and $\omega_k \in [0, 1 - \mu_k]$ are parameters.

Finally, it is also easy to extend this technique to the five-parameter family (Al-Bayati & Hassan, 2008) with the same values of $\alpha_k$ defined in (19) as:

2 2

2

(1

)

(

)

(1

)

T T

five

k _T _T

g

y

v

g

g y

y

g

d y

d g

  













 

  

 







  

[0,1]

[0,1

]

[0,1

]

[0,1

]





























where $D_k$ is defined by

$$D_k = (1 - \mu_k - \omega_k) \|g_{k-1}\|^2 + \mu_k\, d_{k-1}^T y_{k-1} - \omega_k\, d_{k-1}^T g_{k-1}$$
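The overall iteration of Algorithm 3.1 can be sketched on a small test problem. The code below is only an illustration under stated assumptions: the objective is the quadratic f(x) = ½ x^T A x - b^T x with A symmetric positive definite, the three-parameter coefficient is used with fixed parameters (λ, μ, ω), and the closed-form step is taken as α = -θ g^T d / (d^T A d), that is, with the matrix in the step formula chosen equal to A. The names `three_parameter_beta` and `modified_cg` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M: Matrix, v: Vector) -> List[float]:
    """Matrix-vector product M v."""
    return [dot(row, v) for row in M]


def three_parameter_beta(g: Vector, g_old: Vector, d_old: Vector,
                         lam: float, mu: float, omega: float) -> float:
    """Three-parameter conjugacy coefficient with denominator D_k."""
    y = [a - b for a, b in zip(g, g_old)]                 # y_{k-1}
    numerator = (1.0 - lam) * dot(g, g) + lam * dot(g, y)
    D = ((1.0 - mu - omega) * dot(g_old, g_old)
         + mu * dot(d_old, y) - omega * dot(d_old, g_old))
    return numerator / D


def modified_cg(A: Matrix, b: Vector, x0: Vector,
                lam: float = 1.0, mu: float = 1.0, omega: float = 0.0,
                theta: float = 1.0, tol: float = 1e-10,
                max_iter: int = 100) -> List[float]:
    """Minimize 0.5 x^T A x - b^T x with the direction rule of Algorithm 3.1."""
    x = list(x0)
    g = [ai - bi for ai, bi in zip(matvec(A, x), b)]      # gradient A x - b
    d = [-gi for gi in g]                                 # beta_0 = 0
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        # Closed-form step (assumption: the step matrix equals A).
        alpha = -theta * dot(g, d) / dot(d, matvec(A, d))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_old = g
        g = [ai - bi for ai, bi in zip(matvec(A, x), b)]
        beta = three_parameter_beta(g, g_old, d, lam, mu, omega)
        c = [-gi + beta * di for gi, di in zip(g, d)]     # C_k
        # Keep C_k if it is a descent direction, otherwise flip its sign.
        d = c if dot(g, c) <= 0.0 else [-ci for ci in c]
    return x
```

With the default parameters (1, 1, 0) and θ = 1 the coefficient is the HS formula and the step is exact, so on a 2×2 system the iterates reach the solution in two steps up to rounding; other parameter values or θ ≠ 1 generally need more iterations.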

3.2. The Global Convergence Property Of The Modified CG-Algorithm 3.1:

To study the convergence property, let us consider the following result, which holds under the conditions of Assumption (2.5.2):

Theorem (3.3):

If $\liminf_{k \to \infty} \|g_k\| \ne 0$, then

$$\lim_{k \to \infty} \beta_k^{\lambda_k, \mu_k, \omega_k} = 0 \tag{20}$$

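Proof.

According to (2) and

$$\alpha_k^1 = -\theta\, \frac{g_k^T d_k}{d_k^T Q_k^0 d_k} \tag{21}$$

let

$$\|x_{k+1} - x_k\|^2 = \alpha_k^2 \|d_k\|^2 \le (C_I^{\max})^2 (\alpha_k^1)^2 \|d_k\|^2 \tag{22}$$

But from (21),

$$\alpha_k^1 = -\theta\, \frac{g_k^T d_k}{d_k^T Q_k^0 d_k}$$

Hence

$$\sum_{k,\, d_k \ne 0} \|x_{k+1} - x_k\|^2 \le (C_I^{\max})^2 \theta^2 \sum_{k,\, d_k \ne 0} \frac{(g_k^T d_k)^2 \|d_k\|^2}{(d_k^T Q_k^0 d_k)^2} \tag{23}$$

And from

$$\nu_1 \|u\|^2 \le u^T Q_k^i u \le \nu_2 \|u\|^2 \quad \text{for all } u \in R^n \tag{24}$$

we obtain

$$\nu_1 \|d_k\|^2 \le d_k^T Q_k^0 d_k \tag{25}$$

$$\nu_1 \le \frac{d_k^T Q_k^0 d_k}{\|d_k\|^2} \tag{26}$$

$$\nu_1^2 \le \frac{(d_k^T Q_k^0 d_k)^2}{(\|d_k\|^2)^2} \tag{27}$$

$$\frac{1}{\nu_1^2} \ge \frac{(\|d_k\|^2)^2}{(d_k^T Q_k^0 d_k)^2} \tag{28}$$

We have:

$$\sum_{k,\, d_k \ne 0} \|x_{k+1} - x_k\|^2 \le \left( \frac{C_I^{\max}\, \theta}{\nu_1} \right)^2 \sum_{k,\, d_k \ne 0} \frac{(g_k^T d_k)^2}{\|d_k\|^2} \tag{29}$$

We conclude that $\lim_{k \to \infty} \|x_{k+1} - x_k\| = 0$. Because $f$ is continuously differentiable and $g_k$ is bounded according to Assumption (2.5.2), and by the boundedness of $L$, we also have $\lim_{k \to \infty} \|y_{k-1}\| = 0$ and $\lim_{k \to \infty} g_k^T y_{k-1} = 0$.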

3.4. The Sufficient Descent Condition For The Modified CG-Algorithm (3.1):

According to the above theoretical results we have to show that:

, , , ,

lim

k k k_k k k

0

k     







, , , , 2 1

(1

)

/

k k k k k

T

k k

g

k k

g y

k k

D

k

    







_













_

(30)

Where Dk is defined in the 5-parameter family, then

, , , , ₂

1

(1

)

k k k k k

t k k

k

k k k

k k

D

g y

g

    

_











(31)

This gives , , , , 2 2 1 1

(1

)

(1

)

k k k k k

t k k

k k k k k k

k k

g y

g

    



_

_



 







(32)

, , , , 2 2 1

(1

)

2

k k k k k

t k k

k k k k

g y

    







_









(33)

2 , , , ,

2

1

k k k k k

k k     



_

_









_

 

_





(34)

Now for the exact case, i.e. T_₁ _k_₁

_

₀

k

g

d

, the theorem is well-defined, but let us now consider the case

where 1 1

0

t k k

g

_

d

_



_{. Given the value of c}

k from algorithm (3.1) and (19) we have : , , , ,

1

(

k k k k k

)

t t

k k k k k k

g C

g

d

    



_







(35)

=

2

1 1 1

(

t

)(

t

) /

k k k k k k k

g

g y

_

g

_

g d

_

D





(36) According to (34) we reduce …

2 max min

1

(1

/ ) / (

1

/

2

)

t t

k k k k I k k

g C

  



g y

_



C

 







C

 

₍₃₇₎

The latter inequality yield

g C

kt k

 



2

/ 2

for all sufficiently large k, And this, will implies that:





, , , ,

2 min 2

1

(1

)

(

/

2

)

/ 2

k k k k k

t

k k k k k k k

g C

C

    





 





  


=

, , , ,

min 2

2

(1

/ 2 (1

/ )

)

k k k k k

k k

C

k

    







 

   

(39)

Finally (29) and (34) jointly imply

, , , ,

lim

k k k_k k k

0

k

    







(40)

Conclusions:

In this paper we have introduced the three-, four- and five-parameter families of CG-algorithms due to (Dai & Yuan, 1998), (Al-Bayati and Ahmed, 2005) and (Al-Bayati and Metras, 2008) respectively, by employing a new closed-form step-size technique defined in the modified Algorithm (3.1). This family yields all the existing conjugacy coefficients. We have proved that, under the new values of $\alpha_k$, the five-parameter family of CG-algorithms has a global convergence property under the assumptions of the new proposed line-search technique, and that it also satisfies the new conjugacy condition.

REFERENCES

Al-Bayati, A.Y. and B.A. Metras, 2008. "A new five parameter CG-methods", 2nd Conf. ICMS, Syria, 238-256.

Al-Bayati, A.Y. and H.I. Ahmed, 2005. "A new four-parameters family of nonlinear conjugate gradient methods", Iraqi Journal of Statistical Sciences, 7: 18-38.

Dai, Y. and Y. Yuan, 1999. "A nonlinear conjugate gradient method with a strong global convergence property", SIAM Journal on Optimization, 10(1): 177-182.

Dai, Y. and Y. Yuan, 2001. "A three-parameter family of nonlinear conjugate gradient methods", Mathematics of Computation, 70: 1155-1167.

Dixon, L.C., 1972. "Conjugate gradient algorithms: quadratic termination without linear searches", Numerical Optimization Centre, The Hatfield Polytechnic, Technical Report, 38.

Fletcher, R. and C. Reeves, 1964. "Function minimization by conjugate gradients", The Computer Journal, 7(2): 149-154.

Fletcher, R., 1987. "Practical methods of optimization", Wiley & Sons, New York, 2nd edition.

Hestenes, M.R. and E. Stiefel, 1952. "Methods of conjugate gradients for solving linear systems", Journal of Research of the National Bureau of Standards, Sec. B 48: 409-436.

Kinsella, J., 2006. "Course Notes for MS4327 Optimization".

Labat, C. and J. Idier, 2005. "Convergence of CG-method with a closed-form step-size formula", special communications.

Liu, Y. and C. Storey, 1991. "Efficient generalized conjugate gradient algorithms, part 1: theory", Journal of Optimization Theory and Applications, 69: 129-137.

Madsen, K., 1984. "Optimering uden bibetingelser" (in Danish), Hæfte, vol. 46, Numerisk Institut, DTH.

Polak, E. and G. Ribiere, 1969. "Note sur la convergence de méthodes de directions conjuguées", Revue Française d'Informatique et de Recherche Opérationnelle, 16: 35-43.