Wuhan Univ. J. Nat. Sci.
Volume 29, Number 5, October 2024
Page(s) 403 - 411
DOI https://doi.org/10.1051/wujns/2024295403
Published online 20 November 2024

© Wuhan University 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

0 Introduction

Unconstrained optimization methods are widely used in the fields of nonlinear dynamic systems and engineering computation to obtain the numerical solution of optimal control problems[1,2]. In this paper, we consider the following unconstrained optimization problem:

$\min f(x), \quad x \in \mathbb{R}^n,$  (1)

where $f: \mathbb{R}^n \to \mathbb{R}$ is a sufficiently smooth high-dimensional function. Nonlinear conjugate gradient (CG) methods are highly useful for solving this kind of problem because they avoid storing any matrices[3,4]. In general, the iterates of a CG method are generated by

$x_{k+1} = x_k + \alpha_k d_k,$  (2)

where $\alpha_k$ is the step size along the search direction $d_k$, and $d_k$ is defined by

$d_k = \begin{cases} -g_k, & k = 1, \\ -g_k + \beta_k d_{k-1}, & k \geq 2, \end{cases}$  (3)

where $g_k$ denotes the gradient $\nabla f(x_k)$ and $\beta_k$ is a scalar; different choices of $\beta_k$ give rise to different CG methods. Some well-known formulas for $\beta_k$ are given by

$\beta_k^{HS} = \dfrac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}$   (Hestenes-Stiefel (HS) method[5]),

$\beta_k^{FR} = \dfrac{\|g_k\|^2}{\|g_{k-1}\|^2}$   (Fletcher-Reeves (FR) method[6]),

$\beta_k^{PRP} = \dfrac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}$   (Polak-Ribière-Polyak (PRP) method[7]),

$\beta_k^{CD} = -\dfrac{\|g_k\|^2}{d_{k-1}^T g_{k-1}}$   (Conjugate Descent (CD) method[8]),

$\beta_k^{LS} = \dfrac{g_k^T y_{k-1}}{-d_{k-1}^T g_{k-1}}$   (Liu-Storey (LS) method[9]),

and

$\beta_k^{DY} = \dfrac{\|g_k\|^2}{d_{k-1}^T y_{k-1}}$   (Dai-Yuan (DY) method[10]),

where $\|\cdot\|$ is the Euclidean norm, the superscript "T" stands for the transpose, and $y_{k-1} = g_k - g_{k-1}$. In general, the DY and FR methods have good convergence properties under inexact line searches, but their computational efficiency is not as good as that of the PRP method. In order to establish methods with both good numerical performance and good convergence properties, many scholars have proposed improved conjugate gradient methods in recent years[11-14]. They showed that these methods are globally convergent if the step size $\alpha_k$ satisfies the following strong Wolfe line search conditions:

$\begin{cases} f(x_k + \alpha_k d_k) - f(x_k) \leq \rho \alpha_k g_k^T d_k, \\ |g(x_k + \alpha_k d_k)^T d_k| \leq -\sigma g_k^T d_k, \end{cases}$  (4)

where $0 < \rho < \sigma < 1$.
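
As an illustration, the following Python sketch checks whether a trial step size satisfies the strong Wolfe conditions (4); the handles `f` and `grad` are hypothetical placeholders for the objective function and its gradient, and the default values of `rho` and `sigma` are the ones used in Section 2.

```python
import numpy as np

def satisfies_strong_wolfe(f, grad, x, d, alpha, rho=0.04, sigma=0.5):
    """Check the strong Wolfe conditions (4) for a trial step size alpha."""
    g = grad(x)
    gTd = g @ d                       # g_k^T d_k (negative for a descent direction)
    x_new = x + alpha * d
    sufficient_decrease = f(x_new) - f(x) <= rho * alpha * gTd
    curvature = abs(grad(x_new) @ d) <= -sigma * gTd
    return sufficient_decrease and curvature
```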

The line search in a conjugate gradient method is usually a strong Wolfe line search; however, the line search usually brings a computational burden, especially when solving large-scale nonlinear unconstrained optimization problems. To overcome this problem, Sun and Zhang[3] introduced five CG methods without line search, in which the line search is replaced by a simple formula for the step size $\alpha_k$:

$\alpha_k = -\delta \dfrac{g_k^T d_k}{\|d_k\|_{Q_k}^2},$  (5)

where $\|d_k\|_{Q_k}^2 = d_k^T Q_k d_k$, $\delta \in (0, v_{\min}/\mu)$ is chosen such that $\delta\mu/v_{\min} < 1$, $\mu$ is the Lipschitz constant defined in Assumption 1 below, and $\{Q_k\}$ is a sequence of positive definite matrices for which there exist positive constants $v_{\min} > 0$ and $v_{\max} > 0$ such that $v_{\min} d^T d \leq d^T Q_k d \leq v_{\max} d^T d$ for all $d \in \mathbb{R}^n$.
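
For illustration, a minimal Python sketch of the step-size formula (5) is given below; the argument `Q` stands for whatever positive definite matrix $Q_k$ is used in practice (the identity by default here, although Section 2 reports better results with BFGS updates), and `delta` corresponds to $\delta$.

```python
import numpy as np

def fixed_step_size(g, d, Q=None, delta=1.0):
    """Step size alpha_k = -delta * g_k^T d_k / ||d_k||_{Q_k}^2, formula (5)."""
    dQd = d @ d if Q is None else d @ Q @ d   # ||d_k||_{Q_k}^2, with Q_k = I when Q is None
    return -delta * (g @ d) / dQd
```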

Chen and Sun[4] extended this technique to a two-parameter family of CG methods, which can be regarded as a generalization of the HS and LS methods without line search. Yu[15] and Narushima[16] applied this technique to memory gradient methods and obtained their global convergence, respectively. Yin and Chen[17] showed that three-term CG methods without line search are globally convergent. For other modified CG methods without line search, the reader may see Refs. [18-20].

As a supplement to these results, in this paper we continue this line of research and show that the proposed CG method without line search, in which the line search step is replaced by the fixed step-length formula (5), is globally convergent for the following new $\beta_k$:

$\beta_k = \dfrac{\|g_k\|^2}{\omega_k \|g_{k-1}\|^2 + \mu_k d_{k-1}^T y_{k-1}},$  (6)

where $\omega_k \geq 0$ and $\mu_k \geq 0$ are not both zero. This motivation mainly comes from Refs. [3, 4]. Our main aim is to develop a new method and obtain better properties while keeping its simple structure. It should be noted that formula (6) includes some important special sub-classes. For example, when $\omega_k = 0$ and $\mu_k = 1$, the proposed CG method can be regarded as the DY method without line search; when $\omega_k = 1$ and $\mu_k = 0$, it can be regarded as the FR method without line search. Hence, the proposed CG method (6) without line search can be regarded as a generalization of the DY and FR methods.
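
As a concrete illustration, a minimal Python sketch of formula (6) follows; the function name and arguments are illustrative only, and the comments record how the DY and FR formulas are recovered as special cases.

```python
import numpy as np

def beta_new(g, g_prev, d_prev, omega, mu):
    """beta_k of formula (6): ||g_k||^2 / (omega_k ||g_{k-1}||^2 + mu_k d_{k-1}^T y_{k-1})."""
    y_prev = g - g_prev                                   # y_{k-1} = g_k - g_{k-1}
    denom = omega * (g_prev @ g_prev) + mu * (d_prev @ y_prev)
    return (g @ g) / denom

# Special cases of (6):
#   omega = 0, mu = 1  ->  the DY formula ||g_k||^2 / (d_{k-1}^T y_{k-1})
#   omega = 1, mu = 0  ->  the FR formula ||g_k||^2 / ||g_{k-1}||^2
```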

The rest of this paper is organized as follows. In Section 1, we first give some preliminary results on CG methods without line search and discuss the sufficient descent property of our method. Furthermore, we prove the global convergence of the proposed method without line search and present the algorithm frame. A large number of numerical experiments are given for illustrative purposes in Section 2. Conclusions appear in Section 3.

1 Analysis of Global Convergence and Algorithm Frame

1.1 Analysis of Global Convergence

We adopt the following assumptions on the objective function, which are commonly used in the literature.

Assumption 1[3]   The objective function $f$ is $LC^1$ in a neighborhood $\Omega$ of the level set $L := \{x \in \mathbb{R}^n \mid f(x) \leq f(x_1)\}$, and $L$ is bounded. Here, by $LC^1$ we mean that the gradient $\nabla f(x)$ is Lipschitz continuous with modulus $\mu$, i.e., there exists $\mu > 0$ such that

$\|\nabla f(x_{k+1}) - \nabla f(x_k)\| \leq \mu \|x_{k+1} - x_k\|$

for any $x_{k+1}, x_k \in \Omega$.

Assumption 2[3]   The function $f$ is $LC^1$ and strongly convex on $\Omega$. In other words, there exists $\lambda > 0$ such that

$[\nabla f(x_{k+1}) - \nabla f(x_k)]^T (x_{k+1} - x_k) \geq \lambda \|x_{k+1} - x_k\|^2 \quad \text{for any } x_{k+1}, x_k \in \Omega.$

Lemma 1   Suppose that $x_k$ is given by (2), (3) and (5). Then

$g_{k+1}^T d_k = \rho_k g_k^T d_k$  (7)

holds for all $k$, where

$\rho_k = 1 - \delta \phi_k \dfrac{\|d_k\|^2}{\|d_k\|_{Q_k}^2}$  (8)

and

$\phi_k = \begin{cases} 0, & \text{for } \alpha_k = 0, \\ \dfrac{(g_{k+1} - g_k)^T (x_{k+1} - x_k)}{\|x_{k+1} - x_k\|^2}, & \text{for } \alpha_k \neq 0. \end{cases}$  (9)

Proof   See Lemma 1 in Ref. [3].

Lemma 2   Suppose that Assumption 1 holds and that $x_k$ is given by (2), (3) and (5). Then

$\liminf_{k \to \infty} \|g_k\| \neq 0 \ \text{implies} \ \sum_{d_k \neq 0} \dfrac{\|g_k\|^4}{\|d_k\|^2} < \infty.$  (10)

Proof   See Lemma 5 in Ref. [3].

Lemma 3   Suppose that Assumption 2 holds and that $\beta_k$ is given by (6). Then $g_k^T d_k \leq -\|g_k\|^2$.

Proof   We prove Lemma 3 by induction. For $k = 1$, it is easy to obtain that $g_1^T d_1 = -\|g_1\|^2 \leq -\|g_1\|^2$. Suppose that $g_{k-1}^T d_{k-1} \leq -\|g_{k-1}\|^2$ holds; we now prove that $g_k^T d_k \leq -\|g_k\|^2$ holds.

By (3), (6) and Lemma 1, we have

$g_k^T d_k = g_k^T(-g_k + \beta_k d_{k-1}) = g_k^T\left(-g_k + \dfrac{\|g_k\|^2}{\omega_k\|g_{k-1}\|^2 + \mu_k d_{k-1}^T y_{k-1}}\, d_{k-1}\right) = -\|g_k\|^2 + \dfrac{\|g_k\|^2}{\omega_k\|g_{k-1}\|^2 + \mu_k d_{k-1}^T y_{k-1}}\, g_k^T d_{k-1} = -\|g_k\|^2 + \dfrac{\|g_k\|^2 \rho_{k-1} g_{k-1}^T d_{k-1}}{\omega_k\|g_{k-1}\|^2 + \mu_k d_{k-1}^T y_{k-1}}.$

From Corollary 3 in Ref. [3], we know that $0 < \rho_{k-1} \leq 1$. From the induction hypothesis, we have $\rho_{k-1} g_{k-1}^T d_{k-1} \leq -\rho_{k-1}\|g_{k-1}\|^2 \leq 0$. Furthermore, one has

$\mu_k d_{k-1}^T y_{k-1} = \mu_k d_{k-1}^T (g_k - g_{k-1}) = \mu_k (d_{k-1}^T g_k - d_{k-1}^T g_{k-1}) = \mu_k (\rho_{k-1} g_{k-1}^T d_{k-1} - g_{k-1}^T d_{k-1}) = \mu_k (\rho_{k-1} - 1) g_{k-1}^T d_{k-1} \geq 0,$

which is due to $\mu_k \geq 0$, $\rho_{k-1} - 1 \leq 0$ and $g_{k-1}^T d_{k-1} \leq 0$. From the above analysis, we have

$\dfrac{\|g_k\|^2 \rho_{k-1} g_{k-1}^T d_{k-1}}{\omega_k\|g_{k-1}\|^2 + \mu_k d_{k-1}^T y_{k-1}} \leq 0,$

where $\omega_k$ and $\mu_k$ are not equal to zero at the same time. Thus, we obtain $g_k^T d_k = -\|g_k\|^2 + \dfrac{\|g_k\|^2 \rho_{k-1} g_{k-1}^T d_{k-1}}{\omega_k\|g_{k-1}\|^2 + \mu_k d_{k-1}^T y_{k-1}} \leq -\|g_k\|^2$. This finishes our proof.

If there exists a constant $c > 0$ such that $g_k^T d_k \leq -c\|g_k\|^2$ for all $k \geq 1$, then we say the search direction $d_k$ of the method satisfies the sufficient descent condition. Lemma 3 means that the search direction $d_k$ is a sufficient descent direction, which is often required in the convergence analysis of CG methods.
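
In a practical implementation, this property can also be monitored numerically; the following one-line check (with $c = 1$, as in Lemma 3, and a small tolerance) is a hypothetical illustration rather than part of the method.

```python
import numpy as np

def is_sufficient_descent(g, d, c=1.0, tol=1e-12):
    """Check the sufficient descent condition g_k^T d_k <= -c * ||g_k||^2."""
    return g @ d <= -c * (g @ g) + tol
```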

Theorem 1   Suppose that Assumptions 1 and 2 hold and that $\beta_k$ is given by (6). Then

$\|d_k\|^2 \leq \dfrac{\|g_k\|^4 \|d_{k-1}\|^2}{\omega_k \|g_{k-1}\|^4} + H\|g_k\|^2,$

where $H$ is a positive constant.

Proof   By (3), we have

$d_k = -g_k + \beta_k d_{k-1}.$

Taking the squared norm of both sides, we obtain

$\|d_k\|^2 = \|-g_k + \beta_k d_{k-1}\|^2.$

By Lemma 1 and substituting expression (6) into the above formula, we have

$\|d_k\|^2 = \left\| -g_k + \dfrac{\|g_k\|^2}{\omega_k\|g_{k-1}\|^2 + \mu_k d_{k-1}^T y_{k-1}}\, d_{k-1} \right\|^2 = \|g_k\|^2 - \dfrac{2\|g_k\|^2}{\omega_k\|g_{k-1}\|^2 + \mu_k d_{k-1}^T y_{k-1}}\, g_k^T d_{k-1} + \dfrac{\|g_k\|^4 \|d_{k-1}\|^2}{[\omega_k\|g_{k-1}\|^2 + \mu_k d_{k-1}^T y_{k-1}]^2}$

$= \|g_k\|^2 - \dfrac{2\rho_{k-1} g_{k-1}^T d_{k-1}}{\omega_k\|g_{k-1}\|^2 + \mu_k d_{k-1}^T y_{k-1}}\, \|g_k\|^2 + \dfrac{\|g_k\|^4 \|d_{k-1}\|^2}{[\omega_k\|g_{k-1}\|^2 + \mu_k d_{k-1}^T y_{k-1}]^2}$

$\leq \|g_k\|^2 \left| 1 + \dfrac{2\rho_{k-1} g_{k-1}^T d_{k-1}}{\omega_k\|g_{k-1}\|^2 + \mu_k d_{k-1}^T y_{k-1}} \right| + \dfrac{\|g_k\|^4 \|d_{k-1}\|^2}{[\omega_k\|g_{k-1}\|^2 + \mu_k d_{k-1}^T y_{k-1}]^2}$

$\leq \|g_k\|^2 \left| 1 + \dfrac{2\rho_{k-1} g_{k-1}^T d_{k-1}}{\omega_k\|g_{k-1}\|^2} \right| + \dfrac{\|g_k\|^4 \|d_{k-1}\|^2}{[\omega_k\|g_{k-1}\|^2]^2},$

which is due to $\mu_k d_{k-1}^T y_{k-1} \geq 0$, as deduced in Lemma 3. By $g_{k-1}^T d_{k-1} \leq -\|g_{k-1}\|^2$ in Lemma 3, we have

$\dfrac{2\rho_{k-1} g_{k-1}^T d_{k-1}}{\omega_k\|g_{k-1}\|^2} \leq \dfrac{-2\rho_{k-1}\|g_{k-1}\|^2}{\omega_k\|g_{k-1}\|^2} = -\dfrac{2\rho_{k-1}}{\omega_k}.$

Now, by $1 - \delta\mu/v_{\min} \leq \rho_{k-1}$, which is deduced in Ref. [3], and the two inequalities above, we obtain

$\|d_k\|^2 \leq \left| 1 - \dfrac{2\rho_{k-1}}{\omega_k} \right| \|g_k\|^2 + \dfrac{\|g_k\|^4 \|d_{k-1}\|^2}{[\omega_k\|g_{k-1}\|^2]^2} \leq \left| 1 - \dfrac{2(1 - \delta\mu/v_{\min})}{\omega_k} \right| \|g_k\|^2 + \dfrac{\|g_k\|^4 \|d_{k-1}\|^2}{\omega_k^2\|g_{k-1}\|^4}.$

Let $H = \left| 1 - 2(1 - \delta\mu/v_{\min})/\omega_k \right|$; we obtain

$\|d_k\|^2 \leq H\|g_k\|^2 + \dfrac{\|g_k\|^4 \|d_{k-1}\|^2}{\omega_k \|g_{k-1}\|^4}, \quad \text{i.e.,} \quad \|d_k\|^2 \leq \dfrac{\|g_k\|^4 \|d_{k-1}\|^2}{\omega_k \|g_{k-1}\|^4} + H\|g_k\|^2.$

The proof is completed. Note that $1 - \delta\mu/v_{\min} \leq \rho_{k-1}$ holds for all $k$ if Assumption 1 is valid.

Theorem 2   Suppose that Assumptions 1 and 2 hold and that the sequence $\{x_k\}$ is given by (2), (3), (5) and (6). Then $\liminf_{k \to \infty} \|g_k\| = 0$.

Proof   Suppose, by contradiction, that the stated conclusion is not true. Then, in view of $\|g_k\| > 0$, there exists a constant $\varepsilon > 0$ such that $\|g_k\| \geq \varepsilon$ for all $k$. Dividing the inequality of Theorem 1 by $\|g_k\|^4$ and using $\|g_k\| \geq \varepsilon$, we have $\dfrac{\|d_k\|^2}{\|g_k\|^4} \leq \dfrac{\|d_{k-1}\|^2}{\omega_k \|g_{k-1}\|^4} + \dfrac{H}{\varepsilon^2}$.

Thus, we have a recursive inequality which leads to

$\dfrac{\|d_k\|^2}{\|g_k\|^4} \leq \dfrac{\|d_{k-1}\|^2}{\omega_k \|g_{k-1}\|^4} + \dfrac{H}{\varepsilon^2} \leq \dfrac{\|d_{k-2}\|^2}{\omega_{k-1} \|g_{k-2}\|^4} + \dfrac{2H}{\varepsilon^2} \leq \cdots \leq \dfrac{\|d_1\|^2}{\omega_2 \|g_1\|^4} + \dfrac{(k-1)H}{\varepsilon^2}.$  (11)

Thus, inequality (11) implies that

$\dfrac{\|g_k\|^4}{\|d_k\|^2} \geq \dfrac{1}{ab + (k-1)c},$  (12)

where $a = 1/\omega_2$, $b = \|d_1\|^2/\|g_1\|^4$ and $c = H/\varepsilon^2$ are constants. From (12) we have $\sum_{d_k \neq 0} \dfrac{\|g_k\|^4}{\|d_k\|^2} = +\infty$, which contradicts Lemma 2. Hence we obtain $\liminf_{k \to \infty} \|g_k\| = 0$.

Theorem 2 shows that the proposed CG method without line search possesses global convergence.

1.2 Algorithm Frame

Based on the discussion above, we can now describe the algorithm frame for solving the unconstrained optimization problem (1) as follows:

Step 0   Given an initial point $x_1 \in \mathbb{R}^n$ and constants $\varepsilon_0 > 0$, $\delta \in (0, v_{\min}/\mu)$, $\rho \in (0,1)$, $\sigma \in (0,1)$, $\omega_k \geq 0$, $\mu_k \geq 0$ (not both zero), set $d_1 = -g_1$ and $k := 1$.

Step 1   If the stopping criterion $\|g_k\| < \varepsilon_0$ is satisfied, then stop; otherwise, go to Step 2.

Step 2   Compute the step size $\alpha_k$ by formula (5).

Step 3   Let $x_{k+1} = x_k + \alpha_k d_k$; compute $\beta_{k+1}$ by (6) and $d_{k+1}$ by (3).

Step 4   Let k:=k+1 and go to Step 1.

From the above algorithm, we note that the step-length $\alpha_k$ depends on the Lipschitz constant, which is generally not known in advance. It should be noted that if $f(x)$ is a twice continuously differentiable, strictly convex function, the Lipschitz constant can be replaced by a positive constant[15]. So, we choose $\delta = 1$.
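
To make the algorithm frame concrete, the following Python sketch implements Steps 0-4 under simplifying assumptions: $Q_k = I$ in formula (5) (Section 2 instead uses BFGS updates), $\delta = 1$, and constant parameters $\omega_k$ and $\mu_k$; the function name `omfsl` and its arguments are illustrative only.

```python
import numpy as np

def omfsl(grad, x, omega=0.7, mu=0.3, delta=1.0, eps0=1e-6, max_iter=10000):
    """Sketch of the proposed CG method without line search (Steps 0-4).

    grad(x) returns the gradient of f at x.  Q_k = I is used in (5) for
    simplicity; Section 2 reports better results with BFGS updates for Q_k.
    """
    g = grad(x)
    d = -g                                    # Step 0: d_1 = -g_1
    for _ in range(max_iter):
        if np.linalg.norm(g) < eps0:          # Step 1: stopping criterion
            break
        alpha = -delta * (g @ d) / (d @ d)    # Step 2: step size, formula (5) with Q_k = I
        x = x + alpha * d                     # Step 3: new iterate, formula (2)
        g_new = grad(x)
        y = g_new - g                         # y_k = g_{k+1} - g_k
        beta = (g_new @ g_new) / (omega * (g @ g) + mu * (d @ y))   # formula (6)
        d = -g_new + beta * d                 # formula (3)
        g = g_new                             # Step 4: advance the iteration
    return x
```

For example, `omfsl(lambda x: 0.2 * x, np.ones(100))` minimizes the convex quadratic $f(x) = 0.1\|x\|^2$, for which the default $\delta = 1$ satisfies $\delta < v_{\min}/\mu = 5$.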

2 Numerical Experiments

In this section, in order to show the performance of the given method, we test our proposed method with the fixed step-length (5) (denoted by OMFSL), method (6) with the strong Wolfe line search (4) (denoted by SSWLS), and some other algorithms without line search, such as the DY and FR methods[3] and Chen and Sun's method[4], on 24 test problems from Ref. [21]. We use the same convergence criteria as in Ref. [21].

The parameters are chosen as $\rho = 0.04$, $\sigma = 0.5$, $\omega_k = 0.7$ and $\mu_k = 0.3$. All codes were written in MATLAB 7.5 and run on a Lenovo PC with a 1.90 GHz CPU, 2.43 GB of RAM, and the Windows XP operating system. In our numerical experiments, the no-line-search method with $Q_k = I$ (the identity matrix) shows poor convergence behavior. Thus, we consider BFGS updates for $Q_k$ in formula (5).
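
The paper does not spell out the exact update rule used for $Q_k$, so the following sketch shows one standard possibility, the BFGS update of the Hessian approximation, as an assumption for illustration.

```python
import numpy as np

def bfgs_update(Q, s, y):
    """Standard BFGS update of the Hessian approximation Q_k.

    s = x_{k+1} - x_k and y = g_{k+1} - g_k; the update is skipped when the
    curvature condition y^T s > 0 fails, a common safeguard.
    """
    if y @ s <= 0:
        return Q
    Qs = Q @ s
    return Q - np.outer(Qs, Qs) / (s @ Qs) + np.outer(y, y) / (y @ s)
```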

The test results are shown in Tables 1 and 2. The results for each problem and each method are reported in the form k/NG/NF/f(x̄)/Tcpu, where k, NG, NF, f(x̄) and Tcpu denote the number of iterations at termination, the total number of gradient evaluations, the total number of function evaluations, the objective function value at termination, and the CPU time in seconds, respectively; n indicates the dimension of the test problem.

From Tables 1 and 2, it is easy to see that OMFSL performs better than the other considered methods. Meanwhile, we present the Dolan and Moré[22] performance profiles for OMFSL and the other considered methods. Note that the performance ratio p(τ) is the probability that a solver solves a test problem within a factor τ of the smallest cost. In Figs. 1-5, the performance of OMFSL with the fixed step-length (5) is compared with that of the other considered methods.
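
For reference, a minimal sketch of how the Dolan-Moré performance profile value p(τ) can be computed from a matrix of solver costs is given below; the array layout (problems in rows, solvers in columns) is an assumption for illustration.

```python
import numpy as np

def performance_profile(costs, taus):
    """Dolan-More performance profiles.

    costs: (n_problems, n_solvers) array of a cost measure (iterations, NG, NF
           or CPU time); np.inf marks a failure on that problem.
    Returns an array p[t, s]: the fraction of problems solver s solves within a
    factor taus[t] of the smallest cost over all solvers.
    """
    best = costs.min(axis=1, keepdims=True)     # smallest cost for each problem
    ratios = costs / best                       # performance ratios r_{p,s}
    return np.array([(ratios <= t).mean(axis=0) for t in taus])
```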

Fig. 1   Performance profile on the absolute errors of f(x̄) versus f(x*)

Fig. 2   Performance profile on the number of iterations

Fig. 3   Performance profile on NG

Fig. 4   Performance profile on NF

Fig. 5   Performance profile on Tcpu

As we can see from Fig. 1, OMFSL is superior to all the other considered methods in terms of the absolute error of f(x̄) versus f(x*) (the function value at the optimal solution). Figures 2-5 show that OMFSL is better than all the other considered methods in terms of the number of iterations, the number of gradient evaluations (NG), the number of function evaluations (NF) and the CPU time (Tcpu). In conclusion, OMFSL without line search is very competitive on the test problems and is an alternative for solving nonlinear unconstrained optimization problems.

Table 1   Numerical results for the tested problems

Table 2   Results of f(x̄) and Tcpu for the tested problems

3 Conclusion

In this paper, we have combined the two-parameter conjugate gradient method defined by formula (6) with Sun-Zhang's step-length formula (5) and have proved its global convergence under appropriate assumptions. The step-length formula (5) might be practical in cases where the line search is expensive or difficult. We allow certain flexibility in selecting the sequence {Q_k} in practical computation, and other updates for Q_k can also be considered. Our proofs require that the function is at least $LC^1$ and strongly convex and that the level set is bounded. Reducing the requirements on the objective function and finding a completely constant step-length might be topics of further research.

References

1. Zhang L M, Gao H T, Chen Z Q, et al. Multi-objective global optimal parafoil homing trajectory optimization via Gauss pseudo spectral method[J]. Nonlinear Dynamics, 2013, 72(1): 1-8.
2. Jiang X Z, Jian J B. A sufficient descent Dai-Yuan type nonlinear conjugate gradient method for unconstrained optimization problems[J]. Nonlinear Dynamics, 2013, 72(1): 101-112.
3. Sun J, Zhang J P. Global convergence of conjugate gradient methods without line search[J]. Annals of Operations Research, 2001, 103(1): 161-173.
4. Chen X D, Sun J. Global convergence of a two-parameter family of conjugate gradient methods without line search[J]. Journal of Computational and Applied Mathematics, 2002, 146(1): 37-45.
5. Hestenes M R, Stiefel E. Methods of conjugate gradients for solving linear systems[J]. Journal of Research of the National Bureau of Standards, 1952, 49(6): 409-436.
6. Fletcher R, Reeves C M. Function minimization by conjugate gradients[J]. The Computer Journal, 1964, 7(2): 149-154.
7. Polyak B T. The conjugate gradient method in extremal problems[J]. USSR Computational Mathematics & Mathematical Physics, 1969, 9(4): 94-112.
8. Fletcher R. Practical Methods of Optimization, Vol 1: Unconstrained Optimization[M]. New York: John Wiley & Sons, 1987.
9. Liu Y, Storey C. Efficient generalized conjugate gradient algorithms[J]. Journal of Optimization Theory and Applications, 1991, 69(1): 129-137.
10. Dai Y H, Yuan Y X. A nonlinear conjugate gradient with a strong global convergence property[J]. SIAM Journal on Optimization, 1999, 10(1): 177-182.
11. Deepho J, Abubakar A B, Malik M, et al. Solving unconstrained optimization problems via hybrid CD-DY conjugate gradient methods with applications[J]. Journal of Computational and Applied Mathematics, 2022, 405: 113823.
12. Abubakar A B, Kumam P, Malik M, et al. A hybrid conjugate gradient based approach for solving unconstrained optimization and motion control problems[J]. Mathematics and Computers in Simulation, 2022, 201: 640-657.
13. Goncalves M L N, Lima F S, Prudente L F. A study of Liu-Storey conjugate gradient methods for vector optimization[J]. Applied Mathematics and Computation, 2022, 425: 127099.
14. Chen C L, Luo L L, Han C H, et al. Global convergence of an extended descent algorithm without line search for unconstrained optimization[J]. Journal of Applied Mathematics and Physics, 2018, 6(1): 130-137.
15. Yu Z S. Global convergence of a memory gradient method without line search[J]. Journal of Applied Mathematics & Computing, 2008, 26(1): 545-553.
16. Narushima Y. A memory gradient method without line search for unconstrained optimization[J]. SUT Journal of Mathematics, 2006, 42(2): 191-206.
17. Yin L, Chen X D. Global convergence of two kinds of three-term conjugate gradient methods without line search[J]. Asia-Pacific Journal of Operational Research, 2013, 30(1): 1-10.
18. Li X, Chen X D. Global convergence of shortest-residual family of conjugate gradient methods without line search[J]. Asia-Pacific Journal of Operational Research, 2005, 22(4): 529-538.
19. Du S Q, Chen Y Y. Global convergence of a modified spectral FR conjugate gradient method[J]. Applied Mathematics and Computation, 2008, 202(2): 766-770.
20. Zhu H, Chen X D. Global convergence of a special case of the Dai-Yuan family without line search[J]. Asia-Pacific Journal of Operational Research, 2008, 25(3): 411-420.
21. Zhu T F, Yan Z Z, Peng X Y. A modified nonlinear conjugate gradient method for engineering computation[J]. Mathematical Problems in Engineering, 2017, 2017(1): 1425857.
22. Dolan E D, Moré J J. Benchmarking optimization software with performance profiles[J]. Mathematical Programming, 2002, 91(2): 201-213.
