1 Introduction

In this paper we consider the unconstrained minimization problem

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} \quad f(x)\\ \text {subject to} &{} \quad x\in {\mathbb {R}}^n, \end{array} \end{aligned}$$
(1)

where \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) is a real-valued nonlinear function, which is bounded and continuously differentiable. We suppose that a first- or second-order black-box oracle of \(f\) is available.

1.1 Motivation and history

Trust-region methods, also called restricted step methods [22], are a class of iterative schemes developed to solve convex or nonconvex optimization problems, see, for example, [14]. They have also been developed for nonsmooth problems, see [14, 17, 24, 46]. Trust-region methods have strong convergence properties, are reliable and robust in computation, and can handle ill-conditioned problems, cf. [35, 36]. Let \(x_k\) be the current iterate. In the trust-region framework, the objective \(f\) is approximated by a simple model on a region around \(x_k\), called the trust region (region of trust), in which the model is expected to be an acceptable approximation of the original objective. Afterward, the model is minimized subject to the trust-region constraint to find a new trial step \(d_k\). Here, "simple" means that the model can be minimized much more easily than the original objective function. If the model is an adequate approximation of the objective function within the trust region, then the point \(x_{k+1}=x_k+d_k\) is accepted by the trust-region method and the region can be expanded for the next iteration; otherwise, the region is contracted and the model is minimized within the contracted region. This scheme is continued until a trial step \(d_k\) is found that guarantees an acceptable agreement between the model and the objective function.

Several quadratic and non-quadratic models have been proposed to approximate the objective function in optimization, see [15, 23, 37, 40]; however, the conic and quadratic models are the most popular, see [18, 37, 38]. If the approximating model is quadratic, i.e.,

$$\begin{aligned} q_k(d) := f_k + g_k^T d + \frac{1}{2} d^T B_k d, \end{aligned}$$
(2)

where \(f_k = f(x_k)\), \(g_k = \nabla f(x_k)\), and \(B_k \approx H(x_k)=\nabla ^2 f(x_k)\), the trust-region method can be considered as a globally convergent generalization of the classical Newton's method. The trust-region subproblem is then defined by

$$\begin{aligned} \begin{array}{ll} \mathrm {minimize} &{} \quad q_k(d),\\ \mathrm {subject~to} &{} \quad \Vert d\Vert \le \delta _k. \end{array} \end{aligned}$$
(3)

Hence the trust-region is commonly a norm ball \(C\) defined by

$$\begin{aligned} C := \big \{d\in {\mathbb {R}}^n \mid \Vert d\Vert \le \delta _k \big \}, \end{aligned}$$

where \(\delta _k > 0\) is a real number called the trust-region radius, and \(\Vert \cdot \Vert \) is any norm in \({\mathbb {R}}^n\), cf. [30]. Since \(C\) is compact and the model is continuous, the trust-region subproblem attains its minimizer on the set \(C\). Most of the computational cost of trust-region methods relates to minimizing the model over the trust region \(C\). Hence finding efficient schemes for solving (3) has received much attention during the past few decades, see [20, 21, 26, 32, 39]. Once the step \(d\) is computed, the quality of the model in the trust region is evaluated by the ratio of the actual reduction of the objective, \(f_k-f(x_k+d)\), to the predicted reduction of the model, \(q_k(0)-q_k(d)\), i.e.,

$$\begin{aligned} r_k = \frac{f_k-f(x_k+d)}{q_k(0)-q_k(d)}. \end{aligned}$$
(4)

For a prescribed positive constant \(\mu _1 \in (0, 1]\), if \(r_k \ge \mu _1\), the model provides a reasonable approximation, so the step is accepted, i.e., \(x_{k+1}=x_k+d_k\), and the trust region \(C\) can be expanded for the next step. Otherwise, the trust region \(C\) is contracted by decreasing the radius \(\delta _k\) and the subproblem (3) is solved in the reduced region. This scheme is continued until the step \(d\) is accepted by the trust-region test \(r_k \ge \mu _1\). Our discussion can be summarized in the following algorithm:

Algorithm 1 (The traditional trust-region method, TTR)
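Before moving on, a minimal sketch of Algorithm 1 may help fix ideas. The following Python code is an illustrative sketch, not the implementation used in the experiments of Sect. 4: the subproblem (3) is solved here by the simple Cauchy step, and the ratio and radius updates follow (4) and the rule reported later in Sect. 4.

```python
import numpy as np

def cauchy_step(g, B, delta):
    """Minimize the quadratic model along -g inside the ball ||d|| <= delta."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0 else min(gnorm**3 / (delta * gBg), 1.0)
    return -(tau * delta / gnorm) * g

def monotone_trust_region(f, grad, hess, x0, delta0=1.0, mu1=0.05, mu2=0.9,
                          c1=0.25, c2=2.5, tol=1e-5, max_iter=1000):
    """Sketch of Algorithm 1: the traditional (monotone) trust-region method."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        while True:                                  # retry until the step is accepted
            d = cauchy_step(g, B, delta)             # cheap approximate solver for (3)
            pred = -(g @ d + 0.5 * d @ B @ d)        # predicted reduction q_k(0) - q_k(d)
            r = (f(x) - f(x + d)) / pred             # ratio (4)
            if r >= mu1:                             # acceptable agreement: accept the step
                break
            delta = c1 * np.linalg.norm(d)           # poor model: contract the region
        x = x + d
        if r >= mu2:                                 # very good agreement: expand the region
            delta = max(delta, c2 * np.linalg.norm(d))
    return x
```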

In Algorithm 1, it follows from \(r_k \ge \mu _1\) and \(q_k(0) - q_k(d_k) > 0\) that

$$\begin{aligned} f_k - f_{k+1} \ge \mu _1 \big (q_k(0) - q_k(d_k)\big ) > 0, \end{aligned}$$

implying \(f_{k+1} \le f_k\). This means that the sequence of function values \(\{f_k\}\) is monotonically decreasing; for this reason, the traditional trust-region method (TTR) is also called the monotone trust-region method. This feature seems natural for minimization schemes; however, it slows down the convergence of TTR to a minimizer if the objective involves a curved narrow valley, see [1, 28]. To observe this effect on TTR, we study the next example.

Example 1

Consider the two-dimensional Nesterov–Chebyshev–Rosenbrock function, cf. [29],

$$\begin{aligned} f(x_1,x_2) = \frac{1}{4}(x_1 - 1)^2 + \big (x_2 - 2x_1^2 + 1\big )^2, \end{aligned}$$

where we solve the problem (1) by Newton's method and TTR with the initial point \(x_0 = (-0.61, -1)\). It is clear that \((1, 1)\) is the minimizer. The implementation indicates that Newton's method needs 7 iterations and 8 function evaluations, while the monotone trust-region method needs 22 iterations and 24 function evaluations. We depict the contour plot of the objective and the iteration points, as well as a diagram of function values versus iterations attained by these two algorithms, in Fig. 1. Figure 1a shows that the iterates of TTR follow the bottom of the valley, in contrast to those of Newton's method, which can go up and down to reach the \(\varepsilon \)-solution with the accuracy parameter \(\varepsilon = 10^{-5}\). We see that Newton's method takes larger steps than TTR. Figure 1b illustrates function values versus iterations for both algorithms, showing that the function values of TTR decrease monotonically, while those of Newton's method fluctuate nonmonotonically.
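The following script is a sketch of the setting of Example 1: it implements the function above together with its gradient and Hessian and runs a plain (unsafeguarded) Newton iteration from \(x_0\); it is only illustrative and is not the code used to produce Fig. 1.

```python
import numpy as np

def ncr(x):
    """Two-dimensional Nesterov-Chebyshev-Rosenbrock function of Example 1."""
    x1, x2 = x
    return 0.25 * (x1 - 1.0) ** 2 + (x2 - 2.0 * x1 ** 2 + 1.0) ** 2

def ncr_grad(x):
    x1, x2 = x
    t = x2 - 2.0 * x1 ** 2 + 1.0
    return np.array([0.5 * (x1 - 1.0) - 8.0 * x1 * t, 2.0 * t])

def ncr_hess(x):
    x1, x2 = x
    t = x2 - 2.0 * x1 ** 2 + 1.0
    return np.array([[0.5 - 8.0 * t + 32.0 * x1 ** 2, -8.0 * x1],
                     [-8.0 * x1, 2.0]])

# Plain Newton iteration from the starting point of Example 1.
x = np.array([-0.61, -1.0])
for _ in range(50):
    g = ncr_grad(x)
    if np.linalg.norm(g) < 1e-5:
        break
    x = x - np.linalg.solve(ncr_hess(x), g)
print(x)   # expected to approach the minimizer (1, 1)
```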

Fig. 1

A comparison between Newton's method and TTR: a illustrates the contour plot of the two-dimensional Nesterov–Chebyshev–Rosenbrock function and the iterates of Newton's method and TTR; b shows the diagram of function values versus iterations

In general, monotonicity may result in slow iterative schemes for highly nonlinear or badly-scaled problems. To avoid this algorithmic limitation, the idea of nonmonotone strategies was proposed, which can be traced back to the watchdog technique for overcoming the Maratos effect in constrained optimization [13]. To improve the performance of Armijo's line search, Grippo et al. in 1986 [28] proposed the modified Armijo rule

$$\begin{aligned} f(x_k + \alpha _k d_k) \le f_{l(k)} + \sigma \alpha _k g_k^T d_k,~~ k= 0,1,2,\ldots , \end{aligned}$$

with the step-size \(\alpha _{k} >0\), \(\sigma \in (0, 1/2)\), and

$$\begin{aligned} f_{l(k)} = \max _{0 \le j \le m(k)} \big \{f_{k-j}\big \}, \end{aligned}$$
(5)

where \(m(0)=0\) and \(m(k)\le \min \{m(k-1)+1, N\} \) for a nonnegative integer \(N\). It was shown that the associated scheme is globally convergent, and numerical results reported in Grippo et al. [28] and Toint [41] showed the effectiveness of the proposed idea. Motivated by these results, nonmonotone strategies have received much attention during the past few decades. For example, in 2004, Zhang and Hager in [47] proposed the nonmonotone term

$$\begin{aligned} C_{k} = \left\{ \begin{array}{ll} f_0 &{} \quad \mathrm {if}~ k=0, \\ \big (\eta _{k-1}Q_{k-1}C_{k-1}+f(x_k)\big )/Q_k &{} \quad \mathrm {if}~ k \ge 1, \end{array} \right. \qquad Q_{k} = \left\{ \begin{array}{ll} 1 &{} \quad \mathrm {if}~ k=0, \\ \eta _{k-1}Q_{k-1}+1 &{} \quad \mathrm {if}~ k \ge 1, \end{array} \right. \end{aligned}$$
(6)

where \(0 \le \eta _{min} \le \eta _{k-1} \le \eta _{max} \le 1\). Recently, Mo et al. in [31] and Ahookhosh et al. in [3] studied the nonmonotone term

$$\begin{aligned} D_k = \left\{ \begin{array}{ll} f_k &{} \quad \mathrm {if}~ k = 1,\\ \eta _{k}D_{k-1} +(1-\eta _{k})f_k &{} \quad \mathrm {if}~ k \ge 2, \end{array} \right. \end{aligned}$$
(7)

where \(\eta _k \in [ \eta _{min} ,\eta _{max}] \), \( \eta _{min} \in [0,1] \), \( \eta _{max} \in [\eta _{min},1] \). More recently, Amini et al. in [8] proposed the nonmonotone term

$$\begin{aligned} R_k=\eta _k f_{l(k)}+(1-\eta _k)f_k, \end{aligned}$$
(8)

where \(0\le \eta _{min}\le \eta _{max}\le 1\) and \(\eta _{k}\in [\eta _{min},\eta _{max}]\). In all cases it was proved that the schemes are globally convergent and enjoy better performance than the monotone ones.
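To make the differences among the terms (5)-(8) concrete, the following sketch evaluates each of them on a hypothetical history of function values; the fixed weight \(\eta = 0.85\) and the history are illustrative assumptions only.

```python
def f_lk(fvals, N):
    """Term (5) of Grippo et al.: max of the last min(k, N) + 1 function values."""
    k = len(fvals) - 1
    m_k = min(k, N)
    return max(fvals[k - m_k:])

def zhang_hager(fvals, eta=0.85):
    """Term (6) of Zhang and Hager: C_k, a weighted average of f_0, ..., f_k."""
    C, Q = fvals[0], 1.0
    for fk in fvals[1:]:
        Q_new = eta * Q + 1.0
        C = (eta * Q * C + fk) / Q_new
        Q = Q_new
    return C

def mo_term(fvals, eta=0.85):
    """Term (7) of Mo et al. / Ahookhosh et al.: exponential smoothing of f_k."""
    D = fvals[0]
    for fk in fvals[1:]:
        D = eta * D + (1.0 - eta) * fk
    return D

def amini_term(fvals, N, eta=0.85):
    """Term (8) of Amini et al.: a convex combination of f_l(k) and f_k."""
    return eta * f_lk(fvals, N) + (1.0 - eta) * fvals[-1]

fvals = [5.0, 3.2, 3.5, 2.9, 3.1]     # hypothetical history f_0, ..., f_4
print(f_lk(fvals, 3), zhang_hager(fvals), mo_term(fvals), amini_term(fvals, 3))
```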

Just as for inexact line search techniques, the combination of trust-region methods with nonmonotone strategies is of interest. Historically, the first nonmonotone trust-region method was proposed in 1993 by Deng et al. in [16] for unconstrained optimization. Under some classical assumptions, the global convergence and the local superlinear convergence rate were established. Nonmonotone trust-region methods were also studied by several authors such as Toint [42], Xiao and Zhou [44], Xiao and Chu [45], Zhou and Xiao [48], Ahookhosh and Amini [2], Amini and Ahookhosh [7], and Mo et al. [31]. Recently, Ahookhosh and Amini in [1] and Ahookhosh et al. in [4] proposed two nonmonotone trust-region methods using the nonmonotone term (8). Theoretical results were reported, and numerical results showed the efficiency of the proposed nonmonotone methods.

1.2 Content

In this paper we propose a trust-region method equipped with two novel nonmonotone terms. More precisely, we first establish two nonmonotone terms and then combine them with Algorithm 1 to construct two nonmonotone trust-region algorithms. If \(k \ge N\), the new nonmonotone terms are defined by a convex combination of the last \(N\) successful function values, and if \(k < N\), either a convex combination of \(k\) successful function values or \(f_{l(k)}\) is used. The global convergence to first- and second-order stationary points is established under some classical assumptions. Moreover, local superlinear and quadratic convergence rates for the proposed methods are studied. Numerical results regarding experiments on some highly nonlinear problems and on 112 unconstrained test problems from the CUTEst test collection [25] are reported, indicating the efficiency of the proposed nonmonotone terms.

The remainder of the paper is organized as follows. In Sect. 2 we propose new nonmonotone terms and their combination with the trust-region framework. The global convergence of the proposed methods is given in Sect. 3. Numerical results are reported in Sect. 4. Finally, some conclusions are given in Sect. 5.

2 Novel nonmonotone terms and algorithm

In this section we first present two novel nonmonotone terms and then combine them with the trust-region framework to introduce two nonmonotone trust-region algorithms for solving the unconstrained optimization problem (1).

We first assume that \(k\) denotes the current iteration and \(N \in \mathbb {N}\) is a constant. The main idea is to construct a nonmonotone term determined by a convex combination of the last \(k\) successful function values if \(k < N\) and by a convex combination of the last \(N\) successful function values if \(k \ge N\). In other words, we construct new terms using function values collected in the set

$$\begin{aligned} \mathcal {F}_k := \left\{ \begin{array}{ll} \{f_0,f_1, \ldots , f_k\} &{} \quad \mathrm {if}~ k < N,\\ \{f_{k-N+1},f_{k-N+2}, \ldots , f_k\} &{} \quad \mathrm {if}~ k \ge N, \end{array} \right. \end{aligned}$$
(9)

which should be updated in each iteration. To this end, motivated by the term (7), we construct \(\overline{T}_k\) using the following procedure

$$\begin{aligned} \left\{ \begin{array}{ll} \overline{T}_0 = f_0 &{} \quad \mathrm {if}~k = 0,\\ \overline{T}_1 = \big (1-\eta _{0}\big )f_1 + \eta _{0} f_0 &{} \quad \mathrm {if}~k = 1,\\ \overline{T}_2 = \big (1-\eta _{1}\big )f_2 + \eta _{1}\big (1-\eta _{0}\big ) f_1 + \eta _{1}\eta _{0} f_0 &{} \quad \mathrm {if}~k = 2,\\ \quad \vdots &{} \quad \vdots \\ \overline{T}_{N-1} = \big (1-\eta _{N - 2}\big )f_{N - 1} + \eta _{N - 2}\big (1-\eta _{N - 3}\big ) f_{N-2} + \cdots + \eta _{N - 2} \cdots \eta _{0} f_0 &{} \quad \mathrm {if}~k = N - 1,\\ \overline{T}_N = \big (1-\eta _{N - 1}\big ) f_N + \eta _{N - 1}\big (1-\eta _{N - 2}\big ) f_{N-1} + \cdots + \eta _{N - 1} \cdots \eta _{0} f_0 &{} \quad \mathrm {if}~k = N, \end{array} \right. \end{aligned}$$

where \(\eta _i \in [0,1)\), for \(i = 0, 1, \ldots , N - 1\), are some weight parameters. Hence the new term is generated by

$$\begin{aligned} \overline{T}_k := \left\{ \begin{array}{ll} \big (1-\eta _{k-1}\big ) f_k + \eta _{k-1} \overline{T}_{k-1} &{} \quad \mathrm {if}~ k < N,\\ \big (1-\eta _{k-1}\big ) f_k + \eta _{k-1}\big (1-\eta _{k-2}\big ) f_{k-1} + \cdots + \eta _{k-1} \cdots \eta _{k-N} f_{k-N} &{} \quad \mathrm {if}~ k \ge N, \end{array} \right. \end{aligned}$$
(10)

where \( \overline{T}_0 = f_0\) and \(\eta _i \in [0,1)\) for \(i = 0, 1, \ldots , k - 1\). To show that \(\overline{T}_k\) is a convex combination of the collected function values \(\mathcal {F}_k\), it is enough to show that the multipliers sum to one. For \(k \ge N\), the definition of \(\overline{T}_k\) implies

$$\begin{aligned} \big (1-\eta _{k - 1}\big ) + \eta _{k - 1}\big (1-\eta _{k - 2}\big ) + \cdots + \eta _{k - 1} \cdots \eta _{k - N + 1} \big (1-\eta _{k - N}\big ) + \eta _{k - 1} \cdots \eta _{k - N} = 1. \end{aligned}$$
(11)

For \(k < N\), a similar argument shows that the corresponding multipliers also sum to one. Therefore, the generated term \(\overline{T}_k\) is a convex combination of the elements of \(\mathcal {F}_k\).
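The following sketch computes \(\overline{T}_k\) directly from this convex combination and checks that the multipliers sum to one; the value history and weights in the usage lines are hypothetical. This direct version deliberately stores the whole history, which the recursion derived next avoids.

```python
def T_bar_direct(fvals, etas, N):
    """Compute T_bar_k of (10) directly from the convex combination.

    fvals = [f_0, ..., f_k]; etas = [eta_0, ..., eta_{k-1}], each in [0, 1).
    """
    k = len(fvals) - 1
    depth = k if k < N else N                  # how far back the combination reaches
    weights, prod = [], 1.0
    for j in range(depth):                     # weight of f_{k-j}
        weights.append(prod * (1.0 - etas[k - j - 1]))
        prod *= etas[k - j - 1]
    weights.append(prod)                       # weight of the oldest value f_{k-depth}
    assert abs(sum(weights) - 1.0) < 1e-12     # the multipliers sum to one, cf. (11)
    return sum(w * fvals[k - j] for j, w in enumerate(weights))

etas = [0.3, 0.5, 0.7, 0.4]                    # hypothetical weights eta_0, ..., eta_3
fvals = [5.0, 3.2, 3.5, 2.9, 3.1]              # hypothetical values f_0, ..., f_4
print(T_bar_direct(fvals, etas, N=3))
```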

The procedure defining \(\overline{T}_k\) clearly implies that the set \(\mathcal {F}_k\) should be updated and saved in each iteration. Moreover, \(N(N + 1)/2\) multiplications are required to compute \( \overline{T}_k\). To avoid saving \(\mathcal {F}_k\) and to decrease the number of multiplications, we derive a recursive formula for (10). From the definition of \( \overline{T}_k\), for \(k \ge N\), it follows that

$$\begin{aligned} \overline{T}_k - \eta _{k - 1} \overline{T}_{k - 1}&= \big (1-\eta _{k - 1}\big )~f_k + \eta _{k - 1}\big (1-\eta _{k - 2}\big ) ~ f_{k - 1} + \cdots + \eta _{k - 1} \cdots \eta _{k - N} ~ f_{k - N}\\&\quad - \eta _{k - 1} \big (1-\eta _{k - 2}\big )~f_{k - 1} - \cdots - \eta _{k - 1} \cdots \big (1-\eta _{k - N -1}\big )~ f_{k - N}\\&\quad - \eta _{k - 1} \eta _{k - 2} \cdots \eta _{k - N - 1} ~ f_{k - N - 1}\\&= \big (1-\eta _{k - 1}\big )~f_k + \eta _{k - 1} \eta _{k - 2} \cdots \eta _{k - N - 1} ~ \big (f_{k - N} - f_{k - N - 1}\big )\\&= \big (1-\eta _{k - 1}\big )~f_k + \xi _k ~ \big (f_{k - N} - f_{k - N - 1}\big ) \end{aligned}$$

where \(\xi _k : = \eta _{k - 1} \eta _{k - 2} \cdots \eta _{k-N-1}\). For \(k \ge N\), this equation leads to

$$\begin{aligned} \overline{T}_k = \big (1-\eta _{k - 1}\big )~f_k + \eta _{k - 1} \overline{T}_{k - 1} +\xi _k~ \big (f_{k - N} - f_{k - N -1}\big ), \end{aligned}$$
(12)

which requires saving only \(f_{k-N}\) and \(f_{k-N-1}\) and needs only three multiplications. Moreover, the definition of \(\xi _k\) implies

$$\begin{aligned} \xi _k = \eta _{k - 1} \eta _{k - 2} \cdots \eta _{k - N - 1} = \frac{\eta _{k - 1}}{\eta _{k - N - 2}} \eta _{k - 2} \eta _{k - 3} \cdots \eta _{k - N - 2} = \frac{\eta _{k - 1}}{\eta _{k - N - 2}} \xi _{k-1} . \end{aligned}$$
(13)

Using (10) and (12), with \(\xi _k\) recursively updated by (13), a new nonmonotone term is defined by

$$\begin{aligned} T_k : = \left\{ \begin{array}{ll} \!\!f_k + \eta _{k - 1} \big ( \overline{T}_k - f_k \big ) &{}\quad \mathrm {if}~ k < N,\\ \!\!\max \left\{ \overline{T}_{k} , f_k \right\} &{}\quad \mathrm {if}~ k \ge N, \end{array} \right. \end{aligned}$$
(14)

where the max term is added to guarantee \(T_k \ge f_k\).
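One possible way to organize the resulting update is sketched below (illustrative code, not the authors' implementation): it keeps the full history only for readability, and it computes \(\xi _k\) by a direct product instead of the quotient formula (13) to avoid dividing by a weight that may be zero.

```python
class NonmonotoneTerm:
    """Sketch of the first nonmonotone term T_k of (14), using recursion (12).

    The full history is stored only for readability; as noted in the text, once
    k > N the update needs only f_{k-N}, f_{k-N-1}, and xi_k.
    """

    def __init__(self, f0, N):
        self.N, self.k = N, 0
        self.T_bar = f0
        self.fvals, self.etas = [f0], []

    def update(self, fk, eta):
        """Push the new value f_k with weight eta_{k-1} in [0, 1); return T_k."""
        self.k += 1
        self.fvals.append(fk)
        self.etas.append(eta)
        k, N = self.k, self.N
        if k <= N:
            # plain recursion of (10); at k = N it coincides with the long formula
            self.T_bar = (1.0 - eta) * fk + eta * self.T_bar
        else:
            # recursion (12); xi_k = eta_{k-1} * ... * eta_{k-N-1} is computed as a
            # direct product here instead of the quotient formula (13)
            xi = 1.0
            for e in self.etas[k - N - 1:]:
                xi *= e
            diff = self.fvals[k - N] - self.fvals[k - N - 1]
            self.T_bar = (1.0 - eta) * fk + eta * self.T_bar + xi * diff
        if k < N:
            return fk + eta * (self.T_bar - fk)   # first branch of (14)
        return max(self.T_bar, fk)                # second branch of (14)
```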

As discussed in Sect. 1, nonmonotone schemes perform better when they use stronger nonmonotone terms far away from the optimizer and weaker ones close to it. This motivates us to consider a second version of the derived nonmonotone term by using \(f_{l(k)}\) whenever \(k < N\). More precisely, the second nonmonotone term is defined by

$$\begin{aligned} \!T_k = \left\{ \begin{array}{ll} \!f_{l(k)} &{}\quad \mathrm {if}~ k < N,\\ \max \left\{ \overline{T}_{k}, f_k \right\} &{}\quad \mathrm {if}~ k \ge N, \end{array} \right. \end{aligned}$$
(15)

where \(\xi _k\) is defined by (13). It is clear that the new term uses the stronger term \(f_{l(k)}\) defined by (5) for the first \(N\) iterations (\(k < N\)) and then employs the relaxed convex term proposed above.
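In terms of the helper functions sketched earlier (f_lk for (5) and T_bar_direct for (10)), this second term could be written, purely for illustration, as:

```python
def second_term(fvals, etas, N):
    """Sketch of the second nonmonotone term (15)."""
    k = len(fvals) - 1
    if k < N:
        return f_lk(fvals, N)                             # strong term (5) early on
    return max(T_bar_direct(fvals, etas, N), fvals[-1])   # relaxed convex term afterwards
```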

Now, to employ the proposed nonmonotone terms in the trust-region framework, it is enough to replace the ratio \(r_k\) in (4) by the nonmonotone ratio

$$\begin{aligned} \widehat{r}_k = \frac{T_k-f(x_k+d)}{q_k(0)-q_k(d)}, \end{aligned}$$
(16)

where \(T_k\) is defined by (14) or (15). Hence, in the trust-region framework, we replace (4) by (16). Notice that if \(\widehat{r}_k \ge \mu _1\), then

$$\begin{aligned} T_k-f_{k+1} \ge \mu _1\big (q_k(0)-q_k(d_k)\big ) \ge 0. \end{aligned}$$

This implies that \(f_{k+1}\) can be larger than \(f_k\); however, the elements of \(\{f_k\}\) cannot increase arbitrarily, since the maximum increase is controlled by the nonmonotone term \(T_k\). Moreover, the definitions (14) and (15) imply \(\widehat{r}_k \ge r_k\), increasing the possibility of accepting larger steps in the nonmonotone schemes compared with the monotone one.

The above-mentioned discussion leads to the following nonmonotone trust-region algorithm:

Algorithm 2 (The nonmonotone trust-region method)

In Algorithm 2, an iteration with \(\widehat{r}_k \ge \mu _1\) (Line 7) is called a successful iteration, and an iteration with \(\widehat{r}_k \ge \mu _2\) (Line 14) is called a very successful iteration. In addition, the loop from Line 3 to Line 20 is called the outer cycle, and the loop from Line 7 to Line 12 is called the inner cycle.
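Combining the pieces, a minimal sketch of Algorithm 2 could look as follows; it reuses cauchy_step from the Algorithm 1 sketch and the NonmonotoneTerm class above, and the fixed value of \(\eta _k\) used here is only a placeholder (the rule actually used in the experiments is given in Sect. 4).

```python
import numpy as np

def nonmonotone_trust_region(f, grad, hess, x0, N=10, eta=0.25, delta0=1.0,
                             mu1=0.05, mu2=0.9, c1=0.25, c2=2.5,
                             tol=1e-5, max_iter=1000):
    """Sketch of Algorithm 2: Algorithm 1 with the ratio (4) replaced by (16)."""
    x = np.asarray(x0, dtype=float)
    delta = delta0
    f0 = f(x)
    term = NonmonotoneTerm(f0, N)              # T_0 = f_0
    T_k = f0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        while True:                            # inner cycle
            d = cauchy_step(g, B, delta)       # any solver of (3) satisfying (17)
            pred = -(g @ d + 0.5 * d @ B @ d)
            r_hat = (T_k - f(x + d)) / pred    # nonmonotone ratio (16)
            if r_hat >= mu1:                   # successful iteration
                break
            delta = c1 * np.linalg.norm(d)
        x = x + d
        if r_hat >= mu2:                       # very successful iteration
            delta = max(delta, c2 * np.linalg.norm(d))
        # a fixed eta is used here for simplicity; Sect. 4 describes an adaptive rule
        T_k = term.update(f(x), eta)
    return x
```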

3 Convergence analysis

This section concerns the global convergence to first- and second-order stationary points of the sequence \(\{x_k\}\) generated by Algorithm 2. More precisely, we intend to prove that every limit point \(x^*\) of the sequence \(\{x_k\}\) satisfies the condition \(g(x^*)=0\), and that there exists a limit point \(x^*\) satisfying \(g(x^*)=0\) at which \(H(x^*)\) is positive semidefinite. Furthermore, we show that Algorithm 2 is well-defined, which means that the inner cycle of the algorithm is left after a finite number of internal iterations, and then prove its global convergence. Moreover, local superlinear and quadratic convergence rates are investigated under some classical assumptions.

To prove the global convergence of the sequence \(\{x_k\}\) generated by Algorithm 2, we require the following assumptions:

(H1) The objective function \(f\) is continuously differentiable and has a lower bound on the level set \(L(x_0) = \{x \in {\mathbb {R}}^n \mid f(x) \le f(x_0)\}\).

(H2) The sequence \(\{B_k\}\) is uniformly bounded, i.e., there exists a constant \(M>0\) such that

$$\begin{aligned} \Vert B_k\Vert \le M, \end{aligned}$$

for all \(k \in \mathbb {N}\).

(H3) There exists a constant \(c>0\) such that the trial step \(d_k\) satisfies \(\Vert d_k\Vert \le c\Vert g_k\Vert \), for all \(k \in \mathbb {N}\).

We also assume that the decrease in the model \(q_k\) is at least a fraction of the decrease obtained by the Cauchy point, i.e., there exists a constant \(\beta \in (0,1)\) such that

$$\begin{aligned} q_k(0)-q_k(d_k)\ge \beta \Vert g_k\Vert ~ \min \left\{ \delta _k, \frac{\Vert g_k\Vert }{\Vert B_k\Vert } \right\} , \end{aligned}$$
(17)

for all \(k\). This condition is called the sufficient reduction condition. Inequality (17) implies that \(d_k \ne 0\) whenever \(g_k \ne 0\). Note that there are several schemes for solving the trust-region subproblem (3) such that (17) holds, see, for example, [14, 33].
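As a quick numerical sanity check (an illustrative experiment, not part of the analysis), the Cauchy step from the Algorithm 1 sketch above satisfies (17) with \(\beta = 1/2\) on randomly generated data:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 0.5                          # the Cauchy step satisfies (17) with beta = 1/2
for _ in range(1000):
    A = rng.standard_normal((5, 5))
    B = 0.5 * (A + A.T)             # symmetric, possibly indefinite, model Hessian
    g = rng.standard_normal(5)
    delta = rng.uniform(0.1, 2.0)
    d = cauchy_step(g, B, delta)    # from the Algorithm 1 sketch above
    decrease = -(g @ d + 0.5 * d @ B @ d)                    # q_k(0) - q_k(d)
    bound = beta * np.linalg.norm(g) * min(delta,
                                           np.linalg.norm(g) / np.linalg.norm(B, 2))
    assert decrease >= bound - 1e-10
```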

Lemma 2

Suppose that the sequence \(\{x_k\}\) is generated by Algorithm 2, then

$$\begin{aligned} \big |f_k-f(x_k+d_k)-\big (q_k(0)-q_k(d_k)\big )\big |\le O\big (\Vert d_k\Vert ^2\big ). \end{aligned}$$

Proof

The proof can be found in [13]. \(\square \)

Lemma 3

Suppose that the sequence \(\{x_{k}\}\) is generated by Algorithm 2, then we get

$$\begin{aligned} f_{k}\le T_{k}\le f_{l(k)}, \end{aligned}$$
(18)

for all \(k \in \mathbb {N} \cup \{0\}\).

Proof

For \(k < N\), we consider two cases: (i) \(T_k\) is defined by (14); (ii) \(T_k\) is defined by (15). In Case (i), Lemma 2.1 in [3], the fact that \(f_i \le f_{l(k)}\) for \(i = 0, 1, \ldots , k\), and the fact that the multipliers in \(T_k\) sum to one give the result. Case (ii) is evident from (15).

For \(k \ge N\), if \(T_k = f_k\), the result is evident. Otherwise, since

$$\begin{aligned} \big (1-\eta _{k - 1}\big ) + \eta _{k - 1}\big (1-\eta _{k - 2}\big ) + \cdots + \eta _{k - 1} \cdots \eta _{k - N + 1} \big (1-\eta _{k - N}\big ) + \eta _{k - 1} \cdots \eta _{k - N} = 1, \end{aligned}$$
(19)

the fact that \(f_i \le f_{l(k)}\), for \(i = k - N, \ldots , k\), and (10) imply

$$\begin{aligned} f_k \le T_k&= \big (1-\eta _{k - 1}\big )~f_k + \eta _{k - 1} \big (1-\eta _{k - 2}\big ) ~ f_{k-1} + \cdots + \eta _{k - 1} \cdots \eta _{k - N} ~ f_{k - N}\\&\le \big [\big (1-\eta _{k - 1}\big ) + \eta _{k - 1} \big (1-\eta _{k - 2}\big ) + \cdots + \eta _{k - 1} \cdots \eta _{k - N}\big ] f_{l(k)} = f_{l(k)}, \end{aligned}$$

giving the result. \(\square \)

Lemma 4

Suppose that the sequence \(\{x_k\}\) is generated by Algorithm 2, then the sequence \(\{f_{l(k)}\}\) is decreasing.

Proof

The condition (18) implies that \(T_k \le f_{l(k)}\). If \(x_{k+1}\) is accepted by Algorithm 2, then

$$\begin{aligned} \frac{f_{l(k)}-f\big (x_k+d_k\big )}{q_k(0)-q_k(d_k)}\ge \frac{T_k-f\big (x_k+d_k\big )}{q_k(0)-q_k(d_k)}\ge \mu _1, \end{aligned}$$

leading to

$$\begin{aligned} f_{l(k)}-f\big (x_k+d_k\big )\ge \mu _1 \big (q_k(0)-q_k(d_k)\big ) \ge 0, \quad \mathrm {for~all} ~k \in \mathbb {N}, \end{aligned}$$

implying

$$\begin{aligned} f_{l(k)}\ge f_{k+1}, \quad \mathrm {for~all} ~k \in \mathbb {N}. \end{aligned}$$
(20)

Now, if \(k\ge N\), by using \(m(k+1) \le m(k)+1\) and (20), we get

$$\begin{aligned} f_{l(k+1)}= & {} \max _{0\le j\le m(k+1)}\big \{f_{k-j+1}\big \} \le \max _{0\le j\le m(k)+1}\big \{f_{k-j+1}\big \}\\= & {} \max \big \{f_{l(k)},f_{k+1}\big \}\le f_{l(k)}. \end{aligned}$$

For \(k<N\), it is obvious that \(m(k)=k\). Since, for any \(k\), \(f_k\le f_0\), it is clear that \(f_{l(k)}=f_0\). Therefore, in both cases, the sequence \(\{f_{l(k)}\}\) is decreasing. \(\square \)

Lemma 5

Suppose that (H1) holds and the sequence \(\{x_k\}\) is generated by Algorithm 2, then \(L(x_0)\) contains \(\{x_k\}\).

Proof

The definition of \(T_k\) indicates that \(T_0=f_0\). By induction, we assume that \(x_i \in L(x_0)\), for all \(i=1,2,\ldots ,k\), and then prove that \(x_{k+1} \in L(x_0)\). From (18), we get

$$\begin{aligned} f_{k+1} \le T_{k+1} \le f_{l(k+1)} \le f_{l(k)} \le f_0, \end{aligned}$$

implying that \(L(x_0)\) contains the sequence \(\{x_k\}\). \(\square \)

Corollary 6

Suppose that (H1) holds and the sequence \(\{x_k\}\) is generated by Algorithm 2. Then the sequence \(\{f_{l(k)}\}\) is convergent.

Proof

The assumption (H1) and Lemma 4 imply that there exists a constant \(\lambda \) such that

$$\begin{aligned} \lambda \le f_{k+n}\le f_{l(k+n)}\le \cdots \le f_{l(k+1)}\le f_{l(k)}, \end{aligned}$$

for all \(n \in \mathbb {N}\). This implies that the sequence \(\big \{f_{l(k)}\big \}\) is convergent. \(\square \)

Lemma 7

Suppose that (H1)–(H3) hold and the sequence \(\{x_k\}\) is generated by Algorithm 2, then

$$\begin{aligned} \lim _{k\rightarrow \infty }f_{l(k)}=\lim _{k\rightarrow \infty }f_k. \end{aligned}$$
(21)

Proof

The condition (18) and Lemma 7 of [1] imply that the result is valid. \(\square \)

Corollary 8

Suppose (H1)–(H3) hold and the sequence \(\{x_k\}\) is generated by Algorithm 2, then we have

$$\begin{aligned} \lim _{k\rightarrow \infty }T_k=\lim _{k\rightarrow \infty }f_k. \end{aligned}$$
(22)

Proof

From (18) and Lemma 7, the result is obtained. \(\square \)

Lemma 9

Suppose that (H1) and (H2) hold, and the sequence \(\{x_k\}\) is generated by Algorithm 2. Then if \(\Vert g_k\Vert \ge \varepsilon >0\), we have

  1. (i)

    The inner cycle is left after a finite number of inner iterations and Algorithm 2 is well-defined;

  2. (ii)

    For any \(k\), there exists a nonnegative integer \(p\) such that \(x_{k+p+1}\) is a very successful iteration.

Proof

(i) Let \(t_k\) denote the internal iteration counter at step \(k\), and let \(d_k^{t_k}\) and \(\delta _k^{t_k}\) denote, respectively, the solution of the subproblem (3) and the corresponding trust-region radius at the internal iteration \(t_k\). The fact that \(\Vert g_k\Vert \ge \varepsilon >0\), (H2), and (17) imply

$$\begin{aligned} q_k(0)-q_k\big (d_k^{t_k}\big )\ge \beta \Vert g_k\Vert ~\min \left\{ \delta _k^{t_k}, \frac{\Vert g_k\Vert }{\Vert B_k\Vert } \right\} \ge \beta \varepsilon ~ \min \left\{ \delta _k^{t_k}, \frac{\varepsilon }{M} \right\} . \end{aligned}$$
(23)

Then Line 8 of Algorithm 2 implies

$$\begin{aligned} \lim _{t_k \rightarrow \infty } \delta _k^{t_k}=0. \end{aligned}$$

From this, Lemma 2, and (23), we obtain

$$\begin{aligned} |r_k-1|&=\left| \frac{f_k-f\big (x_k+d_k^{t_k}\big )}{q_k(0)-q_k\big (d_k^{t_k}\big )}-1\right| =\left| \frac{f_k-f\big (x_k+d_k^{t_k}\big )-\big (q_k(0)-q_k\big (d_k^{t_k}\big )\big )}{q_k(0)-q_k\big (d_k^{t_k}\big )}\right| \\&\le \frac{O\big (\Vert d_k^{t_k}\Vert ^2\big )}{\beta \varepsilon ~ \min \left\{ \delta _k^{t_k}, \varepsilon /M \right\} }\le \frac{O\big (\big (\delta _k^{t_k}\big )^2\big )}{\beta \varepsilon ~ \min \left\{ \delta _k^{t_k}, \varepsilon /M \right\} }\rightarrow 0 ~~~ (t_k\rightarrow \infty ), \end{aligned}$$

implying that there exists a positive integer \(t_0\) such that \(r_k \ge \mu _1\) whenever \(t_k \ge t_0\). This and (18) lead to

$$\begin{aligned} \widehat{r}_k = \frac{T_k-f\big (x_k+d_k^{t_k}\big )}{q_k(0)-q_k\big (d_k^{t_k}\big )} \ge \frac{f_k-f\big (x_k+d_k^{t_k}\big )}{q_k(0)-q_k\big (d_k^{t_k}\big )} \ge \mu _1, \end{aligned}$$

implying the inner cycle is left after a finite number of inner iterations, i.e., Algorithm 2 is well-defined.

(ii) Assume, by contradiction, that there exists a positive integer \(k\) such that, for every nonnegative integer \(p\), the point \(x_{k+p+1}\) is not very successful. Hence, for any \(p=0,1,2,\ldots \), we get

$$\begin{aligned} \widehat{r}_{k+p}<\mu _2. \end{aligned}$$

The fact that \(\Vert g_k\Vert \ge \varepsilon >0\), (H2), and (17) imply

$$\begin{aligned} T_{k+p} - f\big (x_{k+p} + d_{k+p}\big )&\ge \mu _1 \big (q_{k+p}(0)-q_{k+p}\big (d_{k+p}\big )\big )\nonumber \\&\ge \beta \mu _1 \Vert g_{k+p}\Vert ~\min \left\{ \delta _{k+p}, \frac{\Vert g_{k+p}\Vert }{\Vert B_{k+p}\Vert } \right\} \nonumber \\&\ge \beta \mu _1 \varepsilon ~ \min \left\{ \delta _{k+p}, \frac{\varepsilon }{M} \right\} . \end{aligned}$$
(24)

By using (22) and (24), we can write

$$\begin{aligned} \lim _{p \rightarrow \infty } \delta _{k+p}=0. \end{aligned}$$
(25)

From Lemma 2, (25), and (23), we obtain

$$\begin{aligned} |r_{k+p}-1|&=\left| \frac{f\big (x_{k+p}\big )-f\big (x_{k+p}+d_{k+p}\big )}{q_{k+p}(0)-q_{k+p}\big (d_{k+p}\big )}-1\right| \\&=\left| \frac{f\big (x_{k+p}\big )-f\big (x_{k+p}+d_{k+p}\big )-\big (q_{k+p}(0)-q_{k+p} \big (d_{k+p}\big )\big )}{q_{k+p}(0)-q_{k+p}\big (d_{k+p}\big )}\right| \\&\le \frac{O\big (\Vert d_{k+p}\Vert ^2\big )}{\beta \varepsilon ~ \min \left\{ \delta _{k+p}, \varepsilon /M \right\} }\le \frac{O\big (\delta _{k+p}^2\big )}{\beta \varepsilon ~ \min \left\{ \delta _{k+p}, \varepsilon /M \right\} }\rightarrow 0 ~~~ (p\rightarrow \infty ). \end{aligned}$$

Then, for a sufficiently large \(p\), we get \(r_{k+p} \ge \mu _2\) leading to

$$\begin{aligned} \frac{T_{k+p}-f\big (x_{k+p}+d_{k+p}\big )}{q_{k+p}(0)-q_{k+p}\big (d_{k+p}\big )} \ge \frac{f\big (x_{k+p}\big )-f\big (x_{k+p}+d_{k+p}\big )}{q_{k+p}(0)-q_{k+p}\big (d_{k+p}\big )} \ge \mu _2. \end{aligned}$$

implying \(\widehat{r}_{k+p}\ge \mu _2\) for a sufficiently large \(p\). This contradicts the assumption \(\widehat{r}_{k+p}<\mu _2\), giving the result. \(\square \)

Lemma 9(i) implies that the inner cycle is left after a finite number of internal iterations, and Lemma 9(ii) implies that if the current iterate is not a first-order stationary point, then there exists a subsequent very successful iteration, i.e., the trust-region radius \(\delta _k\) can be enlarged. The next result gives the global convergence of the sequence \(\{x_k\}\) of Algorithm 2.

Theorem 10

Suppose that (H1) and (H2) hold, and suppose the sequence \(\{x_k\}\) is generated by Algorithm 2. Then

$$\begin{aligned} \liminf _{k \rightarrow \infty } \Vert g_k\Vert =0. \end{aligned}$$
(26)

Proof

We consider two cases: (i) Algorithm 2 has finitely many very successful iterations; (ii) Algorithm 2 has infinitely many very successful iterations.

In Case (i), let \(k_0\) be the largest index of a very successful iteration. If \(\Vert g_{k_0+1}\Vert >0\), then Lemma 9(ii) implies that there exists a very successful iteration with a larger index than \(k_0\). This contradicts the definition of \(k_0\).

In Case (ii), by contradiction, we assume that there exist constants \(\varepsilon >0\) and \(K>0\) such that

$$\begin{aligned} \Vert g_k\Vert \ge \varepsilon , \end{aligned}$$
(27)

for all \(k\ge K\). If \(x_{k+1}\) is a successful iteration and \(k \ge K\), then by using (H2), (17), and (27), we get

$$\begin{aligned} T_k-f\big (x_k+d_k\big )&\ge \mu _1\big (q_k(0)-q_k(d_k)\big )\nonumber \\&\ge \beta \mu _1 \Vert g_k\Vert ~ \min \left\{ \delta _k, \frac{\Vert g_k\Vert }{\Vert B_k\Vert }\right\} \ge \beta \mu _1 \varepsilon ~ \min \left\{ \delta _k, \frac{\varepsilon }{M} \right\} \ge 0. \end{aligned}$$
(28)

It follows from this inequality and (22) that

$$\begin{aligned} \lim _{k \rightarrow \infty } \delta _k=0. \end{aligned}$$
(29)

Then (H2), (17), and (27) imply

$$\begin{aligned} |r_k-1|&=\left| \frac{f_k-f\big (x_k+d_k\big )}{q_k(0) -q_k(d_k)}-1\right| =\left| \frac{f_k-f\big (x_k+d_k\big ) -\big (q_k(0)-q_k(d_k)\big )}{q_k(0)-q_k(d_k)}\right| \\&\le \frac{O\big (\Vert d_k\Vert ^2\big )}{\beta \varepsilon ~ \min \left\{ \delta _k,\varepsilon /M \right\} } \le \frac{O\big (\delta _k^2\big )}{\beta \varepsilon ~ \min \left\{ \delta _k,\varepsilon /M \right\} } \rightarrow 0 ~~~ (k\rightarrow \infty ). \end{aligned}$$

This clearly implies that there exists a positive integer \(k_1\) such that for \(k \ge k_1\) we have

$$\begin{aligned} \widehat{r}_k = \frac{T_k-f\big (x_k+d_k\big )}{q_k(0)-q_k(d_k)} \ge \frac{f_k-f\big (x_k+d_k\big )}{q_k(0)-q_k(d_k)} \ge \mu _1, \end{aligned}$$

implying \(x_{k+1} = x_k + d_k\) for \(k \ge k_1\), i.e., the trust-region radius remains fixed (\(\mu _1 \le \widehat{r}_k < \mu _2\)) or is increased (\(\widehat{r}_k \ge \mu _2\)). Now we construct a subsequence of very successful iterations for \(k \ge k_1\). From Lemma 9(ii) there exists an integer \(p_1\) such that \(x_{k_1+p_1}\) is very successful. Then we set \(i_1:=k_1+p_1\). Now, from Lemma 9(ii), there exists an integer \(p_2\) such that \(x_{i_1+p_2}\) is very successful, and so we set \(i_2:=i_1+p_2\). By continuing this procedure we get the index set

$$\begin{aligned} \mathcal {I} := \{i_1, i_2, i_3:= i_2+p_3, \cdots \} \end{aligned}$$

such that \(\{x_{\mathcal {I}}\}\) is the subsequence of very successful iterations in which the trust-region radius is increased. Hence, for \(k \ge k_1\), the trust-region radius is fixed or increased in each iteration, which contradicts (29). This implies that the result is valid. \(\square \)

Theorem 11

Suppose that (H1) and (H2) hold, and the sequence \(\{x_k\}\) is generated by Algorithm 2. Then

$$\begin{aligned} \lim _{k \rightarrow \infty } \Vert g_k\Vert =0. \end{aligned}$$
(30)

Proof

By contradiction, we assume \(\lim _{k \rightarrow \infty } \Vert g_k\Vert \ne 0\). Hence there exists \(\varepsilon >0\) and an infinite subsequence of \(\{x_k\}\), indexed by \(\{t_i\}\), such that

$$\begin{aligned} \Vert g_{t_i}\Vert \ge 2\varepsilon >0, \end{aligned}$$
(31)

for all \(i \in \mathbb {N}\). Theorem 10 ensures the existence, for each \(t_i\), of a first successful iteration \(r(t_i)>t_i\) such that \(\Vert g_{r(t_i)}\Vert < \varepsilon \). We denote \(r_i=r(t_i)\). Hence there exists another subsequence, indexed by \(\{r_i\}\), such that

$$\begin{aligned} \Vert g_k\Vert \ge \varepsilon ~~ \mathrm {for}~~ t_i \le k < r_i,\quad \Vert g_{r_i}\Vert <\varepsilon . \end{aligned}$$
(32)

We now restrict our attention to the sequence of successful iterations whose indices are in the set

$$\begin{aligned} \kappa = \big \{k \in \mathbb {N}\ |\ t_i \le k < r_i \big \}. \end{aligned}$$

Using (32), for every \(k \in \kappa \), (28) holds. It follows from (22) and (28) that

$$\begin{aligned} \lim _{k \rightarrow \infty } \delta _k=0, \end{aligned}$$
(33)

for \(k \in \kappa \). Now, using (H2), (17), and \(\Vert g_k\Vert \ge \varepsilon \), the condition (23) holds, for \(k \in \kappa \). This, Lemma 2, and (33) lead to

$$\begin{aligned} |r_k-1|&=\left| \frac{f_k-f\big (x_k+d_k\big )}{q_k(0)-q_k(d_k)} -1\right| =\left| \frac{f_k-f(x_k+d_k)-\big (q_k(0)-q_k\big (d_k\big )\big )}{q_k(0)-q_k(d_k)}\right| \\&\le \frac{O\big (\Vert d_k\Vert ^2\big )}{\beta \varepsilon ~ \min \left\{ \delta _k, \varepsilon /M \right\} }\le \frac{O\big (\delta _k^2\big )}{\beta \varepsilon \delta _k }\rightarrow 0 ~~~ (k\rightarrow \infty ,~ k\in \kappa ). \end{aligned}$$

Thus, for a sufficiently large \(k+1 \in \kappa \), we get

$$\begin{aligned} f_k-f\big (x_k+d_k\big )&\ge \mu _1\big (q_k(0)-q_k(d_k)\big )\nonumber \\&\ge \beta \mu _1 \Vert g_k\Vert \min \left\{ \delta _k, \frac{\Vert g_k\Vert }{\Vert B_k\Vert }\right\} \ge \beta \mu _1 \varepsilon ~ \min \left\{ \delta _k, \frac{\varepsilon }{M} \right\} . \end{aligned}$$
(34)

The condition (33) implies that \(\delta _k \le \varepsilon /M\). Hence, for a sufficiently large \(k \in \kappa \), we obtain

$$\begin{aligned} \delta _k \le \frac{1}{\beta \mu _1}\big (f_k-f_{k+1}\big ). \end{aligned}$$
(35)

Then (18) and (35) imply

$$\begin{aligned} \big \Vert x_{t_i}-x_{r_i}\big \Vert \le \sum _{j \in \kappa , j=t_i}^{r_i-1}\big \Vert x_{j}-x_{j+1}\big \Vert \le \sum _{j \in \kappa , j=t_i}^{r_i-1} \delta _j \le \frac{1}{\beta \mu _1}\big (f_{t_i}-f_{r_i}\big ) \le \frac{1}{\beta \mu _1}\big (T_{t_i}-f_{r_i}\big ), \end{aligned}$$
(36)

for a sufficiently large \(i\). Now, Corollary 8 implies

$$\begin{aligned} 0 \le \lim _{i \rightarrow \infty } \big \Vert x_{t_i}-x_{r_i}\big \Vert \le \lim _{i \rightarrow \infty } \frac{1}{\beta \mu _1}\big (T_{t_i}-f_{r_i}\big ) = 0, \end{aligned}$$

leading to

$$\begin{aligned} \lim _{i \rightarrow \infty } \big \Vert x_{t_i}-x_{r_i}\big \Vert = 0. \end{aligned}$$

Since the gradient is continuous, we get

$$\begin{aligned} \lim _{i \rightarrow \infty } \big \Vert g_{t_i}-g_{r_i}\big \Vert =0. \end{aligned}$$
(37)

In view of the definitions of \(\{t_i\}\) and \(\{r_i\}\), however, we have \(\Vert g_{t_i}-g_{r_i}\Vert \ge \varepsilon \), which makes this impossible. Therefore, there is no subsequence satisfying (31), giving the result. \(\square \)

Note that, in view of the theorem on page 709 of [28], no limit point of the sequence \(\{x_k\}\) is a local maximizer of \(f\).

The next result gives the global convergence of the sequence generated by Algorithm 2 to second-order stationary points. To this end, similar to [16], an additional assumption is needed:

(H4) If \(\lambda _{\min }(B_k)\) represents the smallest eigenvalue of the symmetric matrix \(B_k\), then there exists a positive scalar \(c_3\) such that

$$\begin{aligned} q_k(0)-q_k(d_k) \ge c_3 \lambda _{\min }(B_k) \delta _k^2. \end{aligned}$$

Theorem 12

Suppose that \(f\) is twice continuously differentiable and also suppose that (H1)–(H4) hold. Then there exists a limit point \(x^*\) of the sequence \(\{x_k\}\) generated by Algorithm 2 such that \(\nabla ^2 f(x^*)\) is positive semidefinite.

Proof

The proof is similar to Theorem 3.4 in [16]. \(\square \)

The next two results show that Algorithm 2 eventually reduces to a quasi-Newton or Newton method, so that the sequence \(\{x_k\}\) generated by these schemes attains a local superlinear or quadratic convergence rate, respectively, under some conditions.

Theorem 13

Suppose that (H1)–(H3) hold, and also suppose that the sequence \(\{x_k\}\) generated by Algorithm 2 converges to \(x^*\), \(\Vert d_k\Vert =\Vert -B_k^{-1}g_k\Vert \le \delta _k\), \(H(x)=\nabla ^2 f(x)\) is continuous in a neighborhood \(N(x^*,\varepsilon )\) of \(x^*\), and \(B_k\) satisfies

$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{\big \Vert \big [B_k-H(x^*)\big ]d_k\big \Vert }{\Vert d_k\Vert }=0. \end{aligned}$$
(38)

then

  1. (i)

    There exists a constant \(k_1\) such that for all \(k \ge k_1\) we have \(x_{k+1} = x_k + d_k\);

  2. (ii)

    The sequence \(\{x_k\}\) generated by Algorithm 2 converges to \(x^*\) superlinearly.

Proof

(i) The condition (38) implies

$$\begin{aligned} \lim _{k\rightarrow \infty }\frac{\big \Vert g_k+H(x^*)d_k\big \Vert }{\big \Vert d_k\big \Vert }=0, \end{aligned}$$
(39)

leading to

$$\begin{aligned} d_k=-H(x^*)^{-1} g_k+o\big (\Vert d_k\Vert \big ). \end{aligned}$$

This implies that

$$\begin{aligned} \big \Vert d_k\big \Vert \le \big \Vert H(x^*)^{-1}\big \Vert ~ \big \Vert g_k\big \Vert +o\big (\Vert d_k\Vert \big ). \end{aligned}$$
(40)

Theorem 11 implies that \(\Vert g_k\Vert \rightarrow 0\), as \(k\rightarrow \infty \). This and (40) give

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert d_k\Vert =0. \end{aligned}$$
(41)

This, (18), and (H2) imply

$$\begin{aligned} |r_k-1|&=\left| \frac{f_k-f\big (x_k+d_k\big )}{q_k(0)-q_k(d_k)} -1\right| =\left| \frac{f_k-f\big (x_k+d_k\big )-\big (q_k(0)-q_k(d_k)\big )}{q_k(0) -q_k(d_k)}\right| \\&\le \frac{O\big (\Vert d_k\Vert ^2\big )}{\beta \varepsilon ~ \min \left\{ \delta _k,\varepsilon /M \right\} } \le \frac{O\big (\Vert d_k\Vert ^2\big )}{\beta \varepsilon ~ \min \left\{ \Vert d_k\Vert ,\varepsilon /M \right\} } \rightarrow 0 ~~~ (k\rightarrow \infty ). \end{aligned}$$

This clearly implies that there exists a positive integer \(k_1\) such that for \(k \ge k_1\) we have \(x_{k+1} = x_k + d_k\).

(ii) From \(d_k = - B_k^{-1} g_k\), we obtain

$$\begin{aligned} \frac{\big \Vert g_k+H_k d_k\big \Vert }{\big \Vert d_k\big \Vert } = \frac{\big \Vert \big [H_k- B_k\big ]d_k\big \Vert }{\big \Vert d_k\big \Vert } \le \frac{\big \Vert \big [H_k-H(x^*)\big ]d_k\big \Vert }{\big \Vert d_k\big \Vert } + \frac{\big \Vert \big [B_k-H(x^*)\big ]d_k\big \Vert }{\big \Vert d_k\big \Vert }. \end{aligned}$$

This, the continuity of \(H\) near \(x^*\), the convergence \(x_k \rightarrow x^*\), and (38) lead to

$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{\big \Vert g_k+H_k d_k\big \Vert }{\big \Vert d_k\big \Vert }=0. \end{aligned}$$
(42)

Now Theorem 3.6 in [33] implies that \(\{x_k\}\) generated by Algorithm 2 converges to \(x^*\) superlinearly. \(\square \)

Notice that if \(f\) is thrice continuously differentiable and the level set \(L(x_0)\) is bounded, then \(\Vert \nabla ^3f(x)\Vert \) is bounded on an open bounded convex set \(\Omega \) containing \(L(x_0)\), i.e., there exists a constant \(L>0\) such that \(\Vert \nabla ^3f(x)\Vert \le L\) on \(\Omega \). Hence, by the mean value theorem,

$$\begin{aligned} \big \Vert H(x)-H(y)\big \Vert \le L\big \Vert x-y\big \Vert , \end{aligned}$$
(43)

for all \(x, y \in \Omega \). This implies that the Hessian of \(f\) is Lipschitz continuous. This condition can guarantee the quadratic convergence of the sequence \(\{x_k\}\) generated by Algorithm 2. The details are summarized in the next result.

Theorem 14

Suppose that \(f(x)\) is a twice continuously differentiable function on \({\mathbb {R}}^n\), and all assumptions of Theorem 11 hold. If \(\Vert d_k\Vert = \Vert -H_k^{-1}g_k\Vert \le \delta _k\), and there exists a neighborhood \(N(x^*,\epsilon )\) of \(x^*\) such that \(H(x)\) is Lipschitz continuous on \(N(x^*,\epsilon )\), i.e., there exists \(L\) such that

$$\begin{aligned} \big \Vert H(x)-H(y)\big \Vert \le L \big \Vert x-y\big \Vert , \end{aligned}$$
(44)

then

  1. (i)

    There exists a constant \(k_2\) such that for all \(k \ge k_2\) we have \(x_{k+1} = x_k + d_k\);

  2. (ii)

    The sequence \(\{x_k\}\) generated by Algorithm 2 converges to \(x^*\) quadratically.

Proof

  1. (i)

    By replacing \(B_k\) by \(H_k\) in Theorem 13, we obtain that there exists an integer \(k_2>0\) such that

    $$\begin{aligned} x_{k+1} = x_k - H_k^{-1}g_k, \end{aligned}$$

for all \(k \ge k_2\).

  2. (ii)

    The condition described in (i) and Theorem 3.5 in [33] give the results.

\(\square \)

4 Numerical experiments

In this section we report numerical results for Algorithm 2 equipped with the two novel nonmonotone terms proposed in Sect. 2 for solving unconstrained optimization problems. In our experiments we use several versions of Algorithm 2 employing state-of-the-art nonmonotone terms. In detail, we consider

  • NMTR-G: Algorithm 2 with the nonmonotone term of Grippo et al. [28];

  • NMTR-H: Algorithm 2 with the nonmonotone term of Zhang and Hager [47];

  • NMTR-N: Algorithm 2 with the nonmonotone term of Amini et al. [8];

  • NMTR-M: Algorithm 2 with the nonmonotone term of Ahookhosh et al. [3];

  • NMTR-1: Algorithm 2 with the nonmonotone term (14);

  • NMTR-2: Algorithm 2 with the nonmonotone term (15).

In the experiments we used 112 test problems of the CUTEst test collection [25] with dimensions from 2 to 5000, where we ignore test problems with dimension greater than 5000. All of the codes are written in MATLAB using the same subroutines, and they are tested on a laptop with a 2 GHz Core i5 processor and 4 GB of RAM using the double-precision data type. The initial points are the standard ones proposed in CUTEst. All the algorithms use the radius

$$\begin{aligned} \delta _{k+1}= \left\{ \begin{array}{ll} c_1\Vert d_k\Vert &{} \quad {\mathrm {if}}~ {\widehat{r}}_k < \mu _1,\\ \delta _k&{} \quad {\mathrm {if}}~ \mu _1 \le {\widehat{r}}_k < \mu _2, \\ \max \{\delta _k,c_2 \Vert d_k\Vert \} &{} \quad {\mathrm {if}}~ {\widehat{r}}_k \ge \mu _2, \\ \end{array} \right. \end{aligned}$$

where

$$\begin{aligned} \mu _{1}=0.05,\;\mu _{2} = 0.9,\; c_1=0.25,\; c_2=2.5,\; \delta _{0} =0.1\Vert g_0 \Vert , \end{aligned}$$

see [27]. In the model \(q_k\) (2), an approximation of the Hessian is generated by the BFGS updating formula

$$\begin{aligned} B_{k+1}=B_k + \frac{y_k y_k^T}{s_k^T y_k}-\frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}, \end{aligned}$$

where \(s_k=x_{k+1}-x_k\) and \(y_k=g_{k+1}-g_k\). For NMTR-G, NMTR-N, NMTR-1, and NMTR-2, we set \(N = 10\). As discussed in [47], NMTR-H uses \(\eta _k = 0.85\). On the basis of our experiments, we update the parameter \(\eta _k\) by

$$\begin{aligned} \eta _{k}=\left\{ \begin{array}{ll} \eta _0/2 \qquad &{} \quad {\mathrm {if}}~ k=1, \\ (\eta _{k-1}+\eta _{k-2})/2 &{} \quad {\mathrm {if}}~ k\ge 2, \end{array} \right. \end{aligned}$$

for NMTR-N, NMTR-M, NMTR-1, and NMTR-2, where the parameter \(\eta _0\) will be tuned to get a better performance. To solve the quadratic subproblem (3), we use the Steihaug–Toint scheme [14, Chapter 7, p. 205], which is terminated if

$$\begin{aligned} \Vert g(x_k+d)\Vert \le \min \left\{ 1/10, \Vert g_k\Vert ^{1/2}\right\} \Vert g_k\Vert ~~~ \text {or}~~~ \Vert d\Vert =\delta _k. \end{aligned}$$

In our experiments the algorithms are stopped whenever the total number of iterations exceeds 10,000 or

$$\begin{aligned} \Vert g_k\Vert < \varepsilon \end{aligned}$$
(45)

holds with the accuracy parameter \(\varepsilon = 10^{-5}\).
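The radius rule and the BFGS update above can be sketched as follows (the curvature safeguard in bfgs_update, skipping the update when \(s_k^T y_k \le 0\), is an assumption added here and is not stated in the text):

```python
import numpy as np

def update_radius(delta, d, r_hat, mu1=0.05, mu2=0.9, c1=0.25, c2=2.5):
    """Radius rule used by all algorithms in the experiments (see above)."""
    if r_hat < mu1:
        return c1 * np.linalg.norm(d)
    if r_hat < mu2:
        return delta
    return max(delta, c2 * np.linalg.norm(d))

def bfgs_update(B, s, y):
    """BFGS update of B_k with s_k = x_{k+1} - x_k and y_k = g_{k+1} - g_k."""
    sy = s @ y
    if sy <= 1e-12:          # curvature safeguard (an assumption added here)
        return B
    Bs = B @ s
    return B + np.outer(y, y) / sy - np.outer(Bs, Bs) / (s @ Bs)
```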

To compare the results appropriately, we use the performance profiles of Dolan and Moré [19], where the measures of performance are the number of iterations (\(N_i\)), the number of function evaluations (\(N_f\)), and the number of gradient evaluations (\(N_g\)). In the algorithms considered, the number of iterations and the number of gradient evaluations are the same, so we only report the profile for gradients. It is commonly assumed that the computational cost of a gradient is about three times the computational cost of a function value, so we further consider the measure \(N_f+3 N_g\). The performance of each code is measured by the ratio of its computational outcome to the best outcome of all codes. This profile offers a tool for comparing the performance of iterative schemes in a statistical framework. Let \(\mathcal {S}\) be the set of all algorithms and \(\mathcal {P}\) be the set of test problems. For each problem \(p\) and solver \(s\), \(t_{p,s}\) is the computational outcome with respect to the performance index, which is used in defining the performance ratio

$$\begin{aligned} r_{p,s}=\frac{t_{p,s}}{\min \big \{ t_{p,s}: s\in \mathcal {S}\big \}}. \end{aligned}$$
(46)

If an algorithm \(s\) fails to solve a problem \(p\), the procedure sets \(r_{p,s}=r_{\mathrm {failed}}\), where \(r_{\mathrm {failed}}\) should be strictly larger than any performance ratio (46). For any factor \(\tau \), the overall performance of an algorithm \(s\) is given by

$$\begin{aligned} \rho _s(\tau )=\frac{1}{n_p} \text {size}\big \{p \in \mathcal {P}: r_{p,s} \le \tau \big \}. \end{aligned}$$

In fact \(\rho _s(\tau )\) is the probability that a performance ratio \(r_{p,s}\) of the algorithm \(s\in \mathcal {S}\) is within a factor \(\tau \in {\mathbb {R}}\) of the best possible ratio. The function \(\rho _s(\tau )\) is a distribution function for the performance ratio. In particular, \(\rho _s(1)\) gives the probability that the algorithm \(s\) wins over all other considered algorithms, and \(\lim _{\tau \rightarrow r_{\mathrm {failed}}}\rho _s(\tau )\) gives the fraction of the considered problems that the algorithm \(s\) can solve. Hence the performance profile can be considered as a measure of efficiency for comparing iterative schemes. In Figs. 3 and 4, the x-axis shows the factor \(\tau \) while the y-axis shows \(P(r_{p,s} \le \tau : 1 \le s \le n_s)\).
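A compact sketch of how such profiles can be computed is given below; the measurements in the example are hypothetical and are not the data behind Figs. 3 and 4.

```python
import numpy as np

def performance_profile(T, taus, r_failed=1e6):
    """Dolan-More performance profiles, cf. (46).

    T is an (n_p, n_s) array of measures t_{p,s}; np.inf marks a failure.
    Returns rho_s(tau) for every tau in taus and every solver s.
    """
    T = np.asarray(T, dtype=float)
    finite = np.isfinite(T)
    best = np.nanmin(np.where(finite, T, np.nan), axis=1, keepdims=True)
    ratios = np.where(finite, T / best, r_failed)        # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for s in range(T.shape[1])]
                     for tau in taus])

# hypothetical measurements for 4 problems and 3 solvers (np.inf = failure)
T = [[10, 12, 15], [30, np.inf, 25], [7, 7, 9], [100, 90, 80]]
print(performance_profile(T, taus=[1.0, 1.5, 2.0]))
```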

4.1 Experiments with highly nonlinear problems

In this subsection we give some numerical results regarding the implementation of NMTR-1 and NMTR-2 compared with TTR on some two-dimensional highly nonlinear problems involving a curved narrow valley. More precisely, we consider the Nesterov–Chebyshev–Rosenbrock, Maratos, and NONDIA functions, see, for example, [9]. In Example 1 the Nesterov–Chebyshev–Rosenbrock function is given, and the Maratos and NONDIA functions are given by

$$\begin{aligned} \begin{array}{l} \displaystyle f(x_1,x_2) =x_1 + \theta _1 \big ( x_1^2 + x_2^2 - 1 \big )^ 2 ~~~ (\mathrm {Maratos~ function})\\ \end{array} \end{aligned}$$

and

$$\begin{aligned} \displaystyle f(x_1,x_2) = \big (1-x_2\big )^2 + \theta _2\big ( x_1 - x_2^2\big )^2 ~~~(\mathrm {NONDIA~ function}), \end{aligned}$$

respectively, where we consider \(\theta _1 = 10\) and \(\theta _2 = 100\).

Fig. 2

A comparison among NMTR-1, NMTR-2, and TTR: a, c, and e respectively illustrate the contour plots of the two-dimensional Nesterov–Chebyshev–Rosenbrock, Maratos, and NONDIA functions and the iterates of NMTR-1, NMTR-2, and TTR; b, d, and f show the diagrams of function values versus iterations

Fig. 3

Performance profiles of NMTR-1 and NMTR-2 with the performance measures \(N_g\), \(N_f\), and \(N_f+3 N_g\): a and b display the number of iterations (\(N_i\)) or gradient evaluations (\(N_g\)); c and d show the number of function evaluations (\(N_f\)); e and f display the hybrid measure \(N_f+3 N_g\)

Fig. 4

A comparison among NMTR-G, NMTR-H, NMTR-N, NMTR-M, NMTR-1, and NMTR-2 by performance profiles using the measures \(N_g\), \(N_f\), and \(N_f+3 N_g\): a displays the number of iterations (\(N_i\)) or gradient evaluations (\(N_g\)); b shows the number of function evaluations (\(N_f\)); c and d display the hybrid measure \(N_f+3 N_g\) with \(\tau = 1.5\) and \(\tau = 5.5\), respectively

We solve the problem (1) for these three functions using TTR, NMTR-1, and NMTR-2, and the results regarding the number of iterations and function evaluations are summarized in Table 1. To give a clear view of the behaviour of TTR, NMTR-1, and NMTR-2, we depict the contour plots of the considered functions and the iterates obtained by the algorithms in Fig. 2a, c, and e. In all three cases, one can see that NMTR-1 and NMTR-2 need fewer iterations and function evaluations than TTR to solve the problem. Moreover, TTR behaves monotonically and follows the bottom of the associated valley, while NMTR-1 and NMTR-2 fluctuate in the valley.

Table 1 Numerical results for highly nonlinear problems

4.2 Experiments with CUTEst test problems

In this subsection we give numerical results regarding experiments with NMTR-1 and NMTR-2 on the CUTEst test problems (Table 2) compared with NMTR-G, NMTR-H, NMTR-N, and NMTR-M.

To get a better performance from NMTR-1 and NMTR-2, we tune the parameter \(\eta _0\) by testing several fixed values, namely \(\eta _0=0.15, 0.25, 0.35, 0.45\), for both algorithms. The corresponding versions of the algorithms NMTR-1 and NMTR-2 are denoted by NMTR-1-0.15, NMTR-1-0.25, NMTR-1-0.35, NMTR-1-0.45, NMTR-2-0.15, NMTR-2-0.25, NMTR-2-0.35, and NMTR-2-0.45, respectively. The results are summarized in Fig. 3 for three measures: the number of function evaluations; the number of gradient evaluations; the mixed measure \(N_f + 3 N_g\). Figure 3a, c, and e illustrate the results of NMTR-1, which produces its best results with \(\eta _{0} = 0.25\). From Fig. 3b, d, and f, it can be seen that the best results for NMTR-2 are produced by \(\eta _{0}=0.45\). Hence we use \(\eta _{0} = 0.25\) for NMTR-1 and \(\eta _{0} = 0.45\) for NMTR-2 in the remainder of our experiments.

We now test NMTR-G, NMTR-H, NMTR-N, NMTR-M, NMTR-1, and NMTR-2 for solving the unconstrained problem (1) and compare the produced results. We illustrate the comparison in Fig. 4 by using performance profiles for the measures \(N_g\), \(N_f\), and \(N_f+3 N_g\). Figure 4a displays the number of gradient evaluations, where the best results are attained by NMTR-2 and then by NMTR-N, with about 63 and 52 % of the wins, respectively. NMTR-1 is comparable with NMTR-G, NMTR-H, and NMTR-N, but its curve rises faster than the others, which means its performance is close to that of the best method, NMTR-2. Figure 4b shows the number of function evaluations and has a similar interpretation to Fig. 4a; however, NMTR-2 here attains about 60 % of the wins. Figure 4c and d display the mixed measure \(N_f+3 N_g\) with \(\tau = 1.5\) and \(\tau = 5.5\), respectively. In this case NMTR-2 outperforms the others by attaining about 58 % of the wins, and the others have comparable results; however, the curves of NMTR-1 and NMTR-M rise faster than the others, implying that they perform close to the best algorithm, NMTR-2.

5 Concluding remarks

In this paper we give some motivation for employing nonmonotone strategies in trust-region frameworks. Then we introduce two new nonmonotone terms and combine them with the traditional trust-region framework. It is shown that the proposed methods are globally convergent to first- and second-order stationary points. Moreover, local superlinear and quadratic convergence rates are established. Applying these methods to some highly nonlinear test problems involving a curved narrow valley shows that they have promising behaviour compared with the monotone trust-region method. Numerical experiments on a set of test problems from the CUTEst test collection show the efficiency of the proposed nonmonotone methods.

Further research can be done in several directions. For example, by combining the proposed nonmonotone trust-region methods with various adaptive radius strategies, more efficient trust-region schemes can be derived, see, for example, [2, 7]. The combination of the proposed nonmonotone terms with several inexact line searches such as Armijo, Wolfe, and Goldstein is also interesting, see [6, 8]. The extension of the proposed method to constrained nonlinear optimization could be interesting, especially for nonnegativity and box constraints, see, for example, [10–12, 34, 42, 43]. It could also be interesting to employ nonmonotone schemes for solving nonlinear least-squares problems and systems of nonlinear equations, see [5] and references therein. Moreover, investigating new adaptive formulas for the parameter \(\eta _k\) would be valuable for improving the computational efficiency.