1 Introduction

In the present article, methods for solving multiextremal problems that are widespread in real-life applications are considered. The increasing complexity of mathematical models required for a more adequate description of the objects, phenomena, and systems under study leads to a significant increase in the complexity of analyzing these models, and numerical experiments become the primary tool for such an analysis. This growing complexity makes a purposeful selection of variants necessary in the search for an optimal solution. The essence of the purposeful selection consists in eliminating from further consideration many obviously unpromising cases on the basis of analyzing a small portion of the variants, in order to concentrate the further search in the subdomains containing the best variants. However, the development of efficient optimization algorithms directly in the multidimensional case is often labor-consuming. A possible way out of this situation consists in applying various schemes of dimensionality reduction for the development of global search methods.

The multidimensional multiextremal (global) optimization problem without constraints can be defined as the problem of finding the minimal value of a real function \(\varphi (y)\)

$$\begin{aligned} \varphi (y^*) = \min \{ \varphi (y): y \in D \}, \end{aligned}$$
(1)

and its global minimizer \(y^* \in D\), where

$$\begin{aligned} D = \{ y \in \mathbbm {R}^N : y_i \in [a_i, b_i], \quad 1 \le i \le N \} \end{aligned}$$
(2)

is the search domain, which is a hyperparallelepiped in the N-dimensional Euclidean space \(\mathbbm {R}^N\).

Global optimization problems are the subject of extensive research (see, for example, the comprehensive bibliography in [4, 5, 14–16, 22–24, 33, 36–39]). At the same time, these problems are very computation-consuming. For instance, if the solution is sought by exhaustively scanning the nodes of a uniform grid with the accuracy \(\varepsilon > 0\) for each coordinate, the total number of the grid nodes in the search domain can be estimated as

$$\begin{aligned} K = K(\varepsilon ) \approx \prod \limits _{j = 1}^N \big [ (b_j - a_j) / \varepsilon \big ]. \end{aligned}$$

For instance, for \(N = 5\), \(b_j - a_j = 1, 1 \le j \le 5\), and \(\varepsilon = 0.01\), this estimate already gives \(K \approx 10^{10}\) nodes. A decrease of the computational costs is possible only thanks to highly efficient global search algorithms, which generate dense grids only in the vicinity of the sought minimum of the minimized function. The design of such efficient algorithms directly for the multidimensional case is a rather complex problem. As a result, the use of various dimensionality reduction schemes has become one of the directions in the field of global optimization. Within the framework of this direction, the following three main approaches can be outlined:

The first dimensionality reduction method is based on the well-known fundamental property according to which the N-dimensional hyperparallelepiped (2) and the interval [0, 1] of the real axis are sets of equal cardinality, and the interval [0, 1] can be mapped onto the hyperparallelepiped D unambiguously and continuously (see [1, 30, 36]). The mappings of this kind are usually referred to as Peano evolvents or curves.

Assume \(y(x), x \in [0, 1]\), to be a Peano curve and the function \(\varphi (y)\) from (1) to be a continuous one. Then, the optimization problem for a multidimensional function \(\varphi (y)\) can be reduced to a problem of optimization of a one-dimensional reduced function \(\varphi (y(x))\)

$$\begin{aligned} \varphi (y(x^*)) = \min \{ \varphi (y(x)){:}\,x \in [0, 1] \} = \min \{ \varphi (y){:}\, y \in D \}. \end{aligned}$$

This approach has served as a basis for the development of many efficient multidimensional global search algorithms (see [9, 11, 13, 30, 34, 36]).
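For illustration, the sketch below (Python) maps a point \(x \in [0, 1]\) onto a two-dimensional search domain by means of a Hilbert curve, one concrete space-filling curve; it is only a rough stand-in for the Peano evolvents constructed in [30, 36], and the grid resolution \(2^{order}\) cells per coordinate is an assumption of the sketch. The reduced function \(\varphi (y(x))\) obtained in this way can then be handled by any univariate global search method.

```python
def hilbert_point(order, d):
    """Map an index d in [0, 4**order) along a Hilbert curve to a cell (x, y)
    of a 2**order x 2**order grid (the classic d2xy construction)."""
    x = y = 0
    s, t, n = 1, d, 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

def reduced(phi, x, order=8):
    """One-dimensional reduced function phi(y(x)), x in [0, 1], for a
    two-dimensional phi defined over the unit square."""
    n = 1 << order
    d = min(int(x * n * n), n * n - 1)   # cell index along the curve
    i, j = hilbert_point(order, d)
    return phi((i + 0.5) / n, (j + 0.5) / n)
```

For example, scanning `reduced(phi, k / 10**4)` for \(k = 0, \dots , 10^4\) already gives a one-dimensional view of a two-dimensional objective; in the cited works the reduced function is minimized by efficient Lipschitzian univariate algorithms instead of exhaustive scanning.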

In the framework of the second approach, the search domain is divided into subdomains by means of a partition strategy, each subdomain is estimated from the point of view of its importance (prospectivity) for continuation of the search, and the next iteration point is chosen in the subdomain of maximal importance. This approach is close to the univariate scheme of characteristical algorithms [11]; methods based on multidimensional partitioning are presented, for instance, in [17, 19–24].

Finally, the well-known approach based on the usage of one-dimensional optimization methods for solving multidimensional optimization problems consists in the application of the nested optimization scheme. According to this scheme, the solution of a multidimensional optimization problem is obtained by solving a sequence of recursively nested one-dimensional problems (see, for example, [2, 3, 8, 12, 25, 28, 31, 36]). The core of this approach is the relation

$$\begin{aligned} \min _{y \in D} \varphi (y) = \min _{y_1 \in [a_1, b_1]} \dots \min _{y_N \in [a_N, b_N]} \varphi (y_1, \dots , y_N). \end{aligned}$$
(3)

According to (3) the solving of a multidimensional multiextremal optimization problem is reduced to the solving of the one-dimensional problem:

$$\begin{aligned} \varphi ^* = \min _{y \in D} \varphi (y) = \min _{y_1 \in [a_1, b_1]} \widetilde{\varphi }_1(y_1), \end{aligned}$$
(4)

where

$$\begin{aligned}&\widetilde{\varphi }_i(y_i) = \varphi _i(y_1, \dots , y_i) = \min _{y_{i + 1} \in [a_{i + 1}, b_{i + 1}]} \varphi _{i + 1}(y_1, \dots , y_i, y_{i + 1}), \quad 1 \le i < N, \end{aligned}$$
(5)
$$\begin{aligned}&\varphi _N(y_1, \dots , y_N) = \varphi (y_1, \dots , y_N). \end{aligned}$$
(6)

The one-dimensional function \(\widetilde{\varphi }_1(y_1)\) introduced in (4) is constructed according to the general recursive rule (5): in order to calculate the value \(\widetilde{\varphi }_1(y_1)\) for a given value of the variable \(y_1 = \hat{y}_1\), it is necessary to perform the minimization of the function

$$\begin{aligned} \widetilde{\varphi }_2(y_2) = \varphi _2(\hat{y}_1, y_2). \end{aligned}$$

When performing this optimization, the function \(\widetilde{\varphi }_2\) is a one-dimensional one as well since the value of the variable \(y_1\) is predefined and fixed. Next, in turn, in order to calculate the value \(\widetilde{\varphi }_2(y_2)\) at the point \(y_2 = \hat{y}_2\) it is necessary to perform the minimization of the univariate function

$$\begin{aligned} \widetilde{\varphi }_3(y_3) = \varphi _3(\hat{y}_1, \hat{y}_2, y_3), \end{aligned}$$

etc.
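The recursive structure just described can be summarized by the following minimal Python sketch (an illustration only, not the authors' implementation; the uniform-grid univariate minimizer `grid_min` is a crude stand-in for the Lipschitz-based one-dimensional algorithms discussed below).

```python
def nested_minimize(phi, bounds, minimize_1d):
    """Basic nested scheme (4)-(6): the value of the level-i function at a fixed
    prefix (y_1, ..., y_i) is obtained by minimizing the remaining coordinates
    recursively; minimize_1d(g, a, b) returns (x_best, g(x_best))."""
    def level(prefix, i):
        a, b = bounds[i]
        if i == len(bounds) - 1:
            g = lambda t: phi(prefix + [t])              # level N: phi evaluated directly
        else:
            g = lambda t: level(prefix + [t], i + 1)[1]  # level i < N: nested minimum
        return minimize_1d(g, a, b)
    return level([], 0)[1]                               # estimate of min phi over D

def grid_min(g, a, b, n=50):
    """Crude uniform-grid univariate minimizer, used here only for illustration."""
    pts = [a + (b - a) * j / n for j in range(n + 1)]
    best = min(pts, key=g)
    return best, g(best)

# example: a smooth two-dimensional function on [-1, 1] x [-1, 1]
# nested_minimize(lambda y: (y[0] - 0.3) ** 2 + (y[1] + 0.7) ** 2,
#                 [(-1.0, 1.0), (-1.0, 1.0)], grid_min)
```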

The scheme (4)–(6), in the case of Lipschitzian one-dimensional functions \(\widetilde{\varphi }_i\) combined with the one-dimensional algorithms [7, 25, 32, 34–36], has served as the basis for designing a number of multidimensional methods [3, 12, 25, 26, 36].

Concluding the characterization of the nested optimization scheme, it should be noted that this approach to the design of multidimensional methods also has a number of disadvantages. When a particular subproblem (5) is being solved, the information on the behavior of the objective function obtained during the solution of the other subproblems is not used in any way. This is the fundamental cause of the loss of efficiency of the nested scheme. On the one hand, this makes the computational scheme simpler because it avoids storing and processing a large amount of information. On the other hand, discarding this information worsens the quality of planning the search process. Besides, when this scheme is used, the accuracy in the termination condition of every recursively nested one-dimensional problem of the kind (5) must be defined a priori. If this accuracy turns out to be insufficient, the problem has to be solved again from the beginning with an increased accuracy. At the same time, the use of too high an accuracy can result in an essential increase in the duration of the global search, and breaking off the computations before the termination condition is satisfied (because the feasible computation time has been exceeded) can lead to the situation when an estimate of the sought solution is obtained in some subdomain of the search domain D only.

The rest of the paper is organized as follows. Section 2 describes a new generalization of the nested optimization scheme. Section 3 is devoted to the theoretical substantiation of the new scheme concerning convergence properties. Section 4 contains results of computational experiments. The last section gives a succinct summary of the paper.

2 Adaptive nested optimization scheme

Let us give a generalization of the basic scheme of dimensionality reduction (3)–(6) that allows one to overcome some of the disadvantages of the approach mentioned above. The essence of this generalization consists in the elimination of the principle of strict subordination for the minimization problems of the one-dimensional functions \(\widetilde{\varphi }_i(y_i), 1 \le i \le N\), generated within the framework of the nested optimization scheme. In the new approach, all these problems are solved simultaneously.

The preliminary description of the proposed generalization is as follows.

Let us introduce the following notation for brevity

$$\begin{aligned} \nu _i = (y_1, \dots , y_i), \quad 1 \le i \le N. \end{aligned}$$
(7)

Then, the one-dimensional functions \(\widetilde{\varphi }_i(y_i), 1 < i \le N\), generated within the framework of the nested optimization scheme can be rewritten in the form

$$\begin{aligned} \widetilde{\varphi }_i(y_i) = \varphi _i(\nu _{i - 1}, y_i), \quad 1 < i \le N, \end{aligned}$$
(8)

where the vector \(\nu _{i - 1}\) is a fixed one.

Let us describe in more detail the procedure of computations for the initial (basic) scheme of dimensionality reduction.

At every global search iteration \(i, 1 \le i \le k\), the optimization algorithm begins its computations with the one-dimensional function of the first level \(\widetilde{\varphi }_1(y_1)\). The algorithm selects a current search point \(y_1 = \hat{y}_1\) taking into account the search information obtained so far. In order to obtain the function value \(\widetilde{\varphi }_1(y_1)\) at this point, a new one-dimensional function \(\varphi _2(\hat{y}_1, y_2)\) to be minimized is generated. To do so, the algorithm has to suspend the minimization of the function \(\widetilde{\varphi }_1(y_1)\) and start minimizing the function \(\varphi _2(\hat{y}_1, y_2)\). In the process of optimizing the function \(\varphi _2(\hat{y}_1, y_2)\), the algorithm generates a sequence of points

$$\begin{aligned} \{ y_2^j, 1 \le j \le k \}, \end{aligned}$$

for each of which it is necessary to perform the minimization of the third-level function \(\varphi _3(\hat{y}_1, \hat{y}_2, y_3)\), and so on in the same strict order: before the transition to the next search iteration at the current level, the one-dimensional problem of the next dimensionality reduction level must be solved first. This process is carried out recursively down to the last, Nth level. The resulting hierarchical scheme of generating and solving the one-dimensional problems is presented in Fig. 1.

Fig. 1 The hierarchical scheme of generating and solving the one-dimensional problems in the nested optimization scheme

The functions generated during the optimization process have a strict hierarchical structure of subordination in the form of a tree. The structure of this tree changes dynamically in the course of solving the multidimensional problem. The computation of the function value \(\varphi _i(\hat{\nu }_{i - 1}, y_i)\) of the ith level, \(1 \le i < N\), at some point requires solving all problems in one of the subtrees of the \((i + 1)\)th level. The functions \(\varphi _N(\hat{\nu }_{N - 1}, y_N)\) of the Nth level are the leaves of the problem tree; their values are the values of the objective function \(\varphi (y), y \in D\), and are computed directly, without nested optimization.

The proposed generalization of the basic scheme of dimensionality reduction consists in the elimination of the strict order of solving the subproblems according to their hierarchy in the problem tree, so that solving a problem of the ith level, \(1 \le i < N\), does not require the complete solution of the nested problems of the \((i + 1)\)th level. Within the framework of the novel approach, all generated problems are considered simultaneously and have equal status. The proposed approach will be hereinafter referred to as the adaptive nested optimization scheme.

For description of the approach, let us renumber problems (5) in the following way. Since the nested optimization scheme starts from the solving of the problem

$$\begin{aligned} \min _{y_1 \in [a_1, b_1]} \varphi _1(y_1), \end{aligned}$$
(9)

let us assign to this problem the index \(l = 1\). The solving of this problem begins with the computation of the function \(\widetilde{\varphi }_1(y_1)\) at some point \(\hat{y}_1\) that generates the problem

$$\begin{aligned} \min _{y_2 \in [a_2, b_2]} \varphi _2(\hat{y}_1, y_2), \end{aligned}$$

to which the index \(l = 2\) is assigned, etc.

In general, when \(Q \ge 1\) subproblems (5) have been initiated already, the index \(Q + 1\) is assigned to a newly generated subproblem, and the number of subproblems Q is incremented by 1.

Thus, at every moment of execution of the nested optimization scheme there is a set \(F_Q\) consisting of the problems of the kind

$$\begin{aligned} \min _{x \in [a, b]} f_l(x), \quad 1 \le l \le Q, \end{aligned}$$
(10)

where the index l is connected with a recursion level, i.e., with the coordinate number \(i, 1 \le i \le N\), which this problem belongs to. In this case \(x = y_i\), and the search domain \([a, b] = [a(l), b(l)]\) is the interval \([a_i, b_i]\). When \(i = 1\), \(f_l(x) = \varphi _1(y_1)\). If \(i > 1\), then the index l is also connected with the vector \(\nu _{i - 1} = \nu _{i - 1}(l)\) and \(f_l(x) = \varphi _i(\nu _{i - 1}, y_i)\). Moreover, any problem (10) except the problem \(f_1(x)\) is associated with its “parent”, i.e., with the problem which has generated it.

Let us designate the operation of computing the objective function value at a given point of the search domain as the trial. Consider methods which, in the course of solving problem (10), generate the sequence of trial coordinates \(x_l^1, x_l^2, \dots , x_l^k, \dots \) and the sequence of the trial results (the function values) \(z_l^1, z_l^2, \dots , z_l^k, \dots \), where \(z_l^j = f_l(x_l^j), j \ge 1\). After completing \(k \ge 1\) trials, the following set of search information is formed

$$\begin{aligned} \omega _l^k = \big \{ (x_l^j, z_l^j, \lambda _l^j){:}\, 1 \le j \le k \big \}, \end{aligned}$$
(11)

where \(\lambda _l^j, 1 \le j \le k\), are the indices of the problems generated for the computation of the values \(z_l^j, 1 \le j \le k\). Within the framework of the multidimensional scheme, it is reasonable to form the search information separately for each problem of the family \(F_Q\) under solving. As a complete set of information for the multidimensional problem being solved one can consider the set

$$\begin{aligned} \varOmega _Q = \big \{ \omega _l^{k(l)}, 1 \le l \le Q \big \}, \end{aligned}$$

where \(k(l), 1 \le l \le Q\), is the number of the search information elements for the problem with the index l. It is important to note that the values \(z_l^j, 1 \le j \le k(l)\), in the search information for the functions of levels less than N may vary, since these values are estimates of the minima of the corresponding generated subproblems, and these estimates may be refined during the computations.

Note that in the traditional nested optimization scheme, in the course of solving a problem (10), only the information from the particular set (11) is used, while in the adaptive generalization the full set of information \(\varOmega _Q\) is taken into consideration.

Before the detailed description of the general adaptive scheme, let us introduce some auxiliary terms. When a univariate subproblem generates a subproblem of the next level for computing the value of its objective function, we will call the generating subproblem the parental one, or just a parent, and the generated subproblem the upper one. Obviously, the subproblem (9) of the first level can be parental only, and the subproblems of the level N generate no upper subproblems. Let us associate with the subproblem with the index l from the set \(F_Q\) (except the subproblem (9) with the index \(l = 1\)) the number \(\pi (l)\) of its parent and the number \(\rho (l)\) of the recursion level (the coordinate number) which this subproblem belongs to. Moreover, let us consider a point \(x_l^j\) being the j-th trial point in the course of solving the problem (10) with the number l and connect with it the number \(\lambda _l^j\) of the subproblem generated for computing the function value at this point. Let us also introduce the designation \(\varphi _l^*\) for the current minimal value of the objective function \(f_l\) from (10).
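For readability, the bookkeeping just introduced can be gathered into a single record per subproblem; the following layout is a hypothetical Python illustration, not a part of the original scheme description.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Subproblem:
    """One univariate subproblem of the family F_Q (notation follows the text)."""
    index: int                    # l, the position in F_Q
    level: int                    # rho(l), the coordinate number i
    parent: Optional[int]         # pi(l); None for the root problem (9)
    prefix: List[float]           # the fixed vector nu_{i-1}
    a: float                      # left end of [a_i, b_i]
    b: float                      # right end of [a_i, b_i]
    xs: List[float] = field(default_factory=list)      # trial points x_l^j
    zs: List[float] = field(default_factory=list)      # trial results z_l^j (may be refined later)
    children: List[int] = field(default_factory=list)  # lambda_l^j for levels i < N
    best: float = float("inf")    # current minimal value phi_l^*
```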

Step 1. Initialization.

  1.1.

    Set the number of subproblems of the nested scheme \(Q = 0\), the subproblem index \(l = 1\), and the level number \(i = 1\). Choose a point \(x_1^1\) in the interval \([a_1, b_1]\) as the starting point for solving the univariate subproblem (9). Increase Q by 1 and include this problem in the set \(F_Q\) with the index \(l = 1\).

  1.2.

    In order to calculate the value of the objective function of the level i at the point \(x_i^1\), pass to the next coordinate (\(i = i + 1\)) for solving the univariate subproblem

    $$\begin{aligned} \min _{y_i \in [a_i, b_i]} \varphi _i(\hat{\nu }_{i - 1}, y_i), \end{aligned}$$
    (12)

    where the components of the vector \(\hat{\nu }_{i - 1} = (\nu _1, \dots , \nu _{i - 1})\) are constants. Namely, \(\nu _j = x_j^1, 1 \le j \le i - 1\).

  1.3.

    Choose a point \(x_i^1\) in the interval \([a_i, b_i]\) as the starting point for solving the subproblem (12). Increase Q by 1 and include this problem in the set \(F_Q\) with the index \(l = i\).

  1.4.

    Repeat operations 1.2 and 1.3 until the level \(i = N\) is reached. At this level, calculate the value \(z_N^1\) of the objective function \(\varphi (y)\) from (1) at the point \(y = (x_1^1, \dots , x_N^1)\) instead of generating a new univariate subproblem.

  1.5.

    For all subproblems from \(F_Q\) with numbers \(l, 1 \le l \le Q\):

    1.5.1.

      If \(l < N\), then set \(\lambda _l^1 = l + 1\) (for \(l = N\) the number \(\lambda _l^1\) is undefined).

    1.5.2.

      Accept \(z_l^1 = z_N^1\), \(k(l) = 1\), and form the sets of search information \(\omega _l^{k(l)} = \{( x_l^1, z_l^1, \lambda _l^1 )\}\) (except \(l = N\) where \(\omega _l^{k(l)} = \{( x_l^1, z_l^1 )\}\)).

    1.5.3.

      Specify current minimal value \(\varphi _l^* = z_l^1\), \(\rho (l) = l\), and for \(l > 1\) accept \(\pi (l) = l - 1\).

Step 2. Main procedure.

Let there be Q subproblems (10) for which sets (11) have been formed and the values \(\pi (l)\), \(\rho (l)\), \(\varphi _l^*\) have been determined. The next iteration of the adaptive scheme is executed in accordance with the following rules.

  2.1.

    Choice of univariate subproblem.

    2.1.1.

      Assign to each subproblem (10) with the index \(l, 1 \le l \le Q\), the value \(\widetilde{R}(l)\) called the characteristic of this subproblem.

    2.1.2.

      Find among the subproblems (10) a subproblem with index \(\sigma , 1 \le \sigma \le Q\), which the maximal characteristic corresponds to

      $$\begin{aligned} \widetilde{R}(\sigma ) = \max \big \{ \widetilde{R}(l){:}\, 1 \le l \le Q \big \}. \end{aligned}$$
      (13)
  2.2.

    New iteration.

    2.2.1.

      Set \(l = \sigma , i = \rho (\sigma )\) and determine a point \(x_l^{k(l) + 1}\) of the next trial in the subproblem \(\sigma \) taking into account the search information (11). If \(i < N\), then accept \(\lambda _l^{k(l) + 1} = Q + 1\).

    2.2.2.

      If \(i = N\), then calculate the value \(z_l^{k(l) + 1} = \varphi (\hat{\nu }_{i - 1}, x_l^{k(l) + 1})\), where the vector \(\hat{\nu }_{i - 1}\) has been determined in the preceding parental subproblems. Renew the current estimate of the minimum \(\varphi _l^*\) and the search information, adding the pair \((x_l^{k(l) + 1}, z_l^{k(l) + 1})\) to the set (11) and incrementing k(l) by 1. Then pass to Substep 2.3. In the opposite case (\(i < N\)), go to 2.2.3.

    2.2.3.

      In order to calculate the value of the objective function of the level i at the point \(x_l^{k(l) + 1}\), take the next coordinate (\(i = i + 1\)) for solving a new univariate subproblem (12) (the components of the vector \(\hat{\nu }_{i - 1}\) are fixed). Increase Q by 1 and assign to the new subproblem the number \(l = Q\) in the set \(F_Q\).

    2.2.4.

      Choose a point \(x_l^1\) in the interval \([a_i, b_i]\) as the starting point for solving the new subproblem (12). Define the set (11) as the empty one and assign \(k(l) = 0\). Accept \(\varphi _l^* = +\infty \), \(\rho (l) = i\), \(\lambda _l^1 = Q + 1\).

    2.2.5.

      If the subproblem \(\sigma \) is parental for the current one, then set \(\pi (l) = \sigma \), otherwise, \(\pi (l) = Q - 1\). Go to 2.2.2.

  2.3.

    Renewing the search information in parental subproblems.

    2.3.1.

      Return to the parental subproblem: new \(l = \pi (l)\).

    2.3.2.

      If the set \(\omega _l^{k(l)}\) is empty or \(l = \sigma \), then accept the current minimal value of subproblem with number \(\lambda _l^{k(l)}\) as the objective function value \(z_l^{k(l)}\) at the new trial point \(x_l^{k(l)}\), include in the set \(\omega _l^{k(l)}\) the triplet \(\big ( x_l^{k(l)}, z_l^{k(l)}, \lambda _l^{k(l)} \big )\), add unity to k(l) and renew the estimate \(\varphi _l^*\).

    2.3.3.

      Repeat operations 2.3.1 and 2.3.2, if \(l \ne \sigma \). In the opposite case, go to 2.3.4.

    2.3.4.

      Accept \(\mu = \pi (l)\) and pass to the parental subproblem: new \(l = \pi (l)\).

    2.3.5.

      Find in the set \(\omega _l^{k(l)}\) a triplet \(\big ( x_l^j, z_l^j, \lambda _l^j \big ), 1 \le j \le k(l)\), for which \(\lambda _l^j = \mu \) and replace \(z_l^j\) with \(\varphi _{\mu }^*\).

    2.3.6.

      If \(\varphi _{\mu }^* < \varphi _l^*\), then replace \(\varphi _l^*\) with \(\varphi _{\mu }^*\) and go to 2.3.4. In the opposite case, complete the current iteration of the main procedure and return to Substep 2.1.

This completes the description of the adaptive nested optimization scheme.

In the algorithmic scheme presented above, the decision rules for the choice of the starting points in the univariate subproblems and for the selection of the best subproblem (the characteristics \(\widetilde{R}(l)\) in 2.1.1) are not defined. Generally speaking, the characteristic \(\widetilde{R}(l)\) is determined by the one-dimensional method that is used for solving the subproblem l.

In the present research, the global search algorithm developed within the framework of the information approach (see [34–36]) has been applied for solving the one-dimensional problems (10). Let us show how this method forms the subproblem characteristic \(\widetilde{R}(l)\). According to this algorithm, while solving a univariate problem (10), the first trial is executed at some internal point of the interval [a, b]. For certainty, the starting point \(x_l^1\) is considered to be the middle of the interval [a, b], where the value \(z_l^1 = f_l(x_l^1)\) is computed.

The choice of the point \(x_l^{k + 1}, k \ge 1\), of any subsequent \((k + 1)\)-th trial is determined by the following operations:

Step 1.

Coordinates of the previous trials \(x_l^1, x_l^2, \dots , x_l^k\) from (11) and the ends of the interval [a, b] are renumbered by subscripts in increasing order:

$$\begin{aligned} a = x_0 < x_1 < \dots < x_k < x_{k + 1} = b, \end{aligned}$$
(14)

and the values \(z_j = f_l(x_j), 1 \le j \le k\), are juxtaposed to them (except for \(x_0\) and \(x_{k + 1}\), which carry no function values). The values \(z_j, 1 \le j \le k\), are taken from the set (11), where they are stored as the corresponding trial outcomes.

Step 2.

For each interval \((x_{j - 1}, x_j), 1 \le j \le k + 1, k = k(l)\), a value R(j) hereinafter called the characteristic of the interval is calculated as follows

$$\begin{aligned} R(1)= & {} 2m(x_1 - x_0) - 4z_1, \quad j = 1,\nonumber \\ R(j)= & {} m(x_j - x_{j - 1}) + \frac{(z_j - z_{j - 1})^2}{m(x_j - x_{j - 1})} - 2(z_j + z_{j - 1}), \quad 1 < j < k + 1,\nonumber \\ R(k + 1)= & {} 2m(x_{k + 1} - x_k) - 4z_k, \quad j = k + 1 \end{aligned}$$
(15)
Step 3.

The interval \((x_{t - 1}, x_t)\) with the index \(t = t(l)\), which the maximal characteristic R(t) corresponds to is selected, i.e.,

$$\begin{aligned} R(t) = \max \big \{ R(j){:}\,1 \le j \le k + 1 \big \}, \end{aligned}$$
(16)

and the value R(t) is accepted as the characteristic \(\widetilde{R}(l)\) of the subproblem l.

Step 4.

The next trial point in the interval with the maximal characteristic is calculated as

$$\begin{aligned} x^{k + 1}= & {} \frac{x_t + x_{t - 1}}{2} - \frac{z_t - z_{t - 1}}{2m}, \quad 1 < t < k + 1, \nonumber \\ x^{k + 1}= & {} \frac{x_t + x_{t - 1}}{2}, \quad t = 1, t = k + 1. \end{aligned}$$
(17)

Here the value m can be considered as an estimate of the Lipschitz constant of the minimized function. It can be computed adaptively on the basis of the trial results as

$$\begin{aligned} m = {\left\{ \begin{array}{ll} rM, &{} \quad M > 0, \\ 1, &{} \quad M = 0 \end{array}\right. } \end{aligned}$$
(18)

where

$$\begin{aligned} M = \max _{1 < i \le k} \frac{|z_i - z_{i - 1}|}{x_i - x_{i - 1}}, \end{aligned}$$

and \(r > 1\) is a parameter of the method. The algorithm can be supplemented with a termination criterion:

Step 5.

If

$$\begin{aligned} x_t - x_{t - 1} \le \varepsilon , \end{aligned}$$
(19)

where t from (16) is the index of the interval with the maximal characteristic, then stop. In the opposite case, it is necessary to calculate \(z_l^{k + 1} = f_l(x_l^{k + 1})\), to increment k by 1 and to pass to Step 1. Here \(\varepsilon > 0\) is the predefined search accuracy (for the coordinates).
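The rules (14)–(19) can be gathered into the following compact Python sketch of the univariate information algorithm (the default values of r, \(\varepsilon \), and the trial budget are assumptions of the sketch, not prescriptions of the method).

```python
import math

def information_search(f, a, b, r=3.0, eps=0.01, max_trials=500):
    """Sketch of the univariate information algorithm, rules (14)-(19).
    Returns the best trial point found and its function value."""
    xs = [0.5 * (a + b)]          # the first trial at the middle of [a, b]
    zs = [f(xs[0])]
    while len(xs) < max_trials:
        # (14): order the trial points; the endpoints a, b carry no values.
        pts = sorted(zip(xs, zs))
        x = [a] + [p[0] for p in pts] + [b]
        z = [None] + [p[1] for p in pts] + [None]
        k = len(pts)
        # (18): adaptive estimate m of the Lipschitz constant.
        M = max((abs(z[i] - z[i - 1]) / (x[i] - x[i - 1])
                 for i in range(2, k + 1)), default=0.0)
        m = r * M if M > 0 else 1.0
        # (15): characteristics R(j) of the intervals (x_{j-1}, x_j).
        R = {1: 2 * m * (x[1] - x[0]) - 4 * z[1],
             k + 1: 2 * m * (x[k + 1] - x[k]) - 4 * z[k]}
        for j in range(2, k + 1):
            d = x[j] - x[j - 1]
            R[j] = m * d + (z[j] - z[j - 1]) ** 2 / (m * d) - 2 * (z[j] + z[j - 1])
        # (16): the interval with the maximal characteristic.
        t = max(R, key=R.get)
        # (19): termination by the length of the selected interval.
        if x[t] - x[t - 1] <= eps:
            break
        # (17): the next trial point inside the selected interval.
        if 1 < t < k + 1:
            x_new = 0.5 * (x[t] + x[t - 1]) - (z[t] - z[t - 1]) / (2 * m)
        else:
            x_new = 0.5 * (x[t] + x[t - 1])
        xs.append(x_new)
        zs.append(f(x_new))
    best = min(range(len(zs)), key=zs.__getitem__)
    return xs[best], zs[best]

if __name__ == "__main__":
    # a simple multiextremal test function on [2.7, 7.5]
    print(information_search(lambda x: math.sin(x) + math.sin(10.0 * x / 3.0), 2.7, 7.5))
```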

The univariate method described above belongs to the class of characteristical global search algorithms (see [11, 36]). This class includes many well-known one-dimensional multiextremal optimization methods, such as, for example, the algorithms [18, 25, 26, 32, 36], which can be embedded into the adaptive nested scheme as well.

Using the rule (19) one can introduce a termination condition into the adaptive nested scheme. It can be either a “strong” criterion when (19) is satisfied for all problems from the set \(F_Q\) or a weaker one when the inequality (19) holds for the problem (9) only.

3 Convergence of the adaptive nested optimization scheme

In order to prove the convergence of the proposed approach, let us note first that in the novel generalized multistep scheme the current estimate of the minimal value of the function \(\widetilde{\varphi }_{i + 1}(y_{i + 1})\), determined on the basis of the completed global search iterations, is used as the sought function value \(\widetilde{\varphi }_i(y_i)\). The estimates used can be refined during the computations; however, they may differ essentially from the exact minimal values defined by (5). More strictly, one can say that in the global search the computations deal not with the “exact” one-dimensional functions \(\widetilde{\varphi }_i(y_i), 1 \le i < N\), but with approximations \(\widetilde{\psi }_i(y_i), 1 \le i < N\), of these functions. Therefore, after completing the optimization, some estimate \(\psi ^*\) of the exact solution \(\varphi ^*\) from (4) will be obtained. In this connection, an important question about the closeness of the estimate \(\psi ^*\) to \(\varphi ^*\) arises. It is worth noting that the approximate computation of the one-dimensional function values is inherent in the basic nested optimization scheme as well (see [34, 35]). For this scheme, the following theorem is true:

Theorem 1

(see [34]) Let the functions \(\varphi _i(\nu _i)\) be Lipschitzian with the constants \(L_i\) for any i, \(1 \le i \le N\). Then, the following assertion is true for the approximation \(\psi ^*\) obtained in the minimization of the function \(\varphi (y), y \in D\), by the information algorithm using the basic scheme of nested optimization:

$$\begin{aligned} \lim _{\varepsilon _i \rightarrow 0} \psi ^* = \varphi ^*, \end{aligned}$$

if

$$\begin{aligned} \varepsilon _{i + 1} = O(\varepsilon _i), \quad 1 \le i < N, \end{aligned}$$

and besides, the inequality

$$\begin{aligned} m_i \ge 2L_i + \theta , \quad 1 \le i < N, \theta > 0, \end{aligned}$$
(20)

holds after terminating the minimization for coordinate \(y_i\) (i.e., after solving any problem (10) with \(l = l(i)\)).

Here \(m_i\) are the estimates (18) applied in the solution of the subproblems (10) for the corresponding coordinates \(y_i\), \(\varepsilon _i\) are the accuracies for the solutions of these problems in the termination criterion (19), and \(O(\varepsilon _i)\) denotes quantities that are infinitesimal together with \(\varepsilon _i\).

Theorem 1 is a theoretical substantiation of the applicability of the basic nested optimization scheme. At the same time, it can be considered as a justification of the convergence of the adaptive scheme as well if the “strong” termination condition is applied, i.e., when (19) is satisfied for all one-dimensional subproblems. In this case, only the order of the computations is changed in the realization of the scheme.

In the case of the adaptive scheme with the “weak” termination condition, the applicability of the theorem is limited. The one-dimensional problems \(f_l(x), 1 \le l \le Q,\) of the family \(F_Q\) from (10) are considered simultaneously, and a part of them (those with large enough values of the minimized function \(\varphi (y)\)) may never be solved with the required accuracy \(\varepsilon \). This is precisely the idea underlying the improved efficiency of the adaptive scheme. As a result, the error of the calculation of the functions \(\varphi _i(\nu _i)\) may appear to be considerable, and this error may remain unchanged even when \(\varepsilon \) tends to zero. In order to resolve this problem, one can note that actually there is no need for the exact calculation of the functions \(\varphi _i(\nu _i)\) at all points of the feasible domain: it is only necessary to calculate these functions exactly in the vicinity of the global minimum of the initial function \(\varphi (y)\). It is this requirement that will be used further for the theoretical substantiation of the adaptive nested optimization scheme.

Let us represent the search information \(\omega _l^{k(l)}, 1 \le l \le Q\), from (11), obtained during the global search for the functions \(f_l(x) = \varphi _i(\nu _i, x), 1 \le l \le Q\), where \(x = y_i, i = i(l)\), in the form

$$\begin{aligned} \omega _l^{k(l)} = \{ (y_{l, j}, z_{l, j}(\delta _{l, j}), \lambda _{l, j}){:}\, 1 \le j \le k(l) \}, \end{aligned}$$
(21)

where \(y_{l, j}, 1 \le j \le k(l)\), are the points of the search trials completed in the problem l, the index l corresponds to the index of the problem in the family \(F_Q\), the index j corresponds to the increasing order of the iteration points (14) in the search information, and \(\delta _{l, j}, 1 \le j \le k(l)\), are the errors of the computation of the values \(z_{l, j}, 1 \le j \le k(l)\), i.e.,

$$\begin{aligned} (z_{l, j} - \varphi _{\lambda _{l, j}}(\nu _{\lambda _{l, j}}, y_{l, j})) \le \delta _{l, j}. \end{aligned}$$
(22)

Recall [see (11)] that \(\lambda _{l, j}, 1 \le j \le k(l)\), are the indices of the problems from the family \(F_Q\), which were generated to calculate the values \(z_{l, j}, 1 \le j \le k(l)\). It is important to emphasize also that the errors \(\delta _{l, j}, 1 \le j \le k(l)\), can be different for different values \(z_{l, j}, 1 \le j \le k(l)\).

The possibility to obtain the estimates of the global minimum of the initial function \(\varphi (y), y \in D\), is defined by the conditions of the following theorem.

Theorem 2

Let the adaptive scheme using the information global search algorithm (14)–(18) and the “weak” termination condition (19) for problem (9) with \(\varepsilon = 0\) be applied to solving problem (1). If

  i)

    the objective function \(\varphi (y)\) of problem (1) satisfies the Lipschitz condition with a finite constant \(L > 0\) in domain (2);

  ii)

    starting from some search step, the inequality

    $$\begin{aligned} m > 2L, \end{aligned}$$
    (23)

    for the estimates m of Lipschitz constant from (18) holds;

  iii)

    for any function \(f_l(x) = \varphi _i(\nu _i, x), 1 \le l \le Q, i = i(l)\), of the family \(F_Q\) from (10), the following conditions on the errors of the function value calculations hold

    $$\begin{aligned}&\delta _{l, \tau } + \delta _{l, \tau - 1} \le 2\varphi (y^*) + L(y_{l, \tau } - y_{l, \tau - 1}) \nonumber \\&\quad - (\varphi _{\lambda _{l, \tau }}(\nu _i, y_{l, \tau }) + \varphi _{\lambda _{l, \tau - 1}}(\nu _i, y_{l, \tau - 1})), \quad 1 < \tau \le k(l), \end{aligned}$$
    (24)
    $$\begin{aligned}&\delta _{l, \tau } \le \varphi (y^*) + L(y_{l, \tau } - y_{l, \tau - 1}) - \varphi _{\lambda _{l, \tau }}(\nu _i, y_{l, \tau }), \quad \tau = 1, \end{aligned}$$
    (25)
    $$\begin{aligned}&\delta _{l, \tau - 1} \le \varphi (y^*) + L(y_{l, \tau } - y_{l, \tau - 1}) - \varphi _{\lambda _{l, \tau - 1}}(\nu _i, y_{l, \tau - 1}), \quad \tau = k + 1, \end{aligned}$$
    (26)

    where \(\tau , 1 \le \tau \le k(l) + 1\), is the index of the interval (determined in accordance with the search information \(\omega _l^{k(l)}\)) containing the value of the ith coordinate \(y_i^*\) of the global minimum point \(y^*\), i.e.,

    $$\begin{aligned} y_{l, \tau - 1} \le y_i^* \le y_{l, \tau }. \end{aligned}$$
    (27)

Then, the global minimizer \(y^*\) of the function \(\varphi (y)\) in the domain D is the limit (accumulation) point of the search trial sequence generated by the adaptive scheme.

Proof

Since in the termination condition (19) \(\varepsilon = 0\), the algorithm of the adaptive scheme will generate an infinite sequence of trials \(\{y^s\}\). Assume that some global minimum point \(y^*\) of problem (1) is not a limit point of this sequence, i.e., there is no subsequence of the sequence \(\{y^s\}\) converging to \(y^*\). Because D is a bounded and closed set, a subsequence converging to a limit point \(\bar{y} \in D\) can always be selected from the sequence \(\{y^s\}\), and \(\varphi (\bar{y}) \ge \varphi (y^*)\).

As follows from the property of bilateral convergence of the information algorithms [36], for the characteristics \(R(\cdot )\) from (15) of the intervals containing the values of the coordinates of the point \(\bar{y}\) the following holds:

$$\begin{aligned} \lim _{q \rightarrow \infty } R(t(q)) = -4\varphi (\bar{y}), \end{aligned}$$
(28)

where t is the index of the interval containing the values of the coordinates of the point \(\bar{y}\), and q is the index of the global search iteration. Let us estimate the characteristic \(R(\tau ), \tau = \tau (q)\), of the interval (27).

If \(1 < \tau \le k(l)\), then according to (22), (24)

$$\begin{aligned}&z_{l, \tau - 1} + z_{l, \tau } \le \varphi _{\lambda _{l, \tau - 1}}(\nu _i, y_{l, \tau - 1}) + \varphi _{\lambda _{l, \tau }}(\nu _i, y_{l, \tau }) + \delta _{l, \tau - 1} + \delta _{l, \tau } \\&\quad \le 2\varphi (y^*) + L(y_{l, \tau } - y_{l, \tau - 1}). \end{aligned}$$

Then,

$$\begin{aligned}&R(\tau ) = m(y_{l, \tau } - y_{l, \tau - 1}) + \frac{(z_{l, \tau } - z_{l, \tau - 1})^2}{m(y_{l, \tau } - y_{l, \tau - 1})} - 2(z_{l, \tau } + z_{l, \tau - 1}) \\&\quad \ge m(y_{l, \tau } - y_{l, \tau - 1}) - 2L(y_{l, \tau } - y_{l, \tau - 1}) - 4\varphi (y^*), \end{aligned}$$

therefore because of (23)

$$\begin{aligned} R(\tau ) > -4\varphi (y^*) \end{aligned}$$
(29)

If \(\tau = 1\) then taking into account (22), (25)

$$\begin{aligned} z_{l, \tau } \le \varphi _{\lambda _{l, \tau }}(\nu _i, y_{l, \tau }) + \delta _{l, \tau } \le \varphi (y^*) + L(y_{l, \tau } - y_{l, \tau - 1}), \end{aligned}$$

the estimate

$$\begin{aligned} R(\tau ) \ge 2m(y_{l, \tau } - y_{l, \tau - 1}) - 4L(y_{l, \tau } - y_{l, \tau - 1}) - 4\varphi (y^*) \end{aligned}$$

is true, and because of (23) inequality (29) holds again. Analogous estimates leading to (29) are valid in the case \(\tau = k + 1\) as well.

Thus, taking into account that \(\varphi (y^*) \le \varphi (\bar{y})\), the expressions (28) and (29) for sufficiently large s contradict the assumption that the point \(\bar{y}\) is a limit one, since the characteristics of the intervals containing this point become, starting from some iteration of the global search, less than the characteristics of the intervals containing the global minimum point. According to rules (13), (16), such intervals cannot be selected for placing the trials inside them. \(\square \)

The proof has been completed.

Computational experiments on test classes of multidimensional multiextremal problems presented in the next section confirm the theoretical properties of the methods considered.

4 Numerical experiments

First, let us compare the character of the trial placement by the basic variant of the nested optimization scheme and by the adaptive one for a function from the essentially multiextremal test class (see [10, 11]). The functions of this class have the form

$$\begin{aligned} \varphi (y_1, y_2)= & {} -\Bigg \{ \bigg ( \sum _{i = 1}^{7}\sum _{j = 1}^{7} \Big [ A_{i j}a_{i j}(y_1, y_2) + B_{i j}b_{i j}(y_1, y_2) \Big ] \bigg )^2 \nonumber \\&+ \bigg ( \sum _{i = 1}^{7}\sum _{j = 1}^{7} \Big [ C_{i j}a_{i j}(y_1, y_2) + D_{i j}b_{i j}(y_1, y_2) \Big ] \bigg )^2 \Bigg \}^\frac{1}{2} \end{aligned}$$
(30)

where \(a_{i j}(y_1, y_2) = \sin (\pi i y_1)\sin (\pi j y_2)\), \(b_{i j}(y_1, y_2) = \cos (\pi i y_1)\cos (\pi j y_2)\), and the functions are considered in the domain \(y_1, y_2 \in [0, 1]\). The parameters

$$\begin{aligned} A_{i j}, B_{i j}, C_{i j}, D_{i j} \in [-1, 1], \end{aligned}$$

are independent random numbers distributed uniformly over the interval specified above.
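A random instance of the class (30) can be generated and evaluated, for example, as follows (the use of NumPy and the fixed seed are assumptions of this sketch).

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.uniform(-1.0, 1.0, size=(7, 7)) for _ in range(4))

def test_function(y1, y2):
    """One random instance of the test class (30)."""
    i = np.arange(1, 8).reshape(-1, 1)               # row index i = 1..7
    j = np.arange(1, 8).reshape(1, -1)               # column index j = 1..7
    a = np.sin(np.pi * i * y1) * np.sin(np.pi * j * y2)
    b = np.cos(np.pi * i * y1) * np.cos(np.pi * j * y2)
    s1 = np.sum(A * a + B * b)
    s2 = np.sum(C * a + D * b)
    return -np.sqrt(s1 ** 2 + s2 ** 2)
```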

In Fig. 2, the level curves of a function from the class (30) as well as the distributions of the trials (marked by dark dots) in the course of optimization executed by the basic and adaptive schemes are plotted. The numerical experiment has been carried out for both variants with the parameter of the method \(r = 3\) in the estimate (18) and the search accuracy \(\varepsilon = 0.01\) in the termination condition (19). Note that according to the basic scheme, condition (19) must be satisfied for each one-dimensional subproblem, while in the adaptive generalization it is required for subproblem (9) only.

Both adaptive and basic schemes have provided the required accuracy of the problem solution. However, the adaptive variant has carried out 257 search trials while the basic scheme has computed 1139 objective function values.

Fig. 2 The placement of the trials by the adaptive scheme (the left panel) and by the basic one (the right panel)

The comparison of the efficiency of optimization algorithms on examples of solving particular test problems is, obviously, not a sufficiently convincing way to prove the advantages of one or another method. Carrying out multiple experiments in order to obtain statistically reliable results of the comparison is a more objective approach. For this goal one can use the method of operating characteristics introduced in [10, 36].

The operating characteristic is a set of pairs:

$$\begin{aligned} \{ \langle k, p(k) \rangle \}, \end{aligned}$$

where k is the number of search iterations and p(k) is the number of the problems from the test class solved successfully in k iterations. An individual pair corresponds to a particular set of the parameters of the method with which the test problems were solved. Such indicators can be calculated according to the results of numerical experiments for various parameters of the methods and presented as a plot in the (k, p) axes. If for a given value k the operating characteristic of a method is situated above the characteristic of another method, the first method provides a better reliability, i.e., a better chance of the proper solution of the problem at the same computational costs of the search. If for a given value p(k) the operating characteristic of a method is located to the left of the characteristic of another method, the first method requires less computational expenditure to achieve the same reliability. Thus, the operating characteristics allow comparing the efficiency of various methods visually.
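Building an operating characteristic from raw experimental records is straightforward; the sketch below assumes that, for every test problem, the number of trials spent and a success flag (whether the known minimizer was located with the required tolerance) have been stored.

```python
def operating_characteristic(trial_counts, solved, budgets):
    """For every budget k, count the test problems solved within k trials."""
    return [(k, sum(1 for c, ok in zip(trial_counts, solved) if ok and c <= k))
            for k in budgets]

# example: operating_characteristic(counts, flags, [100, 200, 500, 1000, 2000])
```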

Let us first consider the class of two-dimensional functions (30) consisting of 100 functions. For the comparative experiment, four global optimization methods have been taken: the basic nested optimization scheme (BS), the adaptive one (AS), and two methods based on other approaches to solving the global optimization problems, namely the partitioning method DIRECT (see [17]) and the algorithm applying the dimensionality reduction by means of Peano mappings (PMA) from [36]. In BS and AS the reliability parameter value \(r = 2.6\) was used, for PMA \(r = 3.2\) (these values provide sufficient conditions of convergence to the global minimum for these methods). The operating characteristics have been obtained by varying the accuracy in the termination condition (19) for AS, BS, and PMA and the number of trials as the termination rule in DIRECT. A problem was considered to have been solved successfully if the point of the minimal value found in the course of optimization is located in the circular neighborhood of radius 0.01 of the true minimizer. The results of the experiment are presented in Table 1 and Fig. 3. The number of trials is plotted on the abscissa axis in the logarithmic scale.

Table 1 The number of the problems solved successfully depending on the number of trials spent

Fig. 3 Operating characteristics on the class (30)

The numerical experiments have demonstrated an appreciable advantage of the adaptive scheme over the basic prototype, as well as a better reliability of the adaptive scheme in comparison with the method DIRECT and the algorithm based on Peano curves.

The second series of the numerical experiments has been carried out using the well-known class of test functions GKLS from [6] for 5-dimensional problems. The testing has been carried out for a sample of 100 GKLS functions belonging to the most complicated class among the test classes described in [6].

Figure 4 presents the operating characteristics for the basic nested optimization scheme (BS), the adaptive scheme (AS) and the Peano-mapping-based algorithm (PMA). All the methods used the value of parameter \(r = 5\). The choice of the value \(r = 5\) was caused by the necessity to satisfy the sufficient conditions of convergence for PMA. For this method, the inequality \(m > 4L\) is the sufficient condition for the global convergence from [32]. The number of trials is plotted on the abscissa axis in the logarithmic scale.

Fig. 4 Operating characteristics using the GKLS class

As follows from Fig. 4, the performance of the method BS is significantly worse than the performance of the algorithms AS and PMA. The explanation of this fact lies in the loss of information by BS in the course of optimization, as discussed above. Concerning the comparison of AS and PMA, one can note that for a small number of iterations (and, as a result, for a low reliability) the method using Peano curves has some advantage. However, for achieving a high reliability AS is more efficient than PMA because it spends a smaller number of trials.

5 Conclusions

The work has been done in the framework of the global optimization approach according to which, in order to construct efficient global search algorithms, various dimensionality reduction schemes are applied. The reduction of multidimensional problems to one-dimensional ones allows using a wide choice of univariate algorithms for solving multidimensional optimization problems.

In the present article, the nested optimization scheme of dimensionality reduction is considered. Along with a relative simplicity in its basic form, this scheme possesses a number of disadvantages, including the redundancy of the number of computations of the minimized function values in some subdomains of the search domain. The authors have proposed a novel method to increase the efficiency of the initial scheme by using the full information on the problem in the course of the global search. The novel adaptive scheme has been described in a generalized characteristical form, within the framework of which many one-dimensional global optimization algorithms can be employed. The generalized form has been specified for the case of the information global search method, for which the convergence conditions have been proved. In order to confirm the efficiency of the developed approach, the results of the numerical experiments are presented, justifying an essential acceleration of the search for the adaptive generalization as compared to the basic prototype and demonstrating the efficiency of the adaptive scheme as compared to the global optimization method based on the application of Peano space-filling curves and to the popular method DIRECT.

It is worth noting that, owing to the characteristical decision rule for selecting the coordinates for the next iteration, the adaptive scheme has a significant potential for parallelization, which may be a promising direction for its further development.