
1 Introduction

Global optimization problems, aimed at finding the global optimum of multiextremal functions, are complicated decision-making models that describe many important applications in engineering, economics, scientific research, etc. (see some examples in [3, 9, 13, 26, 30, 32, 36, 43]). The complexity of these problems depends crucially on the dimension (the number of model parameters) because, in the general case, the computational costs, measured, for example, in the number of objective function evaluations, grow exponentially with the dimension. There exist several approaches to analyzing global optimization problems, oriented toward different classes of multiextremal functions defined by their specific properties. The wide spectrum of directions in the field of global optimization can be found in the fundamental monographs [23, 31, 32, 35, 39, 44].

Among the approaches generating efficient algorithms for solving multiextremal optimization problems with objective functions satisfying the Lipschitz condition, one can mention the approach based on different partition schemes (the component approach) and the class of methods that reduce a multidimensional problem to one univariate subproblem or a family of them, which are then solved by means of well-developed one-dimensional optimization algorithms.

In the framework of the component approach the search region is partitioned into several subregions (components), each component is evaluated numerically to assess how promising it is for continuing the search, and a new iteration is then carried out in the most promising subregion. The first class of component methods, called characteristical ones, was proposed and theoretically investigated in [18], and later it was generalized to the multidimensional case by many researchers (see, for example, [24, 25, 27, 31, 32, 35]).

As for the approach transforming a multidimensional problem to the univariate case, it includes two different schemes. The first one is based on applying Peano space-filling curves, which are continuous mappings of a multidimensional hypercube onto the unit interval of the real axis [4, 14, 22, 28, 29, 34, 39]. The second scheme reduces a multidimensional problem to a family of recursively connected univariate subproblems (nested optimization) [5, 6, 11, 12, 16, 37, 38, 39]. These schemes can be combined: inside the recursive procedure, subproblems of lower dimensionality are considered and solved by means of Peano mappings [40]. As shown in [19], among algorithms of this type the adaptive scheme of nested optimization has demonstrated the best efficiency.

A promising way to overcome the complexity of multiextremal optimization problems consists in parallelizing sequential optimization algorithms. Following this idea, a number of optimization methods have been proposed (see [2, 8, 15, 17, 20, 33, 39, 40]). In this paradigm the usual approach is to perform trials (computations of objective function values) in parallel [8, 17, 20, 21, 39, 40]. The algorithm [2], using multiple Peano mappings, performs in parallel the trials corresponding to several Peano evolvents. A very interesting approach is used in parallel branch-and-bound algorithms, which build a hierarchical structure of partitions of the feasible domain and parallelize the partitioning procedure. For example, the paper [21] describes a model using threads within one computational node, and the publication [1] suggests a parallel partitioning strategy for distributed memory.

As opposed to the above approaches, the methods based on the nested optimization scheme [15, 33] implement parallelization by executing internal subtasks in parallel. In this paper we consider a parallel algorithm that generalizes the adaptive scheme of global optimization [11], which belongs to the recursive reduction techniques and applies the information characteristical method [33, 39] for solving the nested univariate subproblems. The main goal of the work is to describe a new model of parallel computations inside the adaptive scheme realizing the “parallelization by subtasks” approach and to estimate the effectiveness of the parallelization, measured as the speedup of the parallel adaptive scheme compared to the sequential one.

The rest of the paper is organized as follows. Section 2 contains the statement of the multiextremal optimization problem to be studied and the general algorithm of the nested optimization scheme. Section 3 describes the model of parallelism in the framework of the nested adaptive dimensionality reduction. Section 4 presents the results of numerical experiments and speedup estimations for the parallel adaptive scheme. The last section concludes the paper.

2 Nested Optimization Scheme

The statement of the optimization problem under consideration is as follows. It is necessary to find, in a hyperparallelepiped H of the N-dimensional Euclidean space \(\mathbb {R}^N\), the least value (global minimum) \(F_*\) of an objective function F(u) and the coordinate \(u_* \in H\) of the global minimum (the global minimizer). This problem can be written in symbolic form as

$$\begin{aligned} F(u) \rightarrow \min , \; u = (u_1, \dots ,u_N) \in H \subseteq \mathbb {R}^N,\end{aligned}$$
(1)
$$\begin{aligned} H = \{ u \in \mathbb {R}^N : a_i \le u_i \le b_i, 1 \le i \le N \}. \end{aligned}$$
(2)

The objective function F(u) is supposed to satisfy in the search domain H the Lipschitz condition

$$\begin{aligned} | F(u') - F(u'') | \le L\Vert u' - u''\Vert , \; u', u'' \in H, \end{aligned}$$
(3)

where \(L > 0\) is a finite value called the Lipschitz constant and \(\Vert \cdot \Vert \) denotes the Euclidean norm in \(\mathbb {R}^N\). Under condition (3), problem (1)–(2) is, in the general case, multiextremal and non-smooth.
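
As a minimal illustration (not the test functions used in Sect. 4), the following Python sketch sets up an instance of problem (1)–(2) and computes a crude sampling-based lower estimate of the Lipschitz constant L from (3); the objective, the bounds and all names are illustrative assumptions only.

```python
import numpy as np

# A hypothetical instance of problem (1)-(2): a multiextremal objective
# F over the box H defined by bounds a_i, b_i as in (2).
N = 3
a = np.full(N, -1.0)          # lower bounds a_i from (2)
b = np.full(N,  1.0)          # upper bounds b_i from (2)

def F(u: np.ndarray) -> float:
    # A smooth but multiextremal function satisfying (3) on H.
    return float(np.sum(u**2) + 0.5 * np.sum(np.sin(5.0 * np.pi * u)))

# A lower estimate of the Lipschitz constant L from (3), obtained by
# sampling random pairs of points in H; the true L is at least this large.
rng = np.random.default_rng(0)
pairs = rng.uniform(a, b, size=(1000, 2, N))
L_est = max(abs(F(p) - F(q)) / np.linalg.norm(p - q) for p, q in pairs)
```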

The nested scheme of dimensionality reduction has served as the source for different global optimization methods [5, 6, 11, 12, 16, 37, 38, 39]. It is based on the known relation [5, 39]

$$\begin{aligned} \min _{u \in H} F(u) = \min _{u_1 \in H_1} \min _{u_2 \in H_2} \cdots \min _{u_N \in H_N} F(u_1, \dots , u_N), \end{aligned}$$
(4)

where \(H_i\) is the line segment \([ a_i, b_i ]\), \(1 \le i \le N\).

Let us give a general description of the nested scheme by recursively introducing a family of reduced functions \(F^i(\tau _i)\), \(\tau _i = (u_1, \dots , u_i)\), \(1 \le i \le N\), in the following manner:

$$\begin{aligned} F^N(\tau _N) \equiv F^N(u) \equiv F(u), \end{aligned}$$
(5)
$$\begin{aligned} F^{i-1}(\tau _{i-1}) = \min _{u_i \in H_i} F^i(\tau _i), \; 2 \le i \le N. \end{aligned}$$
(6)

Then, instead of minimizing the N-dimensional function F(u) in (1), we can search for the global minimum of the univariate function \(F^1(u_1)\) since, in accordance with (4),

$$\begin{aligned} F_* = \min _{u_1 \in H_1} F^1(u_1). \end{aligned}$$
(7)

However, any numerical optimization method, in the course of solving problem (7), has to calculate values of the function \(F^1(u_1)\). But such a computation at a point \(\tilde{u}_1\) requires solving the problem

$$\begin{aligned} F^2(\tilde{u}_1, u_2) \rightarrow \min , \; u_2 \in H_2, \end{aligned}$$
(8)

which is one-dimensional again since the argument \(\tilde{u}_1\) is fixed, and so on. Following this way, we reach level N, where the problem

$$\begin{aligned} F^N(\tilde{\tau }_{N-1}, u_N) \rightarrow \min , \; u_N \in H_N, \end{aligned}$$
(9)

is one-dimensional as well because the vector \(\tilde{\tau }_{N-1} = (\tilde{u}_1, \dots , \tilde{u}_{N-1})\) is fixed (its coordinates are given at the previous levels of recursion). As \(F^N(u) \equiv F(u)\), the evaluation of objective function values in problem (9) consists in calculating the values \(F(\tilde{\tau }_{N-1}, u_N)\) of the given function from (1).

The procedure (7)–(9) described above is recursive and makes it possible to find the solution of the multidimensional problem (1)–(2) via solving the family

$$\begin{aligned} F^i(\tau _{i-1}, u_i) \rightarrow \min , \; u_i \in H_i, 1 \le i \le N, \end{aligned}$$
(10)

of univariate subproblems. This scheme is called the nested scheme of dimensionality reduction.
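
The recursion (7)–(10) can be condensed into a short Python sketch, in which a uniform grid scan stands in for a real univariate global method (such as the GSA used in Sect. 4); the solver and all names are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Sequence, Tuple
import numpy as np

def minimize_1d(g: Callable[[float], float], lo: float, hi: float,
                n_pts: int = 50) -> Tuple[float, float]:
    # Stand-in for a univariate global method: a plain grid scan,
    # used here only to keep the sketch short.
    xs = np.linspace(lo, hi, n_pts)
    vals = [g(x) for x in xs]
    i = int(np.argmin(vals))
    return xs[i], vals[i]

def nested_min(F, a: Sequence[float], b: Sequence[float],
               tau: Tuple[float, ...] = ()) -> float:
    # Reduced function value F^{i-1}(tau) from (6): minimize over the
    # i-th coordinate with the prefix tau = (u_1, ..., u_{i-1}) fixed.
    i = len(tau)
    if i == len(a):            # level N reached: plain evaluation, cf. (9)
        return F(np.array(tau))
    g = lambda x: nested_min(F, a, b, tau + (x,))
    _, val = minimize_1d(g, a[i], b[i])
    return val
    # NB: the cost grows as n_pts**N; real schemes replace the grid
    # scan by adaptive one-dimensional methods.
```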

The recursive generation of the subproblems in the family (10) can be presented as a tree of connections between generating (parental) and generated (child) subtasks (see Fig. 1).

Fig. 1. Tree of subtasks in the nested optimization scheme for dimension 3.

In this tree, problem (7) is the root and problems (9) are the leaves. Of course, the tree is built dynamically, and Fig. 1 shows the full tree obtained after completing the multidimensional optimization. It should be noted that conducting one trial (a computation of the objective function value at a point) in a one-dimensional subproblem of minimizing \(F^i(\tau _{i-1}, u_i)\), \(1 \le i \le N-1\), generates a subtree in the tree of subtasks. As a consequence, any subproblem (10) is parental for the subproblems in the subtrees generated by its trials.

In the classical implementation of the nested scheme the subproblems (10) are solved until the stopping rule of the applied univariate method holds true for each of them. This means that in the course of optimization only subproblems belonging to a single path from the root to a leaf can interact with one another. This leads to a loss of search information obtained in the course of optimization and significantly worsens the efficiency of the classical scheme.

In order to overcome this drawback of the classical nested scheme, an improved version called the adaptive nested scheme of dimensionality reduction has been proposed in the paper [11]. As opposed to the classical nested scheme, the adaptive extension considers all the currently existing subproblems (10) simultaneously. A numerical measure of “significance”, called a characteristic, is assigned to each subproblem of the current family, and all the subproblems are ordered by decreasing characteristics. Then the subproblem with the maximal characteristic is chosen, and a new iteration of the univariate method connected with this subproblem is executed in it. The detailed algorithm of the sequential adaptive scheme has been described in [11].
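
A natural way to organize this selection is a priority queue keyed by the characteristic. The sketch below is a minimal Python illustration of one iteration of the adaptive scheme under that assumption; the Subtask record and the execute_iteration callable (one trial of the univariate method, possibly spawning child subtasks) are hypothetical placeholders.

```python
import heapq, itertools
from dataclasses import dataclass
from typing import List

@dataclass
class Subtask:
    # Hypothetical minimal subproblem record (cf. tuple (12) in Sect. 3):
    # the characteristic W and the termination indicator h.
    W: float
    h: int = 1                  # 1: continue optimization, 0: finished

_counter = itertools.count()    # tie-breaker keeps heap entries comparable

def push(heap: List, s: Subtask) -> None:
    heapq.heappush(heap, (-s.W, next(_counter), s))   # max-heap by W

def adaptive_step(heap: List, execute_iteration) -> None:
    # One iteration: take the subproblem with the maximal characteristic,
    # run one trial of its univariate method (which may spawn children),
    # and re-insert all still-live subproblems with updated characteristics.
    _, _, best = heapq.heappop(heap)
    children = execute_iteration(best)
    for s in [best] + list(children):
        if s.h == 1:
            push(heap, s)
```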

3 Parallel Adaptive Scheme

A natural way of parallelizing the adaptive scheme consists in solving several subproblems in parallel. Let us suppose that \(P > 1\) parallel computational nodes (processors) are at our disposal. Then a parallel iteration of the adaptive scheme can be organized as follows. All the subproblems (subtasks) are ordered according to their characteristics; the P subproblems with maximal characteristics are chosen, distributed among the processors (one subproblem per computational node) and solved in parallel. Within a parallel iteration, solving one subproblem of a recursion level l means applying the decision rule of the univariate optimization algorithm used in this subproblem, i.e., choosing a point \(u_l\) for a new trial and computing the objective function value at this point. If \(l < N\), such a computation generates a subtree of new subtasks that will be added to the existing ones after completing the trial at the point \(u_l\). Hereinafter the operation of executing a trial in a subproblem will be denoted as ExecuteIteration.

If the next parallel iteration starts only after all processors have completed their work, this procedure corresponds to the synchronous case. However, with such an organization of parallelism a processor that completes its computations is obliged to wait until the other processors finish and stands idle. To avoid this drawback one can use a more effective, but more complicated, asynchronous organization of parallelism, in which a processor completing its work takes the best subtask from the pool of non-distributed subproblems.

Further, we consider in more detail how both synchronous and asynchronous parallelism can be organized for the nested adaptive scheme. As the detailed code of the parallel implementation is very large, we give a general algorithmic description of the parallel adaptive scheme on the basis of an abstract one-dimensional optimization method. For this formal description it is necessary to introduce several notions and designations.

Let, at some stage of the adaptive scheme execution, all subproblems be numbered with integers from 1 to \(\lambda \), where \(\lambda \) is the number of subtasks (10) generated so far and the root subproblem (7) is the first one. A univariate method, in the course of minimizing a subproblem \(F^l(\tau _{l-1}, u_l)\), generates a sequence of trial points \(u^1_l, \dots , u^k_l\) at which the values \(z^1_l, \dots , z^k_l\) are computed, where \(z^i_l = F^l(\tau _{l-1}, u^i_l), 1 \le i \le k\). These points and values form the set of pairs

$$\begin{aligned} \omega _k = \{ (u^1_l, z^1_l), \dots , (u^k_l, z^k_l) \}, \end{aligned}$$
(11)

which can be interpreted as the current state of the search for this subproblem. It should be remembered that any computation of a value \(z^i_l\) requires solving a one-dimensional subproblem at the next, \((l+1)\)-th, level and, if \(l < N\), building a subtree of subproblems (10). Uniting the subtrees of all trials, we get the subtree generated by the subproblem as a whole.

Taking these circumstances into account, we identify a subtask \(t \in \{ 1, \dots , \lambda \}\) with a tuple

$$\begin{aligned} t = \langle l, \tau _{l-1}, k, \omega _k, h, t^p, T^c, W \rangle . \end{aligned}$$
(12)

Here l is the number of the recursion level to which the subproblem belongs, \(\tau _{l-1}\) is the vector of fixed coordinates obtained from the preceding levels, k is the number of trials executed by the univariate algorithm, and \(\omega _k\) is the search state from (11). The indicator \(h = h(\omega _k)\) shows whether solving this subproblem has been completed: if \(h = 0\), the algorithm solving the described subproblem terminates its execution; if \(h = 1\), the optimization has to be continued. The number \(t^p\) refers to the parental subproblem that generated the current one, \(T^c\) is the set of all subtasks (up to level N) generated by the subproblem under consideration, and, finally, W is a numerical characteristic of the subproblem's significance. The set of all subtasks t, \(t \in \{ 1, \dots , \lambda \}\), will be denoted by T.
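
For illustration, tuple (12) maps directly onto a record type; the Python dataclass below is a hypothetical transcription whose field names mirror the notation of the text, not the authors' actual data structure.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Task:
    # Illustrative transcription of tuple (12).
    l: int                               # recursion level of the subproblem
    tau: Tuple[float, ...]               # fixed coordinates tau_{l-1}
    k: int                               # number of trials performed so far
    omega: List[Tuple[float, float]]     # search state (11): pairs (u_l^i, z_l^i)
    h: int                               # indicator h(omega_k): 0 stop, 1 continue
    parent: int                          # index t^p of the parental subtask
    children: List[int]                  # indices T^c of generated subtasks
    W: float                             # characteristic (significance)
```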

It should be noted that tuple (12) is not applicable to the root subtask (7) because it has no parent. In order to include the root subproblem into the unified description, let us introduce an “empty” subtask as the parent of the root and define \(t^0 = \varnothing \).

For starting the adaptive scheme (both sequential and parallel) it is necessary to create an initial set T. This can be done by applying the classical nested scheme with a few trials in each one-dimensional search. We will consider just a general procedure Initialize implementing this initial stage, without specifying its details. It is executed only once, and it is not important whether it is sequential or parallel.

As for the parallel implementation of the main body of the adaptive scheme, we will deal with a computational system with distributed memory. The system is supposed to consist of P computational nodes. Each node has just one processor and its own memory, to which only the processor of that node has direct access. Remote direct memory access (RDMA) to this memory is considered to be impossible. This means that recording the data of the j-th node in the memory of the i-th node \((i \ne j)\) can be carried out only by means of data transmission operations.

The simplest way of parallelizing the adaptive scheme in a distributed system consists in employing the MapReduce programming model [7]. A generalized algorithm of the parallel adaptive scheme can be presented as Algorithm 1.

Algorithm 1. Synchronous parallel adaptive scheme.

In Algorithm 1, after initialization, parallel iterations of the adaptive scheme are executed in a loop until the termination condition in the root subproblem is satisfied. At Stage 4 the set \(T'\) containing the P subtasks with the best characteristics is formed. Stage 5 distributes the subproblems from \(T'\) to the processors, which execute in parallel one trial in their subtasks with the help of the procedure ExecuteIteration. After completing all the trials, a set \(T''\) of new subproblems obtained in the course of the computations is formed. Stage 6 complements the set T with the new subproblems and Stage 7 removes the terminated subproblems from T.
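
Since the listing is given only schematically, the following Python sketch reconstructs Algorithm 1 from the stage description above; thread-based workers stand in for the master-coordinated processors, and execute_iteration and root_terminated are hypothetical user-supplied helpers.

```python
from concurrent.futures import ThreadPoolExecutor

def algorithm_1(T, P, execute_iteration, root_terminated):
    # Synchronous parallel adaptive scheme, reconstructed from the text.
    while not root_terminated(T):          # until the root subproblem (7) stops
        # Stage 4: form T' -- the P subtasks with the best characteristics W.
        T_best = sorted(T, key=lambda s: s.W, reverse=True)[:P]
        # Stage 5: one trial per chosen subtask in parallel (ExecuteIteration);
        # each trial may spawn a subtree of new subtasks, collected into T''.
        with ThreadPoolExecutor(max_workers=P) as pool:
            T_new = [s for subtree in pool.map(execute_iteration, T_best)
                     for s in subtree]
        T.extend(T_new)                    # Stage 6: add the new subproblems
        T[:] = [s for s in T if s.h == 1]  # Stage 7: drop terminated subproblems
    return T
```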

A practical implementation of the described algorithm can be realized in the framework of such platforms as Hadoop [41] or Spark [42]. Unfortunately, this algorithm is synchronous and requires a significant number of data transmissions. Moreover, the implementation of Algorithm 1 implies that one processor (the master node) plays the main role and coordinates the work of the other (slave) processors, i.e., the organization of the parallel processes is centralized.

To improve the parallel implementation of the adaptive scheme, we propose an asynchronous decentralized model of parallel computations in which all processors have equal rights.

Let, as earlier, a distributed system have P processors. We change Algorithm 1 so that the procedure Initialize, after creating an initial set T, splits this set into P parts and sends each part to a separate processor. Moreover, the i-th processor is supposed to be able to connect independently with any other node and to exchange information with it after completing a trial in its subproblems. Under this assumption the full set T of subtasks can be stored in portions on different nodes, and in order to get the best subproblems a node can request from the other nodes only the best subproblems from their local subsets.

In this situation there exists no integrated iteration implemented by all the processors jointly, and we deal instead with iterations executed by the processors separately. On completing its iteration, a node can immediately request the best subproblems from the other nodes and begin a new iteration. Two examples of such requests are shown in Fig. 2.

Fig. 2. Gathering information for a new iteration to the 1st node (a) and to the 7th node (b).

Under these assumptions we propose an asynchronous algorithm of the parallel adaptive scheme, presented below in pseudocode. This algorithm is supposed to be executed on every node.

Algorithm 2. Asynchronous decentralized parallel adaptive scheme.

Let us give some remarks on Algorithm 2. \(T^\text {local}_i\) is the subset of subproblems stored on the i-th node. The procedure ReceiveTaskFromNode receives the best subtask from another node. In line 8 the procedure PartOfInitialTaskSet forms the initial set \(T^\text {local}_i\) from the subtasks obtained by the procedure Initialize.

The external for loop is an analogue of the while loop in Algorithm 1, but instead of the termination condition used in the while loop, a limit \(n^\text {max}_i\) on the number of trials executed on the i-th node is introduced. The termination condition of the optimization algorithm is transferred into lines 17–21, where l(t) denotes the level of the subtask t. The internal loop collects the subproblems with the best characteristics from all the nodes (except the i-th node itself). Then, from the local set \(T^\text {local}_i\) the subproblem with the maximal characteristic is chosen, and a new trial is executed in this subproblem.
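
As the pseudocode itself is given only in outline, the following Python sketch offers one plausible reading of Algorithm 2 as executed on the i-th node; request_best and receive_task are hypothetical stand-ins for the MPI transmissions of the real program, and the termination test is our interpretation of lines 17–21.

```python
def algorithm_2(i, P, T_local, n_max, execute_iteration,
                request_best, receive_task):
    # Asynchronous decentralized scheme as run on node i (sketch only).
    for _ in range(n_max):                     # trial budget n_i^max of node i
        for j in range(P):                     # internal loop: gather the best
            if j != i:                         # subtasks of the other nodes
                request_best(j)
                T_local.append(receive_task(j))   # cf. ReceiveTaskFromNode
        best = max(T_local, key=lambda s: s.W)    # locally best subproblem
        T_local.remove(best)
        children = execute_iteration(best)        # one trial; may spawn subtasks
        T_local.extend(s for s in [best] + list(children) if s.h == 1)
        if best.l == 1 and best.h == 0:        # lines 17-21 (our reading): the
            break                              # root subproblem (7) has stopped
    return T_local
```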

If, during the solving of a problem (1)–(2), the i-th processor has performed \(n_i\) trials, then tasks have been transferred to it from other processors \((P-1)n_i\) times, because one trial requires \(P-1\) transmissions from the rest of the nodes. Altogether the processors have performed \(n = \sum ^P_{i = 1} n_i\) trials and, consequently, executed \((P-1)n\) transmissions. The estimation of the number of transmissions for the synchronous Algorithm 1 gives about \((P^2 + P)n\), which is essentially worse than in the asynchronous case.
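
For example, with \(P = 16\) nodes the asynchronous scheme needs \(15n\) transmissions in total, whereas the synchronous estimate gives about \((16^2 + 16)n = 272n\), i.e., roughly 18 times more.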

4 Numerical Experiments

To evaluate the effectiveness of the parallelism described in Sect. 3, a computational experiment aimed at comparing the sequential and asynchronous parallel implementations of the adaptive nested optimization scheme has been carried out. The simpler synchronous version did not participate in the comparison because it is inferior to the asynchronous one according to the theoretical estimations. The experiment consisted in minimizing a set of functions from the test class GKLS [10] of essentially multiextremal functions (hard subclass). These functions have a complicated structure with tens of local minima. Nowadays this class is a classical tool for the comparison of global optimization methods.

In the experiment 50 functions of dimension 8 were taken and solved by both the sequential and the parallel adaptive schemes. As the one-dimensional method for solving the subproblems (10) in both schemes, the information global search algorithm GSA [38, 39] was taken, with reliability parameter \(r = 6.5\) and accuracy \(\varepsilon = 0.01\) in the termination condition. Computations were executed on a cluster consisting of 64 nodes, where each node is equipped with an Intel® Xeon® Gold 6148 processor having 20 physical cores. Mellanox® Infiniband FDR was used as the interconnection technology. The parallelism was provided on the basis of MPI, version Intel® MPI 2019. Only one MPI rank was assigned to each node.

The global minima have been found with the given accuracy in all the test problems. The results of the experiment are reflected in Table 1 and Fig. 3. The table contains the average time spent by the parallel scheme per test problem and the speedup in time achieved by the parallel technique in comparison with the sequential one for different numbers of MPI ranks.

Table 1. Speedup in time on the GKLS test class

Fig. 3. Speedup in time on the GKLS test class

5 Conclusion

The paper proposes general descriptions of new parallel algorithms implementing multiextremal optimization methods on the basis of the adaptive nested scheme, which reduces a multidimensional problem to a family of one-dimensional subproblems. Two parallel versions are presented, in synchronous and asynchronous variants, for distributed computational systems. The efficiency of the parallelism is investigated experimentally on the GKLS test class of complicated multiextremal multidimensional problems. The results of the experiment have shown an essential speedup of the optimization process when applying the asynchronous adaptive scheme.

Combining the general parallel procedure of the adaptive scheme with fast univariate optimization methods (like the characteristical ones) makes it possible to construct new efficient techniques for solving multiextremal problems of high dimension. Moreover, it is promising to develop new parallel implementations of the adaptive scheme oriented toward other parallel architectures, for example, supercomputers with mixed types of memory. It would also be very interesting to compare the proposed algorithm with parallel optimization methods based on other principles of parallelization. These problems can be fruitful directions for further research.