INTRODUCTION

A characteristic feature of the modern stage of computer development is the advent and widespread use of large-scale computers and supercomputers. Their appearance is due to the need to rapidly resolve complex information and calculation tasks in various spheres of action [118]. Solving such tasks is associated with the necessity to handle enormous volumes of information nearly impossible by von Neumann computers. A way out of this situation at present and in the foreseeable future is ensured by a parallelization of corresponding algorithms and using parallel computing systems (large-scale computers and supercomputers). However, the efficiency of using such systems for solving a given problem, largely depends on the compatibility of the structure of a computer program that implements an algorithm for solving the problem and organizes its execution with features of the computer architecture. For example, in programs targeted at a vector-pipeline computer, it is necessary to vectorize internal cycles, while in programs targeted at a parallel computer with a shared memory or a computer cluster with a distributed memory, we should parallelize pieces of a program and efficiently organize their execution. Here, the parallelization as well as the organization of programs running on a specific computer depend on the specifics of the task and represent problems of their developers [19, 20]. The aim of this paper is to propose tools for solving the problem of the optimal computational process organization (CPO) in parallel computers with a shared or distributed memory. The development of these tools consists in constructing a model and a method that ensure the problem-time optimal CPO.

DESCRIPTION OF THE MODEL

An application problem solved with the use of computer equipment is presented as a corresponding computer program. This program represents a given sequence of operations that convert input data into output data. An operation cannot start before the end of all operations that are suppliers of input data required for it. Hence, the computer program sets the partial order of execution of included operations [19]. For each pair of operations this order specifies an operation that must be executed earlier or notes that the operations are independent and can be executed in any order. If the program does not contain independent operations, then the order is linear and only one variant for the program’s implementation is available. In the case of independent operations, a fairly wide range of implementation variants can be found and the problem of choosing the best of them arises. For benefit of the choice, the criterion of optimality is specified and the computer program is decomposed into relatively independent blocks. Each block contains one or more related operations. The information interrelation of the blocks is presented by the directed graph

$$G = \left\{ {\left( {i,j} \right)} \right\},\quad i,j = 0,1, \ldots ,m,\quad i < j,$$
((1))

where i and j are node numbers of this graph, \(m\) is the number of allotted relatively independent blocks of the program, and \(\left( {m + 1} \right)\) is the number of the nodes of the graph.

Each computational procedure in this graph is placed in correspondence with an arc \(\left( {i,j} \right)\), which connects the \(i\)th and \(j\)th nodes. The node \(i = 0\) represents the event, which consists in starting the computer program. The nodes \(i = 1,{\kern 1pt} \,2, \ldots ,m\) represent the events, which consist in the completion of all computational procedures corresponding to arcs incoming to each of them. Here, the mth node represents the event, which consists in the completion of executing the computer program (the completion of problem solving).

The sequence of computational procedures obeys the following condition: a procedure corresponding to an arc coming from any node cannot be started until the completion of all procedures corresponding to the arcs incoming to this node is found.

The possibilities of a computing system for operations that compose each of the arcs are specified by the vector

$$T = \left\| {{{t}_{k}}(i,j)} \right\|,\quad (i,j) \in G(i,j),\quad k \in R,$$
((2))

where \({{t}_{k}}(i,j)\) is the time required for the kth element of the computing system to execute the procedure corresponding to the (i, j)th arc; k is the identifier of an element of the computing system; and R is the set of elements of the computing system.

If the kth element of the computing system cannot execute operations of the procedure corresponding to the (i, j)th arc, then \({{t}_{k}}(i,j) = \infty \).

Assume that with the procedure involving the execution of operations corresponding to the (i, j)th arc of a certain set W(i, j) of elements of the computing system, their execution time t(i, j) is determined by a certain function

$$t(i,j) = t[W(i,j)],\quad (i,j) \in G(i,j).$$
((3))

Its constructive presentation depends on the specific operations of the procedure.

The CPO is defined by the set

$$P = \left\{ {{{A}_{P}}(i,j),{{W}_{P}}(i,j)} \right\},(i,j) \in G(i,j),{{W}_{P}}(i,j)) \subset R,$$
((4))

where \({{A}_{P}}(i,j)\) is a moment corresponding to the start of the nth procedure of the program in implementing the P variant of the CPO and \({{W}_{P}}(i,j)\) is a set of elements of the computing system that are involved in executing the (i, j)th procedure of the program when implementing the P variant of the CPO.

It is assumed that an interruption of each started procedure is not allowed and the composition \({{W}_{P}}(i,j)\) of involved elements is not changed during the procedure’s execution.

The set of all paths of directed graph (1) that connect its initial and terminal nodes we denote by GL. The execution time T(P) of the computer program is equal to the maximum time TL(Р) of the travel LGL from the initial node i = 0 of graph (1) to its terminal node i = m in implementing the P variant оf the CPO.

With allowance for the accepted notation and constraints, a model for the choice of an optimum variant of the CPO can be formally presented as the following mathematical programming problem: specify the variant

$$P^{*} = P\left\{ {A_{P}^{*}(i,j),W_{P}^{*}(i,j)} \right\},\quad (i,j) \in G(i,j),\quad W_{P}^{*}(i,j) \subset R$$
((5))

(of the organization of a computational process implementing the execution of the computer program described by directed graph (1)) satisfying the condition

$${{T}_{L}}(P^{*}) = \mathop {\min }\limits_P \mathop {\max }\limits_{L \in {{G}_{L}}} {{T}_{L}}(P)$$
((6))

under the constraints

$${{A}_{P}}(i,j) \geqslant \mathop {\max }\limits_{(z,j) \in G} \left\{ {{{A}_{P}}(z,j) + {{t}_{P}}(z,j)} \right\},\quad (i,j) \in G(i,j),\quad (z,j) \in G(i,j);$$
((7))
$$\sum\limits_{(i,j) \in {{F}_{P}}(t)} {\sum\limits_{k \in {{W}_{P}}(i,j)} {{{d}_{k}}(i,j) \leqslant K} } ,$$
((8))

where FP(t) is a set of individual processes executed at the moment t for the P variant of the CPO;

$${{d}_{k}}(i,j) = \left\{ {\begin{array}{*{20}{c}} {1,\quad {\text{if}}\quad k \in {{W}_{P}}(i,j)} \\ {0,\quad {\text{if}}\quad k \notin {{W}_{P}}(i,j)} \end{array}} \right.;$$
((9))

and K is the power (the number of elements) of the set R.

Condition (6) in model (5)–(9) formally reflects the desire to minimize the execution time of the computer program. Condition (7) establishes that computational procedures corresponding to an arc coming from any node of graph (1) can start only after the completion of all procedures represented by the arcs incoming to this node. Condition (8) establishes that the number of simultaneously used elements of the computing system cannot exceed their total number.

In general, a computational process (organized on model (5)–(9)) that implements the execution of the computer program whose structure is described by directed graph (1), provides the minimization of the solution time of the corresponding applied problem.

METHOD FOR OPTIMIZING THE COMPUTATIONAL PROCESS

Model (5)–(9) represents a nonlinear integer programming problem. It belongs to the class of nonpolynomially complex tasks. A unified optimization algorithm for problems of this type is not found. Almost all of them require a special algorithm that takes into account its specificity [1925]. In this regard, consider an algorithm that takes into account the specificity of model (5)–(9). For the basis of the algorithm we take the branch-and-bound procedure [26, 27]. It contains a finite number of steps and rests on the following constructions [28, 29]:

(a) Representing the set \(V = \left\{ g \right\}\) of fragments g (feasible for constraints (7)–(9)) of a P plan of the CPO, as a tree of subsets (branching).

(b) Computing the low bound of goal function (5) for these fragments (branches).

(c) Finding acceptable variants of the P plan of the CPO.

(d) Checking these variants for optimality.

Branching in the proposed algorithm is implemented on the basis of a dichotomous scheme. In its implementation, every vgth node of the gth branch of the tree of variants represents an element of the plan. Here, if the (i, j)th computational procedure corresponding to this element starts at the Ag(i, j)th moment for the Wg(i, j)th variant of bringing elements of the computing system to the execution of the procedure, then

$${{{v}}_{g}} = \{ {{A}_{g}}(i,j),{{W}_{g}}(i,j)\} .$$
((10))

If the (i, j)th computational procedure does not do this, then

$${{{v}}_{g}} = \emptyset .$$
((11))

Here, \({{A}_{g}}\left( {i,j} \right),\left( {i,j} \right) \in g\) (starting points for corresponding procedures) for each branch \(g \in V\) must be chosen from the ascending sequence corresponding to this branch

$$\beta _{g}^{{}} = \left\{ {t_{g}^{n}} \right\},\quad n = 1,2, \ldots .$$
((12))

In this case, \(t_{g}^{1} = 0\) and the subsequent moments \(t_{g}^{n},n = 2,3, \ldots \) are determined recurrently by the formula

$$t_{g}^{n} = \mathop {\min }\limits_{\left( {i,j} \right) \in {{F}_{g}}(t_{g}^{{n - 1}})} \left\{ {{{A}_{g}}\left( {i,j} \right) + {{t}_{g}}\left( {i,j} \right)} \right\},\quad n = 2,3,...,$$
((13))

where \({{F}_{g}}\left( {t_{g}^{{n - 1}}} \right)\) is a set of procedures \((i,j)\) previously included in the gth branch and uncompleted to the point in time \(t_{g}^{{i - 1}}\); i.e.,

$${{F}_{g}}\left( {t_{g}^{{n - 1}}} \right) = \left\{ {\left( {i,j} \right)|\left( {i,j} \right) \in g,{{A}_{g}}\left( {i,j} \right) \leqslant t_{g}^{{n - 1}} \leqslant {{A}_{g}}\left( {i,j} \right) + {{t}_{g}}\left( {i,j} \right)} \right\}.$$
((14))

Here, \({{t}_{g}}\left( {i,j} \right)\) in (14) is determined by relation (3) for \(W(i,j) = {{W}_{g}}(i,j)],(i,j) \in G(i,j)\).

Hence, \({{\beta }_{g}}\) represents a sequence of points in time at which the execution of computational procedures included in the considered branch of the tree of variants ends and involved elements of the computing system are released. Here, the condition \(t_{g}^{1} = 0\) reflects the fact that all variants of implementing the computer program corresponding to graph (1) start at the moment \(t = 0\).

The timing of the beginning of the processes in breach of the specified sequences makes it impossible to reduce the total execution time of the program. In fact, for any plan \(P\) (of the CPO) containing the gth fragment, the early start of any procedure \(\left( {i,j} \right) \in g\) belongs to sequence (12) by definition. Consequently, later starts of computing procedures lying on paths critical for the P plan also belong to this sequence. For procedures that do not belong to critical paths, variation of the start times within corresponding reserves of time is possible. However, the limits of these reserves also belong to sequence (12) and the variation within the limits does not change the total execution time of a computer program. Consequently, the start and end times of procedures of (optimal for criterion (6)) plan (5) of the CPO must belong to the sequence \({{\beta }_{g}}\) corresponding to this plan.

To implement the dichotomic branching scheme, we introduce the set of all possible variants of the assignment of computer system’s elements for implementing each (i,j)th procedure (arc) of graph (1)

$$U = \left\{ {{{u}_{q}}\left( {i,j} \right)} \right\},\left( {i,j} \right) \in G,\,\,\,\,q = 1,2,...,$$
((15))

where \({{u}_{q}}\left( {i,j} \right) = \left\{ \begin{gathered} 1,\quad {\text{if for executing the (}}i{\text{,}}j{\text{)th procedure}}{\text{, the system involves a set}}\; \hfill \\ {{W}^{q}}\left( {i,j} \right)\;{\text{of elements of the computer network;}} \hfill \\ 0,\quad {\text{if for executing the (}}i{\text{, }}j{\text{)th procedure}}{\text{, the system does not involve a set}} \hfill \\ {{W}^{q}}\left( {i,j} \right)\;{\text{of elements of the computer network}}. \hfill \\ \end{gathered} \right.\)

Then, the position number \(q\) of the element \({{u}_{q}}\left( {i,j} \right) = 1\) of the set U characterizes an executed computational procedure and a variant of bringing elements of the computing system to the procedure’s execution.

With allowance for (15), a branching process for the benefit of forming optimal plan (5) of the CPO, consists in choosing (for each ordinary moment \(t_{g}^{i} \in {{\beta }_{g}}\)) admissible variables \({{u}_{q}}\left( {i,j} \right) \in U\) and specifying their values; i.e., relation (10) takes the form \({{\nu }_{g}} = \left\{ {t_{g}^{n},{{u}_{q}}\left( {i,j} \right) = 1} \right\}\) if the procedure \(\left( {i,n} \right)j \in G\) corresponding to the variable \({{u}_{q}}\left( {i,j} \right)\) starts at the moment \({{A}_{g}}\left( {i,j} \right) = t_{g}^{n}\) for the \({{W}_{g}}(i,j) = {{W}^{q}}\left( {i,j} \right)\)th variant of the assignment of elements of the computer network for the procedure’s execution, or \({{\nu }_{g}} = \left\{ {t_{g}^{n},{{j}_{q}}\left( {i,j} \right) = 0} \right\}\) if this process for the considered variant of the resource assignment does not start at the moment \({{A}_{g}}\left( {i,j} \right) = t_{g}^{n}\).

The set \(U_{g}^{n} \subset U\) of variables \({{u}_{q}}\left( {i,j} \right) \in U\), which can be included in the gth fragment of a plan of the CPO at the moment \(t_{g}^{n}\), contains values of \({{u}_{q}}\left( {i,j} \right) \in U\) corresponding to procedures \(\left( {i,j} \right) \in G\) previously not included in the gth branch; for them

$$A\left( {l,i} \right) + t\left( {l,i} \right) \leqslant t_{g}^{n},\quad \left( {l,i} \right) \in G;$$
((16))
$${{W}^{q}}\left( {i,j} \right) \subseteq R{}_{g}^{n}.$$
((17))

Here, \(R{}_{g}^{n}\) is a set of free elements of the computing system for the gth fragment of the plan of the CPO at the moment \(t{}_{g}^{n}\) and \({{U}_{g}}(t_{g}^{{n - 1}})\) is a set of variables \({{u}_{q}}\left( {i,j} \right)\) included in the gth branch of the tree of variants of a plan of the CPO to the moment \(t_{g}^{{n - 1}}\).

Condition (16) allocates computational procedures for which all procedures that represent suppliers of input data required for them are executed at the moment \(t_{g}^{n}\). Condition (17) selects those of them for which there are free elements of the computer network.

As an estimate \({{Q}_{g}}\) of the lower bound of goal function (6) for each gth fragment of the production plan, we can take the maximum time \(T_{L}^{g}\) of travel from the initial node i = 0 of graph (1) to its terminal node; this time is determined without regard to the constraints on allotting elements of the computer network for processes not included in g, i.e., belonging to the set G*, which supplements g to the set of procedures of graph (1). Here, if at the subsequent branching step (corresponding to the moment \(t_{g}^{n}\)) for the procedure \((i,j) \in G\) it is assigned \(u_{q}^{{}}(i,j) = 1,\) then for determining \({{Q}_{g}}\left( {u_{q}^{{}}(i,j) = 1} \right)\) we have the following items:

(a) The procedures \(\left( {l,i} \right) \in g\) previously included in the gth fragment of the plan (i.e., procedures for which \({{A}_{g}}\left( {l,i} \right) < t_{g}^{n}\)), start at corresponding moments \({{A}_{g}}\left( {l,i} \right)\) and end at the moments \({{A}_{g}}\left( {l,i} \right) + {{t}_{g}}\left( {l,i} \right)\) (here, \({{t}_{g}}\left( {l,i} \right)\) are determined by relation (3)).

(b) For the procedure \(\left( {i,j} \right) \in G^{*} \subset G\) corresponding to the variable \({{u}_{q}}(i,j) = 1\) and, hence, included at a considered step in the gth branch of the tree of variants, the start time is \({{A}_{g}}\left( {i,j} \right) = t_{g}^{n}\) and the completion time \(t_{g}^{n} + {{t}_{g}}(i,j)\) (\({{t}_{g}}(i,j)\)is determined by relation (3)).

(c) For the procedures \(\left( {e,h} \right) \in G^{*} \subset G\) (corresponding to the variables \({{u}_{r}} \in U,r \ne q\)), which by constraint (16) at the moment \(t_{g}^{n}\) cannot be executed at the same time with \(\left( {i,j} \right)\), the start time is \(t_{g}^{{n + 1}}\) and the completion time is determined by the relation \(t_{g}^{{n + 1}} + t{\text{*}}(e,h)\), where

$$t{\text{*}}(e,h) = \mathop {\min }\limits_{W(e,h)} t[W(e,h)],\quad (e,h) \in G^{*} \subset G,\quad W(e,h) \subset R.$$
((18))

Here, if at the considered branching step it is assigned \({{u}_{q}}(i,j) = 0\), then for determining \({{Q}_{g}}\left( {{{u}_{q}} = 0} \right)\) we additionally have the following items:

(a) Each procedure \(\left( {l,i} \right) \in G^{*} \subset G\) previously included in the gth fragment of the plan (i.e., a procedure for which \({{A}_{g}}\left( {l,i} \right) < t_{g}^{n}\)), starts at the corresponding moment \({{A}_{g}}\left( {l,i} \right)\) and ends at the moment \({{A}_{g}}\left( {l,i} \right) + t\left( {l,i} \right)\) (\(t\left( {l,i} \right)\) is determined by relation (3)).

(b) The procedure \(\left( {i,j} \right)\) corresponding to the variable \({{u}_{q}}(i,j) = 0\), starts at \(t_{g}^{{n + 1}}\) and ends at \(t_{g}^{{n + 1}} + t{\text{*}}\left( {l,i} \right)\).

(c) The remaining the procedures \(\left( {e,h} \right) \in G{\text{*}} \subset G\) corresponding to the variables \(u_{r}^{{}} \in U_{g}^{n},u \ne q\), start at \(t_{g}^{n}\) and end \(t_{g}^{n} + t{\text{*}}\left( {e,h} \right)\).

An important element of the method for solving problem (5)–(9) is a way of choosing the subsequent procedure and a variant of assigning elements of the computer network for executing the procedure: this element significantly affects the convergence of the method. Formally, this way consists in choosing the variables \(u_{q}^{{}} \in U_{g}^{n}\) for inclusion in the gth branch at \(t_{g}^{n}\). In the proposed method, at the subsequent branching step, the variable \(u_{q}^{{}} \in U_{g}^{n}\) is chosen; for this variable,

$$t{\text{*}}(i,j) = \mathop {\min }\limits_{W(i,j)} t[W(i,j)]\xrightarrow[{(i,j) \in G^{*}}]{}\max ,\,\,W(i,j) \subset R.$$
((19))

If several variables of this kind are found, then the choice of them is carried out in accordance with the following rule of preference:

$$\min i \to \min j;$$
((20))

i.e., the variable corresponding to the procedure with the smallest number i is first included in a branch of the tree of variants. If, here, several variables of this kind are found, then the variable corresponding to the procedure with the smallest number j is chosen from them.

The tree traversal is organized in accordance with the “go right” rule. This allows us to store in a computer memory when solving the problem only the current fragment of the plan, the smallest value of previously obtained values of the goal function, and a feasible plan (of the CPO) corresponding to this value.

This rule, combined with the considered method of choosing computational procedures and elements of the computing system involved in their implementation, represents an approximate algorithm for solving problem (5)–(9); this algorithm allows us to obtain the first feasible solution in a finite number of steps equal to the number N of arcs of the graph G.

Each gth branch ends if it includes all \(N\) procedures (arcs of the graph G), i.e., the feasible P plan of the CPO is obtained, or if

$${{Q}_{g}} \geqslant {{T}^{0}}(1 - \mu ),\quad 0 \leqslant \mu \leqslant 1,$$
((21))

where \({{T}^{0}}\) is the minimum value of the goal function for previously obtained feasible plans (a record) and μ is the assigned permissible relative deviation of the goal function from the optimal value (the accuracy of optimization).

The fulfilment of condition (21) means that it is impossible to improve a previously obtained plan on a considered branch more than by 100μ% and its continuation within the given accuracy of optimization is meaningless. The procedure for finding a solution ends if condition (21) holds for all remaining branches.

With the assigned way for traversing the tree of variants, such a situation is matched by the second return to the root node. Here, the last record represents the sought value of goal function (10) and feasible plan (5) corresponding to this value and represents the optimal Р* plan of the CPO for the program whose structure is presented by graph (1).

CONCLUSIONS

Using the this model and method ensures the formation of optimal plans of the CPO in computing systems with a shared or distributed memory. In the context of the model we propose the procedure of forming a plan of the CPO; the procedure provides the analysis of all possible variants for such an organization and excludes duplicates when viewing them. Here, for its implementation it is sufficient to store in the computer memory only the current fragment of the plan for organizing the process, the smallest value of previously obtained values of goal function (5), and a corresponding feasible plan. This significantly reduces the cost of computer resources in the formation of the plan in question. The proposed rule for traversing a tree of variants, combined with the method of choosing, at every branching step, the computational procedures and elements of the computing system involved in their implementation, represents an approximate algorithm for solving problem (5)–(9); this algorithm allows us to obtain the first feasible plan of the CPO for the finite number of steps equal to the number N of arcs of graph (1).

In general, the proposed model and method allow optimizing a computational process in parallel computing systems.